PHP :: Doc Bug #36261 :: similar_text parameters aren't swappable

similar_text parameters aren't swappable

Submitted:	2006-02-02 16:41 UTC
Modified:	2018-02-22 17:42 UTC

Votes:	1
Avg. Score:	4.0 ± 0.0
Reproduced:	1 of 1 (100.0%)
Same Version:	0 (0.0%)
Same OS:	0 (0.0%)

From:	mac30 at narod dot ru
Assigned:	cmb
Status:	Closed
Package:	Strings related
PHP Version:
OS:
Private report:
CVE-ID:	None


[2006-02-02 16:41 UTC] mac30 at narod dot ru

Description:
------------
Hi! I have a problem with the similar_text() function.
In most cases similar_text(s1,s2) and similar_text(s2,s1) give the same result. For example:

(format: s1 = first string, s2 = second string,
  similar_text(s1,s2) (percent),
  similar_text(s2,s1) (percent))

ziga piekierski      1 (14.3%), 1 (14.3%)
sadlowski piekierski 3 (31.6%), 3 (31.6%)
ogorek piekierski    2 (25.0%), 2 (25.0%)
majeski piekierski   4 (47.1%), 4 (47.1%)

These give the same results regardless of whether I use (s1,s2) or (s2,s1). But in some cases I get different results, and I don't know why:

natujeszczak piekierski 2 (18.2%), 3 (27.3%)
andrzejewski piekierski 5 (45.5%), 4 (36.4%) 
michaelski pankanin     3 (33.3%), 1 (11.1%)
cegielski pankanin      2 (23.5%), 1 (11.8%)

I used PHP 4.4.0 and 5.0.5 and got exactly the same results on both of them.

I find the similar_text function very useful, and it's a pity that it's not reliable, because I don't know which of the two values is correct.

Using the LCS function (taken from the comments on the similar_text manual page), I always got the same result for (s1,s2) and (s2,s1): always the higher of similar_text(s1,s2) and similar_text(s2,s1).

Best Regards
Maciej


[2011-07-11 12:15 UTC] dejaschinski at googlemail dot com

I'm running into the same problem. I'm using php-5.3.6 on openSUSE 11.3.

<?php
echo similar_text("test","wert");
echo "|";
echo similar_text("wert","test");

Expected result:
2|2
Current result:
1|2

[2015-01-09 00:09 UTC] ajf@php.net

-Package: Feature/Change Request
+Package: *General Issues

[2015-01-09 00:09 UTC] ajf@php.net

Sounds like a bug, not a feature request, to me.

[2015-01-09 00:11 UTC] ajf@php.net

-Type: Feature/Change Request
+Type: Bug

[2015-04-12 19:38 UTC] cmb@php.net

-Summary: similar_text parameters aren't convertible
+Summary: similar_text parameters aren't swappable
-Type: Bug
+Type: Documentation Problem
-Package: *General Issues
+Package: Strings related
-Operating System: linux - kernel 2.6
+Operating System: *
-PHP Version: 4.4.2
+PHP Version: *
-Assigned To:
+Assigned To: cmb

[2015-04-12 19:38 UTC] cmb@php.net

First, it seems important to explain the algorithm. It finds the
longest common substring of the two strings, and then does the same
recursively for the parts to the left of that substring (the
prefixes) and the parts to the right of it (the suffixes). The
lengths of all common substrings found are added up and returned as
the result. The percentage is calculated by multiplying the result
by 200 and dividing by the sum of the lengths of both input strings.
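
As a rough illustration of the percentage formula, the following
sketch recomputes the value by hand and compares it with what
similar_text() reports through its optional third (by-reference)
argument. The helper name similarity_percent() is made up for this
example, and treating two empty strings as 0% is an assumption of
the sketch.

```php
<?php
// Hypothetical helper (not part of PHP): the percentage as described
// above, i.e. result * 200 / (strlen($a) + strlen($b)).
function similarity_percent(int $sim, string $a, string $b): float
{
    $total = strlen($a) + strlen($b);
    // Assumption of this sketch: two empty strings count as 0%,
    // which also avoids a division by zero.
    return $total === 0 ? 0.0 : $sim * 200.0 / $total;
}

$a = 'michaelski';
$b = 'pankanin';

// $percent is filled in by similar_text() via its third parameter.
$sim = similar_text($a, $b, $percent);

printf(
    "%d (%.1f%%), computed by hand: %.1f%%\n",
    $sim,
    $percent,
    similarity_percent($sim, $a, $b)
);
```

Both percentages should agree up to floating-point rounding.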

Finding the longest common substring of two strings is accomplished
by the following algorithm:

    function longest_common_substring($a, $b) {
        $max = 0;
        for ($i = 0; $i < strlen($a); $i++) {
            for ($j = 0; $j < strlen($b); $j++) {
                $l = 0;
                while ($i + $l < strlen($a)
                       && $j + $l < strlen($b)
                       && $a[$i + $l] == $b[$j + $l]
                ) {
                    $l++;
                }
                if ($l > $max) {
                    $max = $l;
                }
            }
        }
        return $max;
    }

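A noteworthy detail of the algorithm is that, when several common
substrings share the maximum length, it always takes the first one
it finds. The search walks through the first argument in the outer
loop, so the order of the arguments decides which one that is.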
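
Putting the pieces together, the whole procedure can be sketched in
userland PHP as follows. This is only an illustration of the
description above, not the C implementation; the function names
first_longest_common_substring() and sketch_similar_text() are
invented for this example, and PHP 7.1 or later is assumed for the
array destructuring syntax.

```php
<?php
// Hypothetical helper: like longest_common_substring() above, but it
// also reports where the first longest match starts in each string.
function first_longest_common_substring(string $a, string $b): array
{
    $max = 0;
    $posA = 0;
    $posB = 0;
    $lenA = strlen($a);
    $lenB = strlen($b);
    for ($i = 0; $i < $lenA; $i++) {
        for ($j = 0; $j < $lenB; $j++) {
            $l = 0;
            while ($i + $l < $lenA
                   && $j + $l < $lenB
                   && $a[$i + $l] === $b[$j + $l]
            ) {
                $l++;
            }
            // Strict comparison: a later match of equal length never
            // replaces the first one that was found.
            if ($l > $max) {
                $max = $l;
                $posA = $i;
                $posB = $j;
            }
        }
    }
    return [$max, $posA, $posB];
}

// Hypothetical re-implementation of the recursion described above.
function sketch_similar_text(string $a, string $b): int
{
    if ($a === '' || $b === '') {
        return 0;
    }
    [$max, $posA, $posB] = first_longest_common_substring($a, $b);
    if ($max === 0) {
        return 0;
    }
    // Common substring, plus the prefixes, plus the suffixes.
    return $max
        + sketch_similar_text(substr($a, 0, $posA), substr($b, 0, $posB))
        + sketch_similar_text(substr($a, $posA + $max), substr($b, $posB + $max));
}

foreach ([['aabccc', 'aadccc'], ['test', 'wert'], ['wert', 'test']] as [$x, $y]) {
    printf("%s %s: sketch %d, built-in %d\n",
        $x, $y, sketch_similar_text($x, $y), similar_text($x, $y));
}
```

The sketch recurses on copies made with substr() for readability; it
is meant for stepping through small inputs, not for speed.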

A simple example:
  similar_text('aabccc', 'aadccc') // => 5
  
The longest common substring is 'ccc' with a length of 3. Neither
input string has a suffix after 'ccc', so only the prefixes
('aab' and 'aad') are compared. Their longest common substring is
'aa' (length 2); neither has a prefix before 'aa', so only the
suffixes 'b' and 'd' are compared, resulting in no common substring
(length 0). Adding up the lengths results in 5.

Another example:
  similar_text('michaelski', 'pankanin') // => 1
  
The first longest common substring that is found is 'i' (length 1),
so the prefixes are 'm' and 'pankan' and the suffixes are
'chaelski' and 'n'. The longest common substring of the prefixes is
'' (length 0), the longest common substring of the suffixes is ''
(length 0). So the result is 1.

Swapping the arguments has a different result, though:
  similar_text('pankanin', 'michaelski') // => 3
  
That is because the first longest common substring is 'a' (length
1), with the prefixes 'p' and 'mich' and the suffixes 'nkanin' and
'elski'. The prefixes have no common substring, but the suffixes
have 'k' (length 1), which has the prefixes 'n' and 'els' and the
suffixes 'anin' and 'i'. The prefixes have no common substring, but
the suffixes have 'i' (length 1). Adding up the three lengths,
the result is 3.
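
The two calls can be compared directly. The sketch below also shows
one possible workaround for callers who need a score that does not
depend on the argument order: take the larger of the two results.
symmetric_similar_text() is a hypothetical helper, not a PHP
function, and using the maximum is merely one assumption about what
a "symmetric" score should be.

```php
<?php
// Hypothetical wrapper: call similar_text() in both argument orders
// and keep the larger score. This roughly doubles the cost.
function symmetric_similar_text(string $a, string $b): int
{
    return max(similar_text($a, $b), similar_text($b, $a));
}

$a = 'michaelski';
$b = 'pankanin';

printf("similar_text(a, b) = %d\n", similar_text($a, $b));
printf("similar_text(b, a) = %d\n", similar_text($b, $a));
printf("symmetric          = %d\n", symmetric_similar_text($a, $b));
```

With this wrapper, swapping the arguments no longer changes the
score, although the score is still not guaranteed to be the best
possible alignment of the two strings.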

One might argue that the algorithm is imperfect, but at least the
implementation is correct. It seems reasonable to document the
behavior of similar_text more clearly, therefore I'm changing to
"Documentation Problem".

> I used PHP 4.4.0 and 5.0.5 and got exactly the same results on
> both of them.

According to <http://3v4l.org/N6j98> the results under PHP 4.4.0
and 5.0.5 are the same as under the most recent versions.

> Using LCS function (taken from comments on similar_text manual
> page)

This function does basically the same as longest_common_substring()
above, which is only part of the more complex similar_text()
algorithm.

[2015-04-12 19:38 UTC] cmb@php.net

-Assigned To: cmb
+Assigned To:

[2015-04-12 19:45 UTC] cmb@php.net

Related To: Bug #62648

[2018-02-22 17:42 UTC] cmb@php.net

Automatic comment from SVN on behalf of cmb
Revision: http://svn.php.net/viewvc/?view=revision&revision=344329
Log: Fix bug #36261: similar_text parameters aren't swappable

[2018-02-22 17:42 UTC] cmb@php.net

-Status: Open
+Status: Closed
-Assigned To:
+Assigned To: cmb

[2020-02-07 06:05 UTC] phpdocbot@php.net

Automatic comment on behalf of cmb
Revision: http://git.php.net/?p=doc/en.git;a=commit;h=d95d5914f5b9c244df7358e23cdfbb1b70cff1ec
Log: Fix bug #36261: similar_text parameters aren't swappable
