Doc Bug #36261 similar_text parameters aren't swappable
Submitted: 2006-02-02 16:41 UTC Modified: 2018-02-22 17:42 UTC
Votes:1
Avg. Score:4.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: mac30 at narod dot ru Assigned: cmb (profile)
Status: Closed Package: Strings related
PHP Version: * OS: *
Private report: No CVE-ID: None
 [2006-02-02 16:41 UTC] mac30 at narod dot ru
Description:
------------
Hi! I have some problems with the similar_text() function.
In most cases, similar_text(s1,s2) and similar_text(s2,s1) give me the same result. For example:

(format: s1 = first string, s2 = second string,
  similar_text(s1,s2) (percentage),
  similar_text(s2,s1) (percentage))

ziga piekierski      1 (14.3%), 1 (14.3%)
sadlowski piekierski 3 (31.6%), 3 (31.6%)
ogorek piekierski    2 (25.0%), 2 (25.0%)
majeski piekierski   4 (47.1%), 4 (47.1%)

These give the same results regardless of whether I pass s1,s2 or s2,s1. But in some cases I get different results and I don't know why:

natujeszczak piekierski 2 (18.2%), 3 (27.3%)
andrzejewski piekierski 5 (45.5%), 4 (36.4%) 
michaelski pankanin     3 (33.3%), 1 (11.1%)
cegielski pankanin      2 (23.5%), 1 (11.8%)

I used PHP 4.4.0 and 5.0.5 and got exactly the same results on both of them.

I find the similar_text function very useful, and it's a pity that it's not reliable, because I don't know which of the two values is correct.

Using the LCS function (taken from the comments on the similar_text manual page), I always got the same result for (s1,s2) and (s2,s1): the higher of similar_text(s1,s2) and similar_text(s2,s1).

Best Regards
Maciej



History

 [2011-07-11 12:15 UTC] dejaschinski at googlemail dot com
I'm running into the same problem. I'm using php-5.3.6 on openSUSE 11.3.

<?php
echo similar_text("test","wert");
echo "|";
echo similar_text("wert","test");

Expected result:
2|2
Current result:
1|2
 [2015-01-09 00:09 UTC] ajf@php.net
-Package: Feature/Change Request +Package: *General Issues
 [2015-01-09 00:09 UTC] ajf@php.net
Sounds like a bug, not a feature request, to me.
 [2015-01-09 00:11 UTC] ajf@php.net
-Type: Feature/Change Request +Type: Bug
 [2015-04-12 19:38 UTC] cmb@php.net
-Summary: similar_text parameters aren't convertible +Summary: similar_text parameters aren't swappable
-Type: Bug +Type: Documentation Problem
-Package: *General Issues +Package: Strings related
-Operating System: linux - kernel 2.6 +Operating System: *
-PHP Version: 4.4.2 +PHP Version: *
-Assigned To: +Assigned To: cmb
 [2015-04-12 19:38 UTC] cmb@php.net
First, it seems important to explain the algorithm. It finds the
longest common substring of the two strings, and then does the same
for the prefixes and for the suffixes, recursively. The lengths of
the common substrings found are added up and returned as the result.
The percentage is calculated by multiplying the result by 200 and
dividing by the sum of the lengths of both input strings.
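
As a small illustration of the percentage formula, the following sketch compares the value that similar_text() writes to its optional third (by-reference) parameter with a manual calculation. The helper name similarity_percent() is hypothetical and exists only for this example.

```php
<?php
// Hypothetical helper: applies the formula described above,
// result * 200 / (strlen($a) + strlen($b)).
function similarity_percent(int $sim, string $a, string $b): float
{
    $total = strlen($a) + strlen($b);
    // Guard against division by zero when both strings are empty.
    return $total === 0 ? 0.0 : $sim * 200.0 / $total;
}

// similar_text() returns the number of matching characters and stores
// the percentage in its third, by-reference argument.
$sim = similar_text('aabccc', 'aadccc', $percent);
$manual = similarity_percent($sim, 'aabccc', 'aadccc');

printf("%d (%.1f%%) vs. manual %.1f%%\n", $sim, $percent, $manual);
```

Both percentages should agree, since the manual calculation uses the same count and the same string lengths.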

Finding the longest common substring of two strings is accomplished
by the following algorithm:

    function longest_common_substring($a, $b) {
        $max = 0;
        for ($i = 0; $i < strlen($a); $i++) {
            for ($j = 0; $j < strlen($b); $j++) {
                $l = 0;
                while ($i + $l < strlen($a)
                       && $j + $l < strlen($b)
                       && $a[$i + $l] == $b[$j + $l]
                ) {
                    $l++;
                }
                if ($l > $max) {
                    $max = $l;
                }
            }
        }
        return $max;
    }

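A noteworthy detail of the algorithm is that it always takes the
first longest common substring it encounters: the first string is
scanned from left to right and, for each of its positions, the second
string is scanned from left to right. A later match only replaces the
current one if it is strictly longer.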
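
Putting the pieces together, the whole procedure can be sketched in PHP roughly as follows. This is an illustrative re-implementation under the description given in this comment, not the actual C source; the names first_longest_common_substring() and similar_text_sketch() are made up for the example.

```php
<?php
// Hypothetical sketch: like longest_common_substring() above, but also
// reports where the first longest match starts in each string.
function first_longest_common_substring(string $a, string $b): array
{
    $max = 0;
    $posA = 0;
    $posB = 0;
    $lenA = strlen($a);
    $lenB = strlen($b);
    for ($i = 0; $i < $lenA; $i++) {
        for ($j = 0; $j < $lenB; $j++) {
            $l = 0;
            while ($i + $l < $lenA && $j + $l < $lenB && $a[$i + $l] === $b[$j + $l]) {
                $l++;
            }
            // Strict ">" keeps the first match of a given length.
            if ($l > $max) {
                $max = $l;
                $posA = $i;
                $posB = $j;
            }
        }
    }
    return [$max, $posA, $posB];
}

// Hypothetical sketch of the recursion: match length, plus the scores
// of the two prefixes and of the two suffixes.
function similar_text_sketch(string $a, string $b): int
{
    [$max, $posA, $posB] = first_longest_common_substring($a, $b);
    if ($max === 0) {
        return 0; // nothing in common, recursion ends here
    }
    return $max
        + similar_text_sketch(substr($a, 0, $posA), substr($b, 0, $posB))
        + similar_text_sketch(substr($a, $posA + $max), substr($b, $posB + $max));
}

echo similar_text_sketch('test', 'wert'), '|', similar_text_sketch('wert', 'test'), "\n";
```

Because the match is chosen by scanning the first argument first, swapping the arguments can select a different split point, and the prefixes and suffixes compared afterwards differ as well.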

A simple example:
  similar_text('aabccc', 'aadccc') // => 5
  
The longest common substring is 'ccc' with a length of 3. Neither
input string has a suffix after 'ccc', so only the prefixes
('aab' and 'aad') are compared. Their longest common substring is
'aa' (length 2); neither has a prefix before it, so only the
suffixes 'b' and 'd' are compared, resulting in no common substring
(length 0). Adding up the lengths results in 5.

Another example:
  similar_text('michaelski', 'pankanin') // => 1
  
The first longest common substring that is found is 'i' (length 1),
so the prefixes are 'm' and 'pankan' and the suffixes are
'chaelski' and 'n'. The longest common substring of the prefixes is
'' (length 0), the longest common substring of the suffixes is ''
(length 0). So the result is 1.

Swapping the arguments has a different result, though:
  similar_text('pankanin', 'michaelski') // => 3
  
That is because the first longest common substring is 'a' (length
1), with the prefixes 'p' and 'mich' and the suffixes 'nkanin' and
'elski'. The prefixes have no common substring, but the suffixes
have 'k' (length 1), which has the prefixes 'n' and 'els' and the
suffixes 'anin' and 'i'. The prefixes have no common substring, but
the suffixes have 'i' (length 1). Therefore the result is
1 + 1 + 1 = 3.

One might argue that the algorithm is imperfect, but at least the
implementation is correct. It seems reasonable to document the
behavior of similar_text more clearly, therefore I'm changing to
"Documentation Problem".
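
If an order-independent score is needed, one possible workaround is to evaluate both argument orders and keep the larger value. This is only a sketch under that assumption, not something similar_text() itself offers; the helper name similar_text_symmetric() is hypothetical.

```php
<?php
// Hypothetical helper, not part of PHP: scores both argument orders
// and keeps the larger count, so the result does not depend on order.
function similar_text_symmetric(string $a, string $b, &$percent = null): int
{
    $sim = max(similar_text($a, $b), similar_text($b, $a));
    $total = strlen($a) + strlen($b);
    // Same percentage formula as described above, applied to the larger count.
    $percent = $total === 0 ? 0.0 : $sim * 200.0 / $total;
    return $sim;
}

$forward  = similar_text_symmetric('test', 'wert', $p1);
$backward = similar_text_symmetric('wert', 'test', $p2);
var_dump($forward === $backward, $p1 === $p2);
```

Note that this doubles the work per comparison, and the larger of the two values is still not guaranteed to be the best possible alignment of the two strings.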

> I used PHP 4.4.0 and 5.0.5 and got exactly the same results on
> both of them.

According to <http://3v4l.org/N6j98> the results under PHP 4.4.0
and 5.0.5 are the same as under the most recent versions.

> Using LCS function (taken from comments on similar_text manual
> page)

This function does basically the same as longest_common_substring()
above, which is only part of the more complex similar_text()
algorithm.
 [2015-04-12 19:38 UTC] cmb@php.net
-Assigned To: cmb +Assigned To:
 [2018-02-22 17:42 UTC] cmb@php.net
Automatic comment from SVN on behalf of cmb
Revision: http://svn.php.net/viewvc/?view=revision&revision=344329
Log: Fix bug #36261: similar_text parameters aren't swappable
 [2018-02-22 17:42 UTC] cmb@php.net
-Status: Open +Status: Closed -Assigned To: +Assigned To: cmb
 [2020-02-07 06:05 UTC] phpdocbot@php.net
Automatic comment on behalf of cmb
Revision: http://git.php.net/?p=doc/en.git;a=commit;h=d95d5914f5b9c244df7358e23cdfbb1b70cff1ec
Log: Fix bug #36261: similar_text parameters aren't swappable