Bug #62648 similar_text() returns wrong values
Submitted: 2012-07-24 06:17 UTC
Modified: 2015-04-12 19:45 UTC
Votes:8
Avg. Score:4.6 ± 0.7
Reproduced:8 of 8 (100.0%)
Same Version:3 (37.5%)
Same OS:5 (62.5%)
From: normadize at gmail dot com
Assigned: cmb
Status: Not a bug
Package: Strings related
PHP Version: 5.3.15
OS: Linux, Win32
Private report: No
CVE-ID: None
 [2012-07-24 06:17 UTC] normadize at gmail dot com
Description:
------------
similar_text() returns wrong values; see below for a reproducible case.

This is actually a very old bug ... would be nice if someone finally fixed it.

Test script:
---------------
echo similar_text("test","wert");
echo "|";
echo similar_text("wert","test");


Expected result:
----------------
2|2

Actual result:
--------------
1|2

 [2012-07-29 05:48 UTC] normadize at gmail dot com
Note that besides giving different results when the arguments are swapped, it sometimes gives completely wrong results. Here's an example:

<?php

$s1 = 'biocompatibilitystudiesofironoxidedextrinthinfilmsbacdetundgakslfnnd';
$s2 = 'studyofironoxidenanoparticlescoatedwithdextrinobtainedbybcgagsfddteuhdgavdjfkdbey';

echo similar_text($s1, $s2);
echo ",";
echo similar_text($s2, $s1);

?>

Gives 34,32 instead of 35,35. 

Longest common substring is "studofironoxidedextrintinbcdtudgafd" which is 35 chars long.
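
The 35-character string above is not contiguous in either input, so the figure reads as the length of a longest common subsequence. A minimal sketch of that measure, assuming this interpretation (the function name lcs_subsequence_length is hypothetical, and this is plain dynamic programming, not the suffix-tree extension mentioned below):

```php
<?php

// Length of the longest common subsequence of $a and $b
// (characters must appear in order, but need not be adjacent).
// Classic dynamic programming, keeping only two rows of the table.
function lcs_subsequence_length(string $a, string $b): int
{
    $n = strlen($a);
    $m = strlen($b);
    $prev = array_fill(0, $m + 1, 0);

    for ($i = 1; $i <= $n; $i++) {
        $curr = [0];
        for ($j = 1; $j <= $m; $j++) {
            if ($a[$i - 1] === $b[$j - 1]) {
                // Matching characters extend the best result of both prefixes.
                $curr[$j] = $prev[$j - 1] + 1;
            } else {
                // Otherwise drop one character from either string.
                $curr[$j] = max($prev[$j], $curr[$j - 1]);
            }
        }
        $prev = $curr;
    }

    return $prev[$m];
}

echo lcs_subsequence_length("test", "wert");
echo "|";
echo lcs_subsequence_length("wert", "test");
// prints 2|2
```

This measure is symmetric in its arguments, and for $s1 and $s2 above it returns at least 35, since the quoted string is a common subsequence of both.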

I have written a small custom PHP extension that I compile and load in all my PHP deployments for speed, because a pure-PHP implementation of the longest common substring algorithm is painfully slow. My extension is based on suffix trees rather than recursive calls, which is generally faster for short to moderately long strings.

Isn't any dev interested in fixing this very useful function? It's been such a long time ...

I can post my extension code and will attempt to patch the original similar_text() too.
 [2012-09-05 06:11 UTC] carloschilazo at gmail dot com
The only thing I'd want to clear up first is the performance of
your algorithm vs. the built-in function.

Could you please provide the test case you are using to measure this?

Thanks!
 [2015-04-12 19:45 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2015-04-12 19:45 UTC] cmb@php.net
Please do not submit the same bug more than once. An existing
bug report already describes this very problem. Even if you feel
that your issue is somewhat different, the resolution is likely
to be the same. 

Thank you for your interest in PHP.

This issue had already been reported as bug #36261. The fact that
swapping the arguments might give different results is not an
indication that there is a bug. Consider $a-$b vs. $b-$a and see
the comprehensive explanation in the former bug report.
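
To make the order dependence concrete, here is a minimal PHP sketch of the approach similar_text() is generally described as using: take the first longest common contiguous block, then recurse on the pieces to its left and to its right. The name sketch_similar_text is hypothetical, and this is an approximation for illustration under that assumption, not the actual C source.

```php
<?php

// Hypothetical re-implementation for illustration only.
// Assumption: the first longest common contiguous block wins
// (strict ">" comparison), scanning $s1 in the outer loop.
function sketch_similar_text(string $s1, string $s2): int
{
    $len1 = strlen($s1);
    $len2 = strlen($s2);
    if ($len1 === 0 || $len2 === 0) {
        return 0;
    }

    $max = 0;
    $pos1 = 0;
    $pos2 = 0;
    for ($i = 0; $i < $len1; $i++) {
        for ($j = 0; $j < $len2; $j++) {
            // Length of the common run starting at $s1[$i] and $s2[$j].
            $k = 0;
            while ($i + $k < $len1 && $j + $k < $len2 && $s1[$i + $k] === $s2[$j + $k]) {
                $k++;
            }
            if ($k > $max) {
                $max = $k;
                $pos1 = $i;
                $pos2 = $j;
            }
        }
    }
    if ($max === 0) {
        return 0;
    }

    // Only text on the same side of the chosen block is compared again.
    return $max
        + sketch_similar_text(substr($s1, 0, $pos1), substr($s2, 0, $pos2))
        + sketch_similar_text(substr($s1, $pos1 + $max), substr($s2, $pos2 + $max));
}

echo sketch_similar_text("test", "wert");
echo "|";
echo sketch_similar_text("wert", "test");
// prints 1|2
```

With "test" first, the first block found is the leading "t" of "test" against the trailing "t" of "wert", which leaves nothing on either side to match. With "wert" first, the "e" is chosen, and "rt" vs. "st" on its right contributes one more character.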

> Longest common substring is "studofironoxidedextrintinbcdtudgafd"
> which is 35 chars long.

Note that the longest common substring usually is understood as a
contiguous sequence of characters, such that

  strpos($str, $substr) !== false
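
A minimal sketch of that distinction, applied to the strings from the report; the helper is_subsequence is hypothetical and only checks that the characters occur in order, not adjacently:

```php
<?php

// True if every character of $needle occurs in $haystack in the same
// order, with arbitrary gaps allowed (a subsequence, not a substring).
function is_subsequence(string $needle, string $haystack): bool
{
    $n = strlen($needle);
    if ($n === 0) {
        return true;
    }
    $i = 0;
    for ($j = 0, $m = strlen($haystack); $j < $m; $j++) {
        if ($haystack[$j] === $needle[$i] && ++$i === $n) {
            return true;
        }
    }
    return false;
}

$s1 = 'biocompatibilitystudiesofironoxidedextrinthinfilmsbacdetundgakslfnnd';
$s2 = 'studyofironoxidenanoparticlescoatedwithdextrinobtainedbybcgagsfddteuhdgavdjfkdbey';
$claimed = 'studofironoxidedextrintinbcdtudgafd';

var_dump(strpos($s1, $claimed) !== false); // bool(false): not a contiguous substring
var_dump(strpos($s2, $claimed) !== false); // bool(false)
var_dump(is_subsequence($claimed, $s1));   // bool(true): a subsequence of $s1
var_dump(is_subsequence($claimed, $s2));   // bool(true): and of $s2
```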
 
Last updated: Sun Nov 24 23:01:32 2024 UTC