PHP :: Bug #62648 :: similar_text() returns wrong values

Submitted:	2012-07-24 06:17 UTC
Modified:	2015-04-12 19:45 UTC

Votes:	8
Avg. Score:	4.6 ± 0.7
Reproduced:	8 of 8 (100.0%)
Same Version:	3 (37.5%)
Same OS:	5 (62.5%)

From:	normadize at gmail dot com
Assigned:	cmb
Status:	Not a bug
Package:	Strings related
PHP Version:	5.3.15
OS:	Linux, Win32
Private report:
CVE-ID:	None

[2012-07-24 06:17 UTC] normadize at gmail dot com

Description:
------------
similar_text() returns wrong values, see below for reproducible error.

This is actually a very old bug ... would be nice if someone finally fixed it.

Test script:
---------------
echo similar_text("test","wert");
echo "|";
echo similar_text("wert","test");


Expected result:
----------------
2|2

Actual result:
--------------
1|2

[2012-07-29 05:48 UTC] normadize at gmail dot com

Note that, besides giving different results when the arguments are swapped, it sometimes gives results that are wrong in both orders. Here's an example:

<?php

$s1 = 'biocompatibilitystudiesofironoxidedextrinthinfilmsbacdetundgakslfnnd';
$s2 = 'studyofironoxidenanoparticlescoatedwithdextrinobtainedbybcgagsfddteuhdgavdjfkdbey';

echo similar_text($s1, $s2);
echo ",";
echo similar_text($s2, $s1);

?>

Gives 34,32 instead of 35,35. 

Longest common substring is "studofironoxidedextrintinbcdtudgafd" which is 35 chars long.
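For reference, the figure of 35 quoted here appears to be the length of a longest common subsequence (characters in order, gaps allowed) rather than of a contiguous substring. A minimal sketch of the textbook dynamic-programming computation follows; `lcs_length` is a hypothetical helper written for this illustration and is not a PHP built-in, nor the code of the extension mentioned in this comment.

```php
<?php
// Hypothetical helper (not part of PHP): length of the longest common
// subsequence of $a and $b. Uses the standard DP recurrence, keeping
// only the previous and current rows of the table.
function lcs_length(string $a, string $b): int
{
    $n = strlen($a);
    $m = strlen($b);
    $prev = array_fill(0, $m + 1, 0);      // row for the empty prefix of $a

    for ($i = 1; $i <= $n; $i++) {
        $curr = [0];                        // column 0: empty prefix of $b
        for ($j = 1; $j <= $m; $j++) {
            if ($a[$i - 1] === $b[$j - 1]) {
                // Matching characters extend the best result of both prefixes.
                $curr[$j] = $prev[$j - 1] + 1;
            } else {
                // Otherwise drop one character from either string.
                $curr[$j] = max($prev[$j], $curr[$j - 1]);
            }
        }
        $prev = $curr;
    }
    return $prev[$m];
}

echo lcs_length("test", "wert"), "|", lcs_length("wert", "test"), "\n"; // prints 2|2
```

Unlike similar_text(), this measure is symmetric in its arguments, which matches the "2|2" given as the expected result in the original report.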

I have written a small custom PHP extension that I compile and load in all my PHP deployments for speed, because a pure-PHP implementation of the longest common substring algorithm is painfully slow. My extension is based on suffix trees rather than recursive calls, which is generally faster for short to moderately long strings.

Isn't any dev interested in fixing this very useful function? It's been such a long time ...

I can post my extension code and will attempt to patch the original similar_text() too.

[2012-09-05 06:11 UTC] carloschilazo at gmail dot com

The only thing I'd want to clear up first is the performance of
your algorithm vs. the built-in function.

Could you please provide the test case you are using to measure this?

Thanks!

[2015-04-12 19:45 UTC] cmb@php.net

-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb

[2015-04-12 19:45 UTC] cmb@php.net

Please do not submit the same bug more than once. An existing
bug report already describes this very problem. Even if you feel
that your issue is somewhat different, the resolution is likely
to be the same. 

Thank you for your interest in PHP.

This issue had already been reported as bug #36261. The fact that
swapping the arguments might give different results is not an
indication that there is a bug. Consider $a-$b vs. $b-$a and see
the comprehensive explanation in the former bug report.
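To illustrate why the argument order can matter, here is a rough sketch of the approach similar_text() is generally described as taking: find the first longest common contiguous run, then recurse on the pieces to its left and to its right. `sim_sketch` is a hypothetical name; the real function is implemented in C and may differ in details, so treat this as an approximation under that assumption.

```php
<?php
// Hypothetical pure-PHP approximation of similar_text(), for illustration only.
function sim_sketch(string $s1, string $s2): int
{
    $len1 = strlen($s1);
    $len2 = strlen($s2);
    if ($len1 === 0 || $len2 === 0) {
        return 0;
    }

    // Find the FIRST longest common contiguous run, scanning $s1 in the
    // outer loop. On ties the earliest position in $s1 wins, which is
    // where the dependence on argument order comes from.
    $max = 0;
    $pos1 = 0;
    $pos2 = 0;
    for ($i = 0; $i < $len1; $i++) {
        for ($j = 0; $j < $len2; $j++) {
            $k = 0;
            while ($i + $k < $len1 && $j + $k < $len2
                && $s1[$i + $k] === $s2[$j + $k]) {
                $k++;
            }
            if ($k > $max) {
                $max = $k;
                $pos1 = $i;
                $pos2 = $j;
            }
        }
    }
    if ($max === 0) {
        return 0;
    }

    // Count the run, then recurse on what lies left and right of it.
    return $max
        + sim_sketch(substr($s1, 0, $pos1), substr($s2, 0, $pos2))
        + sim_sketch(substr($s1, $pos1 + $max), substr($s2, $pos2 + $max));
}

echo sim_sketch("test", "wert"), "|", sim_sketch("wert", "test"), "\n"; // prints 1|2
```

With "test" first, the first run found is the leading "t" of "test" paired with the trailing "t" of "wert", which leaves nothing to compare on either side. With "wert" first, the run found is "e", and the right-hand remainders "rt" and "st" still share a "t". That reproduces the 1|2 from the report.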

> Longest common substring is "studofironoxidedextrintinbcdtudgafd"
> which is 35 chars long.

Note that the longest common substring usually is understood as a
contiguous sequence of characters, such that

  strpos($str, $substr) !== false
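The distinction can be checked directly against the strings from the earlier comment. In the sketch below, `is_subsequence` is a hypothetical helper written for this illustration (it is not a PHP built-in); the string the reporter quotes occurs in both inputs only with gaps, so strpos() does not find it.

```php
<?php
// Hypothetical helper: true if the characters of $needle occur in $haystack
// in the same order, with arbitrary gaps allowed between them.
function is_subsequence(string $needle, string $haystack): bool
{
    $matched = 0;
    $n = strlen($needle);
    $len = strlen($haystack);
    for ($j = 0; $j < $len && $matched < $n; $j++) {
        if ($haystack[$j] === $needle[$matched]) {
            $matched++;
        }
    }
    return $matched === $n;
}

$s1 = 'biocompatibilitystudiesofironoxidedextrinthinfilmsbacdetundgakslfnnd';
$s2 = 'studyofironoxidenanoparticlescoatedwithdextrinobtainedbybcgagsfddteuhdgavdjfkdbey';
$claimed = 'studofironoxidedextrintinbcdtudgafd';

var_dump(strpos($s1, $claimed) !== false); // bool(false): not a contiguous substring
var_dump(strpos($s2, $claimed) !== false); // bool(false)
var_dump(is_subsequence($claimed, $s1));   // bool(true): a subsequence with gaps
var_dump(is_subsequence($claimed, $s2));   // bool(true)
```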


