Bug #31188 metaphone function returns incorrect results
Submitted: 2004-12-19 22:36 UTC
Modified: 2004-12-20 08:47 UTC
From: travis at dreamsage dot com
Assigned:
Status: Not a bug
Package: Strings related
PHP Version: 4.3.9
OS: Windows XP (linux untested)
Private report: No
CVE-ID: None
 [2004-12-19 22:36 UTC] travis at dreamsage dot com
Description:
------------
Hello and thanks for taking the time to read this. You guys (and gals) are really appreciated.

This bug concerns the metaphone function, which turns a word into a string that represents the sound of the word.

I was recently required to write a VB version of this function because none existed, so I translated the original BASIC version of this function, written around 1990.
http://aspell.net/metaphone/metaphone-kuhn.txt

I wanted to check my code, so I compared about 10,000 words produced by my VB code to the output of PHP and found 4 bugs in your code. The output of the PHP function does not match what the above algorithm specifies.

1: Words with GH
2: Words with MB. The B is silent at the end of a word ("dumb") but not in the middle ("camber")
3: Words with GN
4: Words with CIA


Thanks
Travis Apple


Reproduce code:
---------------
.

Expected result:
----------------
.

Actual result:
--------------
.

 [2004-12-19 23:00 UTC] jed@php.net
Sorry, but your problem does not imply a bug in PHP itself.  For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions. 

Thank you for your interest in PHP.

Perhaps your VB is wrong? We can't do anything about this without a lot more information.
 [2004-12-19 23:37 UTC] travis at dreamsage dot com
I have checked my VB against results from the original BASIC version, a C version, and a commonly used Java version, and the PHP version gives different results from all of them. You have a bug here. Please don't ignore this. The metaphone code commonly runs spellchecks for many MANY websites. The code that controls this is only a couple hundred lines long, so it shouldn't take long to fix.

Travis
 [2004-12-20 00:28 UTC] jed@php.net
I bogused you because it sounded like a simple "VB does this right, PHP doesn't", which doesn't really say anything. You're serious, though, so I apologize for the quick bogus.

We can't fix this without a lot more information. We need a valid test case to examine a specific fault before we can even move on this. Simply saying "some of these are wrong" doesn't help us at all, and as a result, we can't help you. 

You gave us some letters that confuse our metaphone, so let's work with that -- do you have a specific word that is metaphoned differently? The implementation, in PHP 5 at least, is based on CPAN's "Text-Metaphone-1.96" by Michael G Schwern according to the source.

Thanks again.
 [2004-12-20 00:34 UTC] derick@php.net
Our documentation (http://nl.php.net/metaphone) states that "Metaphone was developed by Lawrence Philips <lphilips at verity dot com>. It is described in ["Practical Algorithms for Programmers", Binstock & Rex, Addison Wesley, 1995]." so your "original" BASIC code cannot have anything to do with the implementation we use. Therefore this is not a bug, just a slightly different algorithm.
 [2004-12-20 00:55 UTC] travis at dreamsage dot com
If the code has been rewritten for version 5, then I don't know whether there's a problem in that version. I have only tested 4.3, where there is a problem. Also, I certainly don't think VB is better than PHP. It's for my job and sometimes they make me do things that I don't want to do.

Anyway, I ran through the "a"s and found some words that should illustrate the problem. Each line lists the word, then the PHP output, then what it should be.

the g should be silent
Agnail - AKNL - ANL

the b should not be silent
Alembic - ALMK - ALMB

the h should be silent
Afghan - AFFN - AFKN

the x is misplaced
Ascian - ASXN - ASN
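
The four words above can be turned into the kind of test case jed asked for. This is only a minimal sketch: the expected codes are copied verbatim from this comment, not independently verified, and the helper name metaphone_mismatches is hypothetical. It uses array() rather than short array syntax so it should also run on PHP 4.3.

```php
<?php
// Word => code Travis expects, copied from the list above (not verified here).
$expected = array(
    'Agnail'  => 'ANL',
    'Alembic' => 'ALMB',
    'Afghan'  => 'AFKN',
    'Ascian'  => 'ASN',
);

// Hypothetical helper: returns the words whose metaphone() output differs
// from the expected code, as word => array(actual, expected).
function metaphone_mismatches($expected)
{
    $diff = array();
    foreach ($expected as $word => $want) {
        $got = metaphone($word);
        if ($got !== $want) {
            $diff[$word] = array($got, $want);
        }
    }
    return $diff;
}

// Print one line per disagreement; prints nothing if everything matches.
foreach (metaphone_mismatches($expected) as $word => $pair) {
    printf("%-8s PHP: %-6s expected: %s\n", $word, $pair[0], $pair[1]);
}
```

Running the same table against another implementation (the Perl module, for instance) would show whether the differences are specific to PHP.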

Also, on rare occasions PHP returns more characters than you ask for. If I ask for 4 characters, it returns 5 roughly once in every 400 calls.
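
The length overshoot could be pinned down with a short scan like the one below. It is a sketch under assumptions: the helper name metaphone_overshoots and the sample words are hypothetical, and the real word list should be substituted. One possible explanation, which nobody in this thread confirms, is that the second argument limits phonemes rather than characters; the current PHP manual notes that the result may come out slightly longer than the limit for that reason.

```php
<?php
// Hypothetical helper: returns word => code for every word whose
// metaphone() result is longer than the requested limit.
function metaphone_overshoots($words, $limit)
{
    $over = array();
    foreach ($words as $word) {
        $code = metaphone($word, $limit);
        if (strlen($code) > $limit) {
            $over[$word] = $code;
        }
    }
    return $over;
}

// Sample words are an assumption; replace them with the real word list.
$words = array('Climax', 'Thompson', 'Afghan', 'Complex');
foreach (metaphone_overshoots($words, 4) as $word => $code) {
    printf("%s => %s (%d chars)\n", $word, $code, strlen($code));
}
```

Any word this prints is a concrete, reportable case for the overshoot.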

And you're right, the original was written in 95 and not 90. My mistake there, I was writing from memory.

This is the Java version, which seems to work nicely for testing.

http://www.wbrogden.com/phonetic/index.html

Thanks 
Travis
 [2004-12-20 08:47 UTC] amt@php.net
From reading Schwern's man page, it appears that there was 
an original BASIC version from 1990 and then a C version in 
1995. It also appears that Schwern made a few custom 
improvements on the original algorithm. I don't know which 
version made it into PHP.

http://search.cpan.org/~mschwern/Text-Metaphone-1.96/Metaphone.pm

If you can demonstrate that somehow the PHP version is 
different than the Perl version, then that would probably be 
actionable, but outside of that, I don't think anyone is 
going to look into the algorithm.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Fri Apr 19 06:01:29 2024 UTC