Bug #59527 ssdeep fuzzy_compare only returns 0
Submitted: 2010-12-02 17:23 UTC Modified: 2010-12-02 18:10 UTC
From: islam dot sharabash at gmail dot com Assigned:
Status: Not a bug Package: ssdeep (PECL)
PHP Version: 5.2.13 OS: centos 5
Private report: No CVE-ID: None

 
 [2010-12-02 17:23 UTC] islam dot sharabash at gmail dot com
Description:
------------
No matter which strings are hashed, ssdeep_fuzzy_compare() only 
returns 0.

Reproduce code:
---------------
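<?php

$string = "blah";
echo $string;
?><br><?php
// Hash the short input string
var_dump($wee = ssdeep_fuzzy_hash($string));
?><br><?php
// Compare the hash against itself; a score of 100 is expected
var_dump(ssdeep_fuzzy_compare($wee, $wee));

?>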


Expected result:
----------------
blah
string(7) "3:2n:2n"
int(100)

//Match of 100 is expected

Actual result:
--------------
blah
string(7) "3:2n:2n"
int(0) 

//Match of 0 returned, no matter the input

 [2010-12-02 18:10 UTC] treffynnon@php.net
Thank you for taking the time to write to us, but this is not
a bug.

This behaviour is a property of the ssdeep algorithm itself. The 
author of the upstream ssdeep package has previously noted that the 
algorithm becomes more accurate with content above 4KB in length.

It is suited to comparing files or large strings against each other, 
not individual words such as "blah". A simple MD5 or SHA1 hash 
is the right tool for checking that two strings are identical. 
Context triggered piecewise hashing, which ssdeep implements, is 
designed to show approximately how similar two large strings or 
files are to each other.
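
As a rough illustration of the exact-match case, here is a minimal sketch that compares SHA1 digests. The helper name strings_identical() is made up for this example and is not part of the ssdeep extension; only the built-in sha1() function is used.

```php
<?php
// Hypothetical helper (not part of ssdeep): exact equality via SHA1 digests.
function strings_identical($a, $b)
{
    // Identical inputs always produce identical digests.
    return sha1($a) === sha1($b);
}

var_dump(strings_identical("blah", "blah"));  // bool(true)
var_dump(strings_identical("blah", "blah!")); // bool(false)
```

This answers only "same or not"; it says nothing about how similar two different strings are.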

Please try running the tests that come with the php_ssdeep package 
or increase the length of your sample text. Whilst investigating 
your report I found that the following would return the result you 
expect:


php > $hash = 
ssdeep_fuzzy_hash('blahblahblahblahblahblahblahblahblahblahblahblah
blahblahfegfhgdhdghgdhgdshgsdhghgsdhgdshghsdhgdhgsjgdsjgdjgjgsgsghg
haasateytyuytkutdkusuht nmfbnzbnzbaerereyetywturyiuteutejbf 
najetjhr gtjidahoadfh aiohjda hipdj hhadphjfgpahjapeghut9euhiotejhi 
tjhe tjphjtejhgijdhkjhklghijst eih 
eapsjhpjtephjtpjhptjpihjtihjidasfjh dhj dpasiojh poeatojh ohj 
tpeojhpoaetjhoptejhoteajhotad 
jhpoeatjhpotejhpoitejbjgji9rtsbiprpbjtaephetnhjetapihjpet eh peoaj 
hpejpteajhegbmzcklhkghjgdj hhj thj teabnpteanmpaeotnmp[');
php > var_dump(ssdeep_fuzzy_compare($hash, $hash));                             
int(100)
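
For anyone who wants to experiment with longer input, the sketch below builds a deterministic sample above 4KB and hashes it. The helper build_sample() and the md5-based filler text are assumptions made for this example, not something from the extension or its test suite, and whether such synthetic text scores like real files is not established here. The ssdeep calls run only if the extension is loaded.

```php
<?php
// Hypothetical helper: builds deterministic, varied text of $blocks * 33 bytes.
function build_sample($blocks)
{
    $text = '';
    for ($i = 0; $i < $blocks; $i++) {
        $text .= md5((string) $i) . ' '; // 32 hex characters plus a space
    }
    return $text;
}

$sample = build_sample(200); // 200 * 33 = 6600 bytes, above 4KB
echo strlen($sample), "\n";

// The ssdeep calls run only when the PECL extension is actually loaded.
if (extension_loaded('ssdeep')) {
    $original = ssdeep_fuzzy_hash($sample);
    $edited   = ssdeep_fuzzy_hash($sample . ' one extra trailing sentence');
    var_dump(ssdeep_fuzzy_compare($original, $original)); // hash against itself
    var_dump(ssdeep_fuzzy_compare($original, $edited));   // hash against an edited copy
}
```

No scores are shown for the two comparisons because they depend on the installed ssdeep version and on the input.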



Again, though, this extension is not intended for detecting 
identical strings; use SHA1 or MD5 hashes if you need to ensure two 
strings are the same. If you want a similarity score for large 
inputs, then ssdeep is the right way to go.
 