Bug #44072 substr don't work correctly with binary string
Submitted: 2008-02-07 18:31 UTC Modified: 2008-03-03 01:00 UTC
Votes:4
Avg. Score:3.8 ± 0.8
Reproduced:3 of 3 (100.0%)
Same Version:1 (33.3%)
Same OS:1 (33.3%)
From: sergey89 at gmail dot com Assigned:
Status: No Feedback Package: Strings related
PHP Version: 5.2.5 OS: FreeBSD 6.3
Private report: No CVE-ID: None

 [2008-02-07 18:31 UTC] sergey89 at gmail dot com
Description:
------------
substr() doesn't work correctly with binary strings on FreeBSD 6.3, PHP 5.2.5. I have a binary file, and when I try to cut off part of the data I get an incorrect result.

-----
mbstring.func_overload = 0

Reproduce code:
---------------
<?php
$data = file_get_contents('data');
print md5($data) . ' | ';
print md5(substr($data, 0, -88));
?>

Expected result:
----------------
45e26dc33aad8e93f3f45c8d5100feb0 | 03d900cc2ba7276fb3bb3f1939303e3b

Actual result:
--------------
45e26dc33aad8e93f3f45c8d5100feb0 | d1cea9d93cb48b2d897595f5e96ba352
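The original 'data' file is not attached here, so a reproduction that does not depend on it may be easier to share. The sketch below is a hypothetical, self-contained variant: instead of reading 'data', it builds a deterministic 1024-byte string (every byte value 0x00 to 0xFF, four times over) and checks substr($data, 0, -88) byte for byte against an independently rebuilt 936-byte prefix. The comparison uses === on purpose, because if mbstring.func_overload were active, strlen() could be overloaded as well and report characters instead of bytes.

```php
<?php
// Hypothetical self-contained repro: a deterministic 1024-byte string that
// contains every byte value 0x00..0xFF four times, instead of the 'data' file.
$data = '';
for ($i = 0; $i < 1024; $i++) {
    $data .= chr($i % 256);
}

// Cut off the last 88 bytes, as in the original report.
$cut = substr($data, 0, -88);

// Rebuild the expected 936-byte prefix without using substr().
$expected = '';
for ($i = 0; $i < 1024 - 88; $i++) {
    $expected .= chr($i % 256);
}

// '===' compares raw bytes, so it stays reliable even if strlen()/substr()
// were overloaded by mbstring.func_overload.
echo md5($data), ' | ', md5($cut), "\n";
echo ($cut === $expected ? 'match' : 'MISMATCH'), ', length ', strlen($cut), "\n";
```

Where substr() and strlen() are byte-based, the second line prints "match, length 936". If the same script mismatches only under the web server, that would point to a configuration difference between the CLI and the web server SAPI rather than to the data.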

History

 [2008-02-13 18:04 UTC] jani@php.net
What exactly IS in that file? It's impossible to test this without the data.
 [2008-02-14 09:38 UTC] sergey89 at gmail dot com
I generated the data with a simple PHP script:
<?php
file_put_contents('data', '');
for ($i = 0; $i < 1024; $i++){
    file_put_contents('data', chr(rand(0, 255)), FILE_APPEND);
}

However, in php-cli substr works correctly.
 [2008-02-15 14:27 UTC] jani@php.net
If it works in the CLI but not in the web server, then there's probably something different between the two, maybe the PHP version?
 [2008-02-15 15:13 UTC] sergey89 at gmail dot com
The version and build are the same.
 [2008-02-18 10:51 UTC] jani@php.net
How do you know that the md5 sum for that part should be exactly that?
And we can't really test this reliably with random data.
 [2008-02-19 15:21 UTC] sergey89 at gmail dot com
> How do you know that the md5 sum for that part should be exactly that?
I ran the script on other PHP versions and other OSes.

PHP Script source: http://test.sergey89.net/test.php.txt
File containing data: http://test.sergey89.net/data
Result: http://test.sergey89.net/test.php

We can give you FTP/SSH access so you can test it yourself.
 [2008-02-24 20:04 UTC] jani@php.net
Can you just provide the script and data in, e.g., a zip package somewhere anyone can download it from? (The URLs above are not working.)

And shell access is not necessary for a simple thing like this, especially when I'm 99% sure you're just doing something wrong. :)
 [2008-03-03 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 [2008-06-05 11:38 UTC] yar_helg at mail dot ru
I see something similar with PHP 5.2.6 on Linux.
Here is a test script file:
===========================

<?php
echo "OS is ".PHP_OS."<br />\n";
echo "PHP is ".phpversion()."<br />\n";
echo "function overload = ".ini_get('mbstring.func_overload')."<br />\n";

//mb_internal_encoding('UTF-8');
echo "MB_INTERNAL_ENCODING =".mb_internal_encoding()."<br />\n";

define('IDENTIFIER_OLE', pack("CCCCCCCC",0xd0,0xcf,0x11,0xe0,0xa1,0xb1,0x1a,0xe1));
$data = file_get_contents($_SERVER['DOCUMENT_ROOT'].'/substr_bug/empty file.xls');

echo "Data length = ".strlen($data)."<br />\n";
echo "First 8 symbols  ==>".var_export(substr($data,0,8),1)."<== <br />\n";
echo "Compare result (substr(\$data,0,8)==IDENTIFIER_OLE) - ".var_export(substr($data,0,8)==IDENTIFIER_OLE,1)."<br />\n";
echo "Substring length (substr(\$data,0,8)) - ".strlen(substr($data,0,8))."<br />\n";
?>

Output:
=======
OS is Linux
PHP is 5.2.6
function overload = 0
MB_INTERNAL_ENCODING =ISO-8859-1
Data length = 13824
First 8 symbols ==>'поЮ║╠А'<==
Compare result (substr($data,0,8)==IDENTIFIER_OLE) - true
Substring length (substr($data,0,8)) - 8

But if you uncomment the line
mb_internal_encoding('UTF-8');
the output changes like this (the data length is unchanged, but look at the result of substr and the length of that result):

Output with mb_internal_encoding=UTF-8:
=======================================
OS is Linux
PHP is 5.2.6
function overload = 0
MB_INTERNAL_ENCODING =UTF-8
Data length = 13824
First 8 symbols ==>'поЮ║╠А' . "\0" . '' . "\0" . '' . "\0" . '' . "\0" . '' . "\0" . ''<==
Compare result (substr($data,0,8)==IDENTIFIER_OLE) - false
Substring length (substr($data,0,8)) - 13


mbstring.func_overload is set to 0 in the .htaccess file in the current directory.

"empty file.xls" is an empty MS Excel 2003 file. It can be downloaded from http://an-best.ru/empty_file.xls
 
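In the second report, substr($data, 0, 8) returns 13 bytes once the internal encoding is switched to UTF-8, even though ini_get('mbstring.func_overload') reports 0; that looks as if substr() is being routed to mb_substr() anyway. Assuming that is what happens, one defensive workaround is to slice binary data through mb_substr() with the '8bit' encoding, which counts raw bytes regardless of mb_internal_encoding(). The helper name binary_substr is hypothetical, the NUL-padded string only stands in for "empty file.xls", and the sketch uses PHP 7.1+ type declarations (they would have to be dropped on PHP 5.2).

```php
<?php
// Hypothetical helper: byte-based substring that does not depend on whether
// mbstring.func_overload has redirected substr() to mb_substr().
function binary_substr(string $s, int $start, int $length): string
{
    if (function_exists('mb_substr')) {
        // The '8bit' encoding makes mbstring treat every byte as one character.
        return mb_substr($s, $start, $length, '8bit');
    }
    // Without mbstring, substr() cannot be overloaded and is byte-based.
    return substr($s, $start, $length);
}

// Same OLE signature as in the report, followed by NUL padding
// (a stand-in for the real contents of "empty file.xls").
define('IDENTIFIER_OLE', pack("CCCCCCCC", 0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1));
$data = IDENTIFIER_OLE . str_repeat("\0", 16);

if (function_exists('mb_internal_encoding')) {
    mb_internal_encoding('UTF-8'); // the setting that changed the output above
}

$head = binary_substr($data, 0, 8);
var_dump($head === IDENTIFIER_OLE);
```

With byte-based slicing the var_dump() prints bool(true), i.e. the signature comparison behaves like the ISO-8859-1 run even with the internal encoding set to UTF-8.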
PHP Copyright © 2001-2022 The PHP Group
All rights reserved.
Last updated: Mon May 23 23:05:47 2022 UTC