Bug #42101 mb_substr() misbehaves when length = PHP_INT_MAX (64bit issue)
Submitted: 2007-07-25 12:10 UTC
Modified: 2008-02-27 14:28 UTC
Votes:2
Avg. Score:3.5 ± 0.5
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:1 (100.0%)
From: mcorne at yahoo dot com
Assigned: hirokawa
Status: No Feedback
Package: mbstring related
PHP Version: 5.2.4RC2-dev
OS: Linux x86-64
Private report: No
CVE-ID: None
 [2007-07-25 12:10 UTC] mcorne at yahoo dot com
Description:
------------
mb_substr("\x44\xCC\x87", 0, PHP_INT_MAX, 'UTF-8') returns only the first character on 64-bit Linux instead of returning the whole string.
Note that this works fine on Windows XP and Linux 32-bit.

Reproduce code:
---------------
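// Return the first $length characters of $string, plus the hex value of each byte returned.
function substring($string, $length)
{
    $substr = mb_substr($string, 0, $length, 'UTF-8');
    $length = strlen($substr); // byte length of the result
    // Unpack each byte as an unsigned char into keys chars1, chars2, ...
    $chars = $length ? unpack("C{$length}chars", $substr) : array();
    $decs = array_map('dechex', $chars); // byte values as hex strings
    return array($substr, $decs);
}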

$test['string'] = "\x44\xCC\x87";
$test['utf8'] = '\x44\xCC\x87';
$test['unicode'] = '\u0044\u0307';
$test['PHP_INT_MAX'] = PHP_INT_MAX;
$test['php_int_max'] = substring($test['string'], PHP_INT_MAX);
$test['9999'] = substring($test['string'], 9999);

print_r($test);


Expected result:
----------------
Array
(
    [string] => Ḋ
    [utf8] => \x44\xCC\x87
    [unicode] => \u0044\u0307
    [PHP_INT_MAX] => 2147483647
    [php_int_max] => Array
        (
            [0] => Ḋ
            [1] => Array
                (
                    [chars1] => 44
                    [chars2] => cc
                    [chars3] => 87
                )

        )

    [9999] => Array
        (
            [0] => Ḋ
            [1] => Array
                (
                    [chars1] => 44
                    [chars2] => cc
                    [chars3] => 87
                )

        )

)

Actual result:
--------------
Array
(
    [string] => Ḋ
    [utf8] => \x44\xCC\x87
    [unicode] => \u0044\u0307
    [PHP_INT_MAX] => 2147483647
    [php_int_max] => Array
        (
            [0] => D
            [1] => Array
                (
                    [chars1] => 44
                )

        )

    [9999] => Array
        (
            [0] => Ḋ
            [1] => Array
                (
                    [chars1] => 44
                    [chars2] => cc
                    [chars3] => 87
                )

        )

)
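
A possible workaround, sketched under the assumption that the problem is triggered only by very large length values: clamp the requested length to the number of characters actually available before calling mb_substr(). The helper name mb_substr_clamped() is hypothetical and not part of mbstring, and it has not been verified on the affected 64-bit build.

```php
<?php
// Hypothetical helper (not part of mbstring): clamp $length to the number of
// characters remaining after $start, so that huge values such as PHP_INT_MAX
// are never passed to mb_substr(). Assumes $start >= 0 and $length >= 0.
function mb_substr_clamped($string, $start, $length, $encoding = 'UTF-8')
{
    $available = max(0, mb_strlen($string, $encoding) - $start);
    return mb_substr($string, $start, min($length, $available), $encoding);
}

// "D" followed by U+0307 COMBINING DOT ABOVE: two characters, three bytes.
$result = mb_substr_clamped("\x44\xCC\x87", 0, PHP_INT_MAX);
echo bin2hex($result), "\n"; // 44cc87
```

With the clamp in place, mb_substr() only ever sees a length of at most mb_strlen($string), which is the same range as the 9999 case that already behaves correctly in the report.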

 [2007-08-15 06:45 UTC] mcorne at yahoo dot com
Same issue on the latest release.
Test done on:
PHP Version => 5.2.4RC2-dev
System => Linux durbatuluk 2.6.20-16-generic #2 SMP Thu Jun 7 19:00:28 UTC 2007 x86_64
Build Date => Aug 13 2007 21:59:11
 [2007-08-17 13:49 UTC] jani@php.net
Assigned to the maintainer of mbstring extension.
 [2007-09-22 01:32 UTC] hirokawa@php.net
I reproduced the same issue with mb_substr() on my Athlon 64/x2 machine.

I believe that substr() also has the same 64-bit issue.

Here is a sample script (tested on my x86-64 Ubuntu Linux machine, Athlon 64 X2):
<?php
 echo substr("\x44\xCC\x87", 0, 1024);          // output: 0x44,0xcc,0x87
 echo substr("\x44\xCC\x87", 0, PHP_INT_MAX);   // output: 0x44,0xcc
 echo substr("\x44\xCC\x87", 0, PHP_INT_MAX-1); // output: 0x44
 echo substr("\x44\xCC\x87", 0, PHP_INT_MAX-2); // output: (empty)
?>

I think PHP itself is not 64-bit compatible.
Why didn't you submit a bug report for substr()?
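
The byte counts in the sample script above happen to match what substr() would return if the length were truncated to a 32-bit signed integer before use: PHP_INT_MAX, PHP_INT_MAX-1 and PHP_INT_MAX-2 would then become -1, -2 and -3, and a negative length drops that many bytes from the end. This is only a reading of the outputs, not a confirmed diagnosis. The helper to_int32() below is hypothetical and merely emulates such a truncation on a 64-bit build.

```php
<?php
// Hypothetical emulation of a 64-bit integer being truncated to a 32-bit
// signed int. Assumes a 64-bit PHP build (PHP_INT_SIZE === 8).
function to_int32($n)
{
    $low = $n & 0xFFFFFFFF;      // keep the low 32 bits
    if ($low & 0x80000000) {     // sign bit set: the 32-bit value is negative
        $low -= 0x100000000;
    }
    return $low;
}

$s = "\x44\xCC\x87";
foreach (array(1024, PHP_INT_MAX, PHP_INT_MAX - 1, PHP_INT_MAX - 2) as $length) {
    $truncated = to_int32($length);
    // Cast to string because older PHP versions may return false here.
    $bytes = (string) substr($s, 0, $truncated);
    echo $truncated, ' => ', bin2hex($bytes), "\n";
}
```

Under this assumption the emulated lengths are 1024, -1, -2 and -3, giving 3, 2, 1 and 0 bytes respectively, the same pattern as in the script output. Likewise, mb_substr() with a length of -1 on the two-character test string returns only "D", which matches the original report.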


 [2008-02-27 14:28 UTC] hirokawa@php.net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Open". Thank you.


 