Request #41409 PHP does not process hexadecimal strings in a consistent manner.
Submitted: 2007-05-16 12:18 UTC Modified: 2015-02-08 10:22 UTC
Votes:7
Avg. Score:4.3 ± 0.9
Reproduced:5 of 5 (100.0%)
Same Version:1 (20.0%)
Same OS:0 (0.0%)
From: wharmby at uk dot ibm dot com
Assigned: nikic
Status: Closed
Package: *General Issues
PHP Version: 7.0
OS: Irrelevant
Private report: No
CVE-ID: None
 [2007-05-16 12:18 UTC] wharmby at uk dot ibm dot com
Description:
------------
PHP does not process hexadecimal strings in a consistent manner or
document when a hexadecimal string is evaluated as a numeric value and 
when it's not.

While developing new PHPT test cases, colleagues of mine noticed
that PHP does not process hexadecimal strings in a consistent manner,
which makes it more difficult to determine what is expected and what
is unexpected behaviour.

We have reviewed the documentation in the PHP manual on the subject at
http://www.php.net/manual/en/language.types.string.php. It makes no
explicit mention of support for hexadecimal strings, but it does
contain the reference:

   "For more information on this conversion, see the Unix manual page for strtod(3). "

The MSDN description of strtod makes no reference to treatment of hexadecimal strings but the Linux manual does. 
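
One way to check how a particular C library's strtod() treats a hexadecimal string is a small probe such as the sketch below. The helper name parse_with_strtod is hypothetical and is not part of PHP. C99 specifies that strtod() accepts a "0x" prefix; a C runtime without that support would be expected to stop at the "x" instead.

```c
#include <stdlib.h>

/* Hypothetical helper (not PHP source): parse a string with the C
 * library's strtod() and report how many characters were consumed. */
static double parse_with_strtod(const char *str, size_t *consumed)
{
    char *end = NULL;
    double value = strtod(str, &end);

    *consumed = (size_t)(end - str);
    return value;
}
```

On a C99-conforming library, parse_with_strtod("0x32", &n) should yield 50.0 with n set to 4. A library without hexadecimal support would be expected to return 0 and consume only the leading "0".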

The following simple test case demonstrates the inconsistency we are
seeing. It is based on a test case in a comment posted on the manual by
"wilkinson98 at hotmail dot com" back in 2005.

If we add two hexadecimal strings together, the strings are first
converted to numbers by calling zendi_convert_scalar_to_number(), which
calls zend_bool is_numeric_string() to convert the strings, which in
turn calls the CRT function strtol() to do the conversion. In this
case strtol() is called with a base of 10 or 16, depending on whether
or not the string starts with "0x",

i.e. the code in zend_bool is_numeric_string() reads:

    /* handle hex numbers */
    if (length>=2 && str[0]=='0' && (str[1]=='x' || str[1]=='X')) {
            conv_base=16;
    }
    errno=0;
    local_lval = strtol(str, &end_ptr_long, conv_base);

So hexadecimal strings are evaluated as numerical values and the
addition produces the expected result. Or at least it does provided
the string does not start with any white space, e.g. "   0x123". Any
leading white space causes the string to evaluate to 0, which again is
not what the description of strtod referenced by the PHP manual suggests
should happen; the strtod description says white space is ignored. So
that needs clarifying too.
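
The effect of leading white space can be reproduced outside PHP with a sketch modelled on the excerpt quoted above. The function name sketch_numeric_string_to_long is hypothetical, and the real is_numeric_string() does more than this (overflow checks, floating-point handling); only the base selection is imitated here.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative sketch only, modelled on the excerpt quoted above; it is
 * not the real is_numeric_string(). The base is chosen by looking at
 * the first two characters, so leading white space hides the "0x". */
static long sketch_numeric_string_to_long(const char *str)
{
    size_t length = strlen(str);
    int conv_base = 10;
    char *end_ptr_long = NULL;

    /* handle hex numbers */
    if (length >= 2 && str[0] == '0' && (str[1] == 'x' || str[1] == 'X')) {
        conv_base = 16;
    }
    errno = 0;
    return strtol(str, &end_ptr_long, conv_base);
}
```

In this sketch "0x32" and "0x64" convert to 50 and 100, but "   0x123" is parsed in base 10: strtol() skips the white space, reads the "0", and stops at the "x". Calling strtol() directly with base 16 on the same string would skip the white space and accept the prefix.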

If we first cast the strings to "int" before adding them, however, the
cast opcode handler calls convert_to_long(), which calls
convert_to_long_base() with a base of 10, which in turn calls strtol()
with base 10. As a result the hexadecimal strings both evaluate to 0
and the result is 0. The convert_to_long() function could easily be
changed to employ similar logic to the above and set the base depending
on the first 2 characters of the string. However, convert_to_long() is
called by a number of extension functions to convert passed arguments,
e.g. range(), so these functions would have their behaviour changed
too. Whilst this would be good from the point of view of consistency,
it may break existing applications.
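
The difference between the two paths can be sketched in plain C. Both function names below are hypothetical stand-ins, not Zend code: the first imitates the fixed base 10 conversion described for the cast, and the second shows what base detection on the first 2 characters might look like.

```c
#include <stdlib.h>

/* Illustrative stand-in for the cast path described above: the string
 * is always handed to strtol() with base 10. */
static long sketch_cast_to_long(const char *str)
{
    return strtol(str, NULL, 10);
}

/* Hypothetical variant that picks the base from the first two
 * characters, as the report suggests convert_to_long() could. */
static long sketch_cast_to_long_detect_base(const char *str)
{
    int base = (str[0] == '0' && (str[1] == 'x' || str[1] == 'X')) ? 16 : 10;

    return strtol(str, NULL, base);
}
```

With the inputs "0x32" and "0x64", the base 10 version returns 0 for each string, so the sum is 0. The base-detecting variant returns 50 and 100, so the sum is 150, matching the result of the plain addition.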

PHP needs to be changed either to evaluate hexadecimal strings
consistently as numerical values when appropriate (and I believe
casting a string to an int is a case where a hexadecimal string should
be evaluated as a numerical value) or to document clearly when a
hexadecimal string will be evaluated as a numeric value and when it
will not.

I am raising this defect to flag the issue and get feedback on whether
the code or the manual should be fixed here. I would prefer to see the
code fixed, but accept that doing so may break existing applications, in
which case all that can be done is to document the behaviour. Either way
I am happy to work with my colleagues developing the new PHPT test cases
to determine which code or manual pages need fixing.


Reproduce code:
---------------
<?php

$a = "0x32";
$b = "0x64";

echo $a + $b, "\n";
echo (int)$a + (int)$b, "\n";

?>

Expected result:
----------------
150
150

Actual result:
--------------
150
0

History

 [2014-05-01 02:22 UTC] levim@php.net
-Package: Feature/Change Request
+Package: *General Issues
-Operating System: Windows XP
+Operating System: Irrelevant
-PHP Version: 5CVS-2007-05-16 (CVS)
+PHP Version: 4.3.0
 [2014-05-01 02:22 UTC] levim@php.net
Using it as an int is different than casting it as an an int yields different results? Seems like a bug to me unless there is some behavior rule I'm not thinking of.

I ran your reproduce code through http://3v4l.org/gZ1Z7 and it gives consistent results from 4.3.0 to 5.6.0beta1.
 [2014-05-01 02:25 UTC] levim@php.net
I clearly was too tired when I hit submit on my last comment; here it is cleaned up:

Using it as an int should probably have the same behavior as casting it to an int unless there is some behavior rule I'm not thinking of.
 [2014-12-30 21:13 UTC] tyrael@php.net
-PHP Version: 4.3.0
+PHP Version: 7.0
 [2015-01-09 00:48 UTC] ajf@php.net
This would be resolved by this RFC: https://wiki.php.net/rfc/remove_hex_support_in_numeric_strings
 [2015-01-09 07:18 UTC] nikic@php.net
-Assigned To:
+Assigned To: nikic
 [2015-02-08 10:22 UTC] nikic@php.net
-Status: Assigned
+Status: Closed
 [2015-02-08 10:22 UTC] nikic@php.net
Fixed in PHP 7, by removing hex-string support in the cases where it existed.
Last updated: Mon Sep 23 17:01:26 2019 UTC