php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #61660 bin2hex(hex2bin($data)) != $data
Submitted: 2012-04-07 15:46 UTC Modified: 2012-06-15 00:58 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: krtek4+php at gmail dot com Assigned: nikic (profile)
Status: Closed Package: *General Issues
PHP Version: 5.4.1RC1 OS: Debian Linux
Private report: No CVE-ID: None
View Developer Edit
Welcome! If you don't have a Git account, you can't do anything here.
If you reported this bug, you can edit this bug over here.
(description)
Block user comment
Status: Assign to:
Package:
Bug Type:
Summary:
From: krtek4+php at gmail dot com
New email:
PHP Version: OS:

 

 [2012-04-07 15:46 UTC] krtek4+php at gmail dot com
Description:
------------
If you try to apply bin2hex on the result of hex2bin, when the length of the 
initial data is an odd number, the resulting data are not the same as the 
original.

$ php -v
PHP 5.4.1RC1 (cli) (built: Apr  6 2012 13:31:16) 
Copyright (c) 1997-2012 The PHP Group
Zend Engine v2.4.0, Copyright (c) 1998-2012 Zend Technologies

This is the original Debian package present in sid on the 7th April 2012




Test script:
---------------
<?php

$data = '12345';
var_dump($data, bin2hex(hex2bin($data)));
// outputs :
// string(5) "12345"
// string(4) "1234"

Expected result:
----------------
The output should be the same. (ie 12345 both times)

Actual result:
--------------
string(5) "12345"
string(4) "1234"


Patches

hex2bin_padd_odd_new_patch (last revision 2012-04-08 09:18 UTC by theanomaly dot is at gmail dot com)
hex2bin_padd_odd (last revision 2012-04-08 04:33 UTC by theanomaly dot is at gmail dot com)
hex2bin_pad_odd_length.patch (last revision 2012-04-07 17:02 UTC by krtek4+php at gmail dot com)

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2012-04-07 16:45 UTC] krtek4+php at gmail dot com
I'm aware that this is a problem with the internal reprensation of the binary 
value which has to be aligned on 8 bits.

But with the actual implementation, we are losing informations.

A possible solution would be to pad the binary data with 0 on the left and when 
converting back again to hex, remove the leading 0s.

At least, something should be said in the documentation about this shortcoming.
 [2012-04-07 16:46 UTC] laruence@php.net
I think it's better to well document this.  or,  add a prepend '0' , I will make a 
patch for this.
 [2012-04-07 17:04 UTC] laruence@php.net
-Package: Unknown/Other Function +Package: Documentation problem
 [2012-04-07 17:04 UTC] laruence@php.net
actually, I think I can not make a good patch for this, since 0* will be considerd 
as a oct number...

and if I pad some magic number like '0f' to it, there will be a mess if you pass 
the result to somewhere not bin2hex..

so, mark this as doc problem
 [2012-04-07 17:12 UTC] krtek4+php at gmail dot com
What about the patch I just sent ?

I'm not sure how it will behave when converting to octal afterward, but at least 
we don't loose the last hexadecimal number, so I think it's an improvement over 
the actual situation.
 [2012-04-08 04:38 UTC] theanomaly dot is at gmail dot com
I've also submitted a patch which seems to work fine as far as I've tested it. 
Similar to the previous patch, but mine simply prepends a 0 to the beginning of 
every odd length string sent to hex2bin. It shouldn't break anything that I can 
see. If anything it just ensures we always have a valid hexadecimal 
representation since the implementation relies on shift 1 to translate between 
hex and binary (we'll always be one off). I suggest this also becomes a part of 
the dec2bin implementation since it would make sense to return a full octet in 
that representation as well. Sorry if my patch looks ugly though. This is my 
first attempt at a patch.
 [2012-04-08 08:07 UTC] laruence@php.net
@theanomaly  I have tried the similar way as you did.  but the key problem is the 
result will be considered as a oct number.
reads:
"123" != "0123"
 [2012-04-08 09:42 UTC] theanomaly dot is at gmail dot com
@laruence

I've replaced the last patch with a better patch because I realized I created a 
memory leak and that was a poor strategy.

I can't understand why there should be any confusion about whether it's an octal 
value or a hexadecimal value though. Since when should using bin2hex() ever 
leave us with the expectation that would _ever_ get back an octal value?

I might be missing something here, but hex2bin() should always be expecting a 
hexadecimal value and bin2hex() should always leave us with the expectation of a 
hexadecimal value. I see nothing wrong with padding the value to an even number 
otherwise the result is hex2bin() isn't doing what it's supposed to be doing. It 
makes sense to me that even if the client sends a value of '1' that it's 
completely expected behavior that '01' and '1' should both be a valid 
hexadecimal value.

To me it just makes no sense to punish the client for forgetting to pad the 
value by returning false data. At the very least we should be issuing a warning 
to let the client know they have sent unexpected data and then this can be 
documented behavior. But why waste time fixing it to issue E_WARNINGs when this 
patch fixes the issue completely? Besides hex2bin is returning a string. It's 
not like the user can inadvertently use it as an octal value.

var_dump('0123' + '0123'); // int(246)

This would be silly not to fix in my opinion. Especially since it's such an easy 
fix. At least run the patch and let me know which test case you can come up with 
that would break any of PHP's already existing documented behavior by making 
this modification?
 [2012-04-08 10:00 UTC] krtek4+php at gmail dot com
The internal representation must always be aligned on 8 bits, thus we have no 
choice to pad with 0 bits at the beginning, 00001000 and 1000 is the exact same 
value in binary and I think the actual patch is correct.

The new problem is that the reverse operation, i.e. bin2hex, should remove the 
added 0 bit at the beginning.

@theanomaly ; decbin works just fine since it returns a string composed of 0s 
and 1s and not a "binary value". hex2bin / bin2hex are the only function I'm 
aware of working this way.

BTW, why did you sent another patch ? mine is doing exactly the same as yours 
and is working fine.
 [2012-04-08 10:01 UTC] laruence@php.net
@theanomaly as we talked, since Rasmus said it's okey, then I have no objection.
 2 suggestions :

1. use oddlen & 1 instead of oddlen % 2
2. since you cal oddlen % 2 twice, so it's better if you can use a var to hold it 

and it's better for you to make a pull request by yourself, let me know if you 
need help on that :)
 [2012-04-08 10:02 UTC] laruence@php.net
-Package: Documentation problem +Package: *General Issues
 [2012-04-08 10:08 UTC] krtek4+php at gmail dot com
If I could intervene, my patch does exactly the same thing without adding a new 
test for each iteration of the loop.

Mine has only one modulo (%) outside of the loop and the new test is executed only 
once at the first pass assuming the string is a correct hexadecimal value.

Like said in your last comment, the % operation can even be optimized with a '& 1' 
if needed.
 [2012-04-08 12:23 UTC] theanomaly dot is at gmail dot com
@krtek4+php

I didn't mean to step on any toes, honestly. I think your patch probably looks 
way cleaner than mine, but when I tried compiling your patch it did not work for 
me. The test didn't pass.

var_dump(bin2hex(hex2bin(1))); // returned string(0) ""

Maybe I didn't do it right, but that's the only reason I submitted another patch 
after I tested again on the PHP-5.4 branch.
 [2012-04-08 13:20 UTC] krtek4+php at gmail dot com
You are right, there was a bug in the patch I sent. I updated it on github.

Thanks for the comment !
 [2012-04-08 15:25 UTC] nikic@php.net
In my eyes hex2bin should not try to fix the corrupt data by padding it with zero. Instead it should throw a warning.

The reason is simple: A padding can be either added on the left or on the right. E.g. "af52b" could be interpreted both as "0af52b" and as "af52b0".

Both are valid interpretations, depending on the use case.

Because of this I think a warning is more appropriate. It signals the user that the data is wrong. He can always easily fix it by adding a zero to either of the sides, whatever is more appropriate in his particular situation.

Blindly adding the padding on the other hand will probably be quite unexpected. Especially when adding the 0 on the left side all the data will be shifted by 4 bits. This means that the whole data will be corrupted. All bytes will change. (Adding a padding on the right is less intrusive as it only changes one byte).
 [2012-04-08 16:05 UTC] rasmus@php.net
> The reason is simple: A padding can be either added on the left or on the 
right. E.g. "af52b" could be interpreted both as "0af52b" and as "af52b0".

Why would af52b ever mean af52b0 ? That's like saying 15 could be interpreted as 
150 sometimes.
 [2012-04-08 16:23 UTC] theanomaly dot is at gmail dot com
We have no problem type juggling a string to an int as the first non-whitespace, 
non-zero number character...

var_dump(1 + "\t\r\n 0001"); // int(2)

Then, equally, we should have no problem interpreting a hexadecimal 
representation of 1 as 01.

:)
 [2012-04-08 20:45 UTC] nikic@php.net
Automatic comment on behalf of nikic
Revision: http://git.php.net/?p=php-src.git;a=commit;h=7ae93a2c4c8a51cc2aec9977ce3c83c100e382a0
Log: Fix bug #61660: bin2hex(hex2bin($data)) != $data
 [2012-04-08 20:51 UTC] nikic@php.net
-Status: Open +Status: Closed -Assigned To: +Assigned To: nikic
 [2012-04-08 20:51 UTC] nikic@php.net
After some further discussion on IRC we agreed that it is best to throw a warning. The reasoning is as outlined in my previous comment.

The warning is implemented with https://github.com/php/php-src/commit/7ae93a2c4c8a51cc2aec9977ce3c83c100e382a0.
 [2012-04-11 06:13 UTC] theanomaly dot is at gmail dot com
The patch currently in master still leaves a remanent problem. First, 
var_dump(hex2bin('')) still returns string(0) "". Whether or not this is 
acceptable is arguable. Since 0x00 is null, but an empty string isn't really 
null. 

For example:
var_dump(unpack('H*',hex2bin(''))); // will give me string(0) ""

Whereas:

var_dump(unpack('H*',hex2bin('00'))); // will give me string(2) ""

and:

var_dump(hex2bin('00')); // will give me string(1) ""  # which is what I expect

So the warning needs to be amended for an empty string as well as odd-sized hex.

Additionally, the function now returns a bool(false), breaking BC.

I suggest a second optional argument to return partial binary (potentially 
broken binary) even on error in the event the user wants to maintain backwards 
compatibility. I'll be happy to submit another patch for this if there are no 
objections.
 [2012-04-26 21:06 UTC] stas@php.net
-Status: Closed +Status: Re-Opened
 [2012-04-26 21:06 UTC] stas@php.net
I do not think BC break in 5.4 is a good idea.
 [2012-04-26 21:22 UTC] nikic@php.net
@Stas: Just to be sure we are on the same page:
a) This does not break BC in any significant way. Using the function with odd length was wrong from the start. It definitely breaks less than other changes that landed to all branches, like the handling of invalid multibyte sequences in json_encode.
b) If we don't change it now, it will be hard to do so later. PHP 5.4 isn't used much yet, so people didn't have much chance to misuse this function. This may change in the future.
c) From hour long discussions on IRC it turns out that this is largely a documentation problem. Most people involved in the discussion thought that hex2bin() converts hexadecimal numbers to binary numbers (which is wrong). This has now be clarified in the docs: http://de3.php.net/hex2bin

If you wish so, I will revert the change. But I will spend no more time discussing this. I already spent several hours in discussions about this rather unimportant issue.

Thanks,
Nikita.
 [2012-05-06 22:55 UTC] personal_homepage_tools at hochsitze dot com
This is hardly believable. This bug is so much obvious that it hurts to read the comment "The reason is simple: A padding can be either added on the left or on the right. E.g. "af52b" could be interpreted both as "0af52b" and as "af52b0".". Anybody writing this must have failed in CS101 (hard!) - how is it working that persons who write this still obviously work for this company? 
I know it is hard to get something as old as PHP in a functional, future-oriented state, but as long as comments like this from official @php.net accounts are not deleted immediately this will not happen IMHO.
 [2012-05-06 23:02 UTC] nikic@php.net
@personal_homepage_tools at hochsitze dot com:

I'm not sure I understand you. What's the issue with that comment? Do you disagree that a padding can be added to either sides?
 [2012-05-06 23:03 UTC] personal_homepage_tools at hochsitze dot com
Just to add something constructive: obviously bin2hex must return a valid hex string, i.e. with a leading zero if the leading byte consumes less than 4 bits.
*facepalm*
 [2012-05-06 23:12 UTC] nikic@php.net
bin2hex already now returns a valid hex string. This bug report is about the inverse function hex2bin, which converts a hexadecimally encoded binary string back into a binary string. (Please note that we are not talking about NUMBERs here, but about STRINGs.)

So I'm not quite sure what the problem is there.
 [2012-06-15 00:58 UTC] nikic@php.net
-Status: Re-Opened +Status: Closed
 [2012-06-15 00:58 UTC] nikic@php.net
The change has found it's way into PHP 5.4.4, so closing this bug report.
 [2014-10-07 23:27 UTC] stas@php.net
Automatic comment on behalf of nikic
Revision: http://git.php.net/?p=php-src-security.git;a=commit;h=7ae93a2c4c8a51cc2aec9977ce3c83c100e382a0
Log: Fix bug #61660: bin2hex(hex2bin($data)) != $data
 [2014-10-07 23:38 UTC] stas@php.net
Automatic comment on behalf of nikic
Revision: http://git.php.net/?p=php-src-security.git;a=commit;h=7ae93a2c4c8a51cc2aec9977ce3c83c100e382a0
Log: Fix bug #61660: bin2hex(hex2bin($data)) != $data
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Nov 21 11:01:29 2024 UTC