go to bug id or search bugs for
If you try to apply bin2hex on the result of hex2bin, when the length of the
initial data is an odd number, the resulting data are not the same as the
$ php -v
PHP 5.4.1RC1 (cli) (built: Apr 6 2012 13:31:16)
Copyright (c) 1997-2012 The PHP Group
Zend Engine v2.4.0, Copyright (c) 1998-2012 Zend Technologies
This is the original Debian package present in sid on the 7th April 2012
$data = '12345';
// outputs :
// string(5) "12345"
// string(4) "1234"
The output should be the same. (ie 12345 both times)
Add a Patch
Add a Pull Request
I'm aware that this is a problem with the internal reprensation of the binary
value which has to be aligned on 8 bits.
But with the actual implementation, we are losing informations.
A possible solution would be to pad the binary data with 0 on the left and when
converting back again to hex, remove the leading 0s.
At least, something should be said in the documentation about this shortcoming.
I think it's better to well document this. or, add a prepend '0' , I will make a
patch for this.
actually, I think I can not make a good patch for this, since 0* will be considerd
as a oct number...
and if I pad some magic number like '0f' to it, there will be a mess if you pass
the result to somewhere not bin2hex..
so, mark this as doc problem
What about the patch I just sent ?
I'm not sure how it will behave when converting to octal afterward, but at least
we don't loose the last hexadecimal number, so I think it's an improvement over
the actual situation.
I've also submitted a patch which seems to work fine as far as I've tested it.
Similar to the previous patch, but mine simply prepends a 0 to the beginning of
every odd length string sent to hex2bin. It shouldn't break anything that I can
see. If anything it just ensures we always have a valid hexadecimal
representation since the implementation relies on shift 1 to translate between
hex and binary (we'll always be one off). I suggest this also becomes a part of
the dec2bin implementation since it would make sense to return a full octet in
that representation as well. Sorry if my patch looks ugly though. This is my
first attempt at a patch.
@theanomaly I have tried the similar way as you did. but the key problem is the
result will be considered as a oct number.
"123" != "0123"
I've replaced the last patch with a better patch because I realized I created a
memory leak and that was a poor strategy.
I can't understand why there should be any confusion about whether it's an octal
value or a hexadecimal value though. Since when should using bin2hex() ever
leave us with the expectation that would _ever_ get back an octal value?
I might be missing something here, but hex2bin() should always be expecting a
hexadecimal value and bin2hex() should always leave us with the expectation of a
hexadecimal value. I see nothing wrong with padding the value to an even number
otherwise the result is hex2bin() isn't doing what it's supposed to be doing. It
makes sense to me that even if the client sends a value of '1' that it's
completely expected behavior that '01' and '1' should both be a valid
To me it just makes no sense to punish the client for forgetting to pad the
value by returning false data. At the very least we should be issuing a warning
to let the client know they have sent unexpected data and then this can be
documented behavior. But why waste time fixing it to issue E_WARNINGs when this
patch fixes the issue completely? Besides hex2bin is returning a string. It's
not like the user can inadvertently use it as an octal value.
var_dump('0123' + '0123'); // int(246)
This would be silly not to fix in my opinion. Especially since it's such an easy
fix. At least run the patch and let me know which test case you can come up with
that would break any of PHP's already existing documented behavior by making
The internal representation must always be aligned on 8 bits, thus we have no
choice to pad with 0 bits at the beginning, 00001000 and 1000 is the exact same
value in binary and I think the actual patch is correct.
The new problem is that the reverse operation, i.e. bin2hex, should remove the
added 0 bit at the beginning.
@theanomaly ; decbin works just fine since it returns a string composed of 0s
and 1s and not a "binary value". hex2bin / bin2hex are the only function I'm
aware of working this way.
BTW, why did you sent another patch ? mine is doing exactly the same as yours
and is working fine.
@theanomaly as we talked, since Rasmus said it's okey, then I have no objection.
2 suggestions :
1. use oddlen & 1 instead of oddlen % 2
2. since you cal oddlen % 2 twice, so it's better if you can use a var to hold it
and it's better for you to make a pull request by yourself, let me know if you
need help on that :)
If I could intervene, my patch does exactly the same thing without adding a new
test for each iteration of the loop.
Mine has only one modulo (%) outside of the loop and the new test is executed only
once at the first pass assuming the string is a correct hexadecimal value.
Like said in your last comment, the % operation can even be optimized with a '& 1'
I didn't mean to step on any toes, honestly. I think your patch probably looks
way cleaner than mine, but when I tried compiling your patch it did not work for
me. The test didn't pass.
var_dump(bin2hex(hex2bin(1))); // returned string(0) ""
Maybe I didn't do it right, but that's the only reason I submitted another patch
after I tested again on the PHP-5.4 branch.
You are right, there was a bug in the patch I sent. I updated it on github.
Thanks for the comment !
In my eyes hex2bin should not try to fix the corrupt data by padding it with zero. Instead it should throw a warning.
The reason is simple: A padding can be either added on the left or on the right. E.g. "af52b" could be interpreted both as "0af52b" and as "af52b0".
Both are valid interpretations, depending on the use case.
Because of this I think a warning is more appropriate. It signals the user that the data is wrong. He can always easily fix it by adding a zero to either of the sides, whatever is more appropriate in his particular situation.
Blindly adding the padding on the other hand will probably be quite unexpected. Especially when adding the 0 on the left side all the data will be shifted by 4 bits. This means that the whole data will be corrupted. All bytes will change. (Adding a padding on the right is less intrusive as it only changes one byte).
> The reason is simple: A padding can be either added on the left or on the
right. E.g. "af52b" could be interpreted both as "0af52b" and as "af52b0".
Why would af52b ever mean af52b0 ? That's like saying 15 could be interpreted as
We have no problem type juggling a string to an int as the first non-whitespace,
non-zero number character...
var_dump(1 + "\t\r\n 0001"); // int(2)
Then, equally, we should have no problem interpreting a hexadecimal
representation of 1 as 01.
Automatic comment on behalf of nikic
Log: Fix bug #61660: bin2hex(hex2bin($data)) != $data
After some further discussion on IRC we agreed that it is best to throw a warning. The reasoning is as outlined in my previous comment.
The warning is implemented with https://github.com/php/php-src/commit/7ae93a2c4c8a51cc2aec9977ce3c83c100e382a0.
The patch currently in master still leaves a remanent problem. First,
var_dump(hex2bin('')) still returns string(0) "". Whether or not this is
acceptable is arguable. Since 0x00 is null, but an empty string isn't really
var_dump(unpack('H*',hex2bin(''))); // will give me string(0) ""
var_dump(unpack('H*',hex2bin('00'))); // will give me string(2) ""
var_dump(hex2bin('00')); // will give me string(1) "" # which is what I expect
So the warning needs to be amended for an empty string as well as odd-sized hex.
Additionally, the function now returns a bool(false), breaking BC.
I suggest a second optional argument to return partial binary (potentially
broken binary) even on error in the event the user wants to maintain backwards
compatibility. I'll be happy to submit another patch for this if there are no
I do not think BC break in 5.4 is a good idea.
@Stas: Just to be sure we are on the same page:
a) This does not break BC in any significant way. Using the function with odd length was wrong from the start. It definitely breaks less than other changes that landed to all branches, like the handling of invalid multibyte sequences in json_encode.
b) If we don't change it now, it will be hard to do so later. PHP 5.4 isn't used much yet, so people didn't have much chance to misuse this function. This may change in the future.
c) From hour long discussions on IRC it turns out that this is largely a documentation problem. Most people involved in the discussion thought that hex2bin() converts hexadecimal numbers to binary numbers (which is wrong). This has now be clarified in the docs: http://de3.php.net/hex2bin
If you wish so, I will revert the change. But I will spend no more time discussing this. I already spent several hours in discussions about this rather unimportant issue.
This is hardly believable. This bug is so much obvious that it hurts to read the comment "The reason is simple: A padding can be either added on the left or on the right. E.g. "af52b" could be interpreted both as "0af52b" and as "af52b0".". Anybody writing this must have failed in CS101 (hard!) - how is it working that persons who write this still obviously work for this company?
I know it is hard to get something as old as PHP in a functional, future-oriented state, but as long as comments like this from official @php.net accounts are not deleted immediately this will not happen IMHO.
@personal_homepage_tools at hochsitze dot com:
I'm not sure I understand you. What's the issue with that comment? Do you disagree that a padding can be added to either sides?
Just to add something constructive: obviously bin2hex must return a valid hex string, i.e. with a leading zero if the leading byte consumes less than 4 bits.
bin2hex already now returns a valid hex string. This bug report is about the inverse function hex2bin, which converts a hexadecimally encoded binary string back into a binary string. (Please note that we are not talking about NUMBERs here, but about STRINGs.)
So I'm not quite sure what the problem is there.
The change has found it's way into PHP 5.4.4, so closing this bug report.
Automatic comment on behalf of nikic
Log: Fix bug #61660: bin2hex(hex2bin($data)) != $data