[2006-05-27 09:57 UTC] jdolecek at NetBSD dot org
Description:
------------
The condition which determines whether a character in a string should be encoded using the <char code="XX"/> construct was changed in php-src/ext/wddx/wddx.c in rev. 1.135 to:
if (iscntrl((int)*(unsigned char *)p) || (int)*(unsigned char *)p >= 127) {
...encode using <char code="XX"/>...
}
This means that _all_ non-ascii characters are encoded with the construct, which explodes the result packet size if non-ascii characters are used.
The "|| (int)*(unsigned char *)p >= 127" part looks like left-over debug code and should be removed.
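To make the size impact concrete, here is a minimal C sketch. It is not the actual wddx.c code: the helper names (needs_char_tag_new, needs_char_tag_old, encoded_len) are hypothetical, and it ignores any other escaping the serializer may perform. It assumes the default "C" locale, where iscntrl() is false for bytes >= 128, and uses the double-quoted tag form as quoted in this report.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Condition as quoted in the report for rev. 1.135:
   control characters plus every byte >= 127. */
static int needs_char_tag_new(unsigned char c)
{
    return iscntrl(c) || c >= 127;
}

/* Condition the report asks for: control characters only. */
static int needs_char_tag_old(unsigned char c)
{
    return iscntrl(c) != 0;
}

/* Size in bytes of the string body when every byte selected by
   needs_tag is written as <char code="XX"/> and every other byte
   is copied through unchanged. */
static size_t encoded_len(const char *s, size_t n,
                          int (*needs_tag)(unsigned char))
{
    const size_t tag_len = strlen("<char code=\"XX\"/>");
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        out += needs_tag((unsigned char)s[i]) ? tag_len : 1;
    }
    return out;
}
```

Under these assumptions, encoded_len("\xC8", 1, needs_char_tag_new) is 17 while encoded_len("\xC8", 1, needs_char_tag_old) is 1, so text consisting mostly of non-ASCII bytes grows by roughly that factor.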
Reproduce code:
---------------
// this was not actually tried, this is just code review
wddx_serialize_value(chr(200));
Expected result:
----------------
<wddxPacket version='1.0'><header/><data><string>Č</string></data></wddxPacket>
Actual result:
--------------
<wddxPacket version='1.0'><header/><data><string><char code="C8"/></string></data></wddxPacket>
Yes it is a bug.

1) It breaks current code that passes in UTF-8 and expects to get an iso-8859-1 result from wddx_deserialize(), i.e.

$str = chr(200);
$str_u8 = utf8_encode($str);
$result = wddx_deserialize(wddx_serialize_value($str_u8));

When run with PHP 5.1.4, or when the data has been serialized with the older version, $result == $str. With the new version, $result == $str_u8. So _all_ old serialized UTF-8 data (e.g. stored in a database) ends up in a different encoding than newly serialized data. This is a major backward incompatibility, and is a problem for any current application that serializes UTF-8 input. (Arguably serializing UTF-8 strings wasn't really very usable before due to Bug #37571, but you get the idea.)

2) It explodes the size of the packet, and it's not clear what the reason for the change was. This is a serious problem when storing the serialized data, and totally unnecessary. XML is designed to be 8-bit clean, so encoding high-bit characters this way doesn't make sense.

Please explain why encoding characters >= 127 is right. Otherwise, please revert this part of the patch. If you want to fix wddx so that the encoding on input is the same as the encoding on output, that's fine, but it must be done in a backward-compatible way, such as adding extra parameters to either wddx_serialize_value() or wddx_deserialize().

After the PHP upgrade from 5.0 to 5.1, the Hungarian letters with accents in our JavaScript framework (using the JS library from www.wddx.org) disappeared. We have to solve it quickly, so here is our patch to work around this incompatibility:

$foo = wddx_serialize_value($bar);
$foo = preg_replace("/(<char code='(..)'\/>)/e", "('\\2'<'80' ? '\\1' : chr(hexdec('\\2')))", $foo);
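
The workaround above post-processes the packet, turning each <char code='XX'/> element with XX >= 80 (hex) back into the raw byte and leaving the lower codes escaped. The same idea can be sketched in C, to match the wddx.c excerpt in the report. This is only an illustration, not the commenter's code: collapse_high_char_tags is a hypothetical helper, it assumes the single-quoted attribute form that the regex matches, and unlike the regex it only accepts two hexadecimal digits as the code.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Rewrites buf in place: every <char code='XX'/> whose hex value is
   >= 0x80 becomes the single raw byte; everything else is kept.
   Returns the new length. The result is never longer than the input. */
static size_t collapse_high_char_tags(char *buf)
{
    static const char prefix[] = "<char code='";
    const size_t plen = sizeof prefix - 1;
    const char *src = buf;
    char *dst = buf;

    while (*src != '\0') {
        if (strncmp(src, prefix, plen) == 0
            && isxdigit((unsigned char)src[plen])
            && isxdigit((unsigned char)src[plen + 1])
            && strncmp(src + plen + 2, "'/>", 3) == 0) {
            const char hex[3] = { src[plen], src[plen + 1], '\0' };
            unsigned long code = strtoul(hex, NULL, 16);

            if (code >= 0x80) {
                *dst++ = (char)code;   /* emit the raw high-bit byte */
                src += plen + 2 + 3;   /* skip the whole element */
                continue;
            }
        }
        *dst++ = *src++;               /* copy everything else as-is */
    }
    *dst = '\0';
    return (size_t)(dst - buf);
}
```

Control characters such as code 0A stay in their <char .../> form, which mirrors the '\\2'<'80' test in the PHP one-liner.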