Bug #38901 wddx mangles utf8 characters in serialized strings (broken more than in 4.4)
Submitted: 2006-09-20 13:46 UTC
Modified: 2008-07-21 01:00 UTC
From: grzegorz dot nosek at netart dot pl
Assigned: andrei
Status: No Feedback
Package: WDDX related
PHP Version: 5.1.6
OS: Linux
Private report: No
CVE-ID: None
 [2006-09-20 13:46 UTC] grzegorz dot nosek at netart dot pl
Description:
------------
WDDX gets confused if you try to serialize a string with UTF-8 characters (like the one below; it contains 'z with dot above', 'o acute', 'l stroke' and 'w', spelled out here in case it gets messed up somehow).

The serialized string will contain <char code='C5'/><char code='BC'/> ... etc. On deserialization, each hex value is decoded to a single byte and fed into xml_utf8_decode on its own, byte by byte, totally wrecking the output.

Up to this point, it's a duplicate of #38900.
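
To make the byte-by-byte problem concrete, here is a minimal C sketch. The decoder below is a hypothetical, simplified stand-in written for this illustration, not PHP's xml_utf8_decode, and the function names are made up. It only assumes that anything which cannot be mapped to ISO-8859-1 becomes '?'.

```c
#include <stddef.h>

/* Hypothetical, simplified UTF-8 -> ISO-8859-1 decoder (illustration only,
 * NOT PHP's xml_utf8_decode). Handles 1- and 2-byte sequences; code points
 * above 0xFF, truncated sequences and everything else become '?'. */
size_t sketch_utf8_to_latin1(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[n++] = c;                       /* plain ASCII */
            i++;
        } else if ((c & 0xE0) == 0xC0 && i + 1 < len &&
                   (in[i + 1] & 0xC0) == 0x80) {
            unsigned cp = ((c & 0x1Fu) << 6) | (in[i + 1] & 0x3Fu);
            out[n++] = cp <= 0xFF ? (unsigned char)cp : '?';
            i += 2;
        } else {
            out[n++] = '?';                     /* cannot decode this byte */
            i++;
        }
    }
    return n;
}

/* Decode every byte in isolation, which is what effectively happens when
 * each <char code='..'/> element is handed to the decoder separately. */
size_t sketch_decode_per_byte(const unsigned char *in, size_t len,
                              unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        n += sketch_utf8_to_latin1(in + i, 1, out + n);
    }
    return n;
}
```

With this stand-in, the bytes C5 BC ('z with dot above', U+017C) come out as two '?' when decoded one byte at a time, because neither byte is a valid UTF-8 sequence on its own. The original two bytes cannot be recovered from that result.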

However, PHP 5 has another bug with variable names (e.g. hash keys) containing UTF-8 characters. It seems that the variable name is converted down from UTF-8 to ISO-8859-1, yielding question marks in place of characters outside Latin-1.
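
A rough C sketch of what such a down-conversion does to a key. The converter here is a hypothetical simplification, not PHP's xml_utf8_decode, and both function names are invented for the example. The second function shows the alternative of keeping the raw UTF-8 bytes untouched.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a UTF-8 -> ISO-8859-1 down-conversion
 * (NOT PHP's xml_utf8_decode). Handles 1- and 2-byte sequences only;
 * code points above 0xFF and anything unexpected become '?'. */
size_t sketch_name_to_latin1(const char *name, unsigned char *out)
{
    const unsigned char *in = (const unsigned char *)name;
    size_t len = strlen(name), i = 0, n = 0;
    while (i < len) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[n++] = c;
            i++;
        } else if ((c & 0xE0) == 0xC0 && i + 1 < len &&
                   (in[i + 1] & 0xC0) == 0x80) {
            unsigned cp = ((c & 0x1Fu) << 6) | (in[i + 1] & 0x3Fu);
            out[n++] = cp <= 0xFF ? (unsigned char)cp : '?';
            i += 2;
        } else {
            out[n++] = '?';
            i++;
        }
    }
    return n;
}

/* Alternative: copy the name byte for byte, leaving the UTF-8 intact. */
size_t sketch_name_copy(const char *name, unsigned char *out)
{
    size_t len = strlen(name);
    memcpy(out, name, len);
    return len;
}
```

For a key made of 'z with dot above', 'o acute', 'l stroke' and 'w' (UTF-8 bytes C5 BC C3 B3 C5 82 77), the down-conversion in this sketch yields the four bytes '?', F3, '?', 'w'. Only 'o acute' survives, since it exists in Latin-1. The plain copy keeps all seven bytes.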

Another hackish patch (whitespace-mutilated):

--- a/ext/wddx/wddx.c
+++ b/ext/wddx/wddx.c
@@ -814,10 +814,7 @@ static void php_wddx_push_element(void *
     if (atts) for (i = 0; atts[i]; i++) {
         if (!strcmp(atts[i], EL_NAME) && atts[++i] && atts[i][0]) {
-            char *decoded;
-            int decoded_len;
-            decoded = xml_utf8_decode(atts[i], strlen(atts[i]), &decoded_len, "ISO-8859-1");
-            stack->varname = decoded;
+            stack->varname = estrndup(atts[i], strlen(atts[i]));
             break;
         }
     }
@@ -1057,7 +1054,12 @@ static void php_wddx_process_data(void *
     wddx_stack_top(stack, (void**)&ent);
     switch (Z_TYPE_P(ent)) {
         case ST_STRING:
-            decoded = xml_utf8_decode(s, len, &decoded_len, "ISO-8859-1");
+            if (len > 1) {
+                decoded = xml_utf8_decode(s, len, &decoded_len, "ISO-8859-1");
+            } else {
+                decoded = estrndup(s, len);
+                decoded_len = len;
+            }

Reproduce code:
---------------
See http://bugs.php.net/bug.php?id=38900


 [2008-07-21 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".