Bug #38901 wddx mangles utf8 characters in serialized strings (broken more than in 4.4)
Submitted: 2006-09-20 13:46 UTC Modified: 2008-07-21 01:00 UTC
From: grzegorz dot nosek at netart dot pl Assigned: andrei (profile)
Status: No Feedback Package: WDDX related
PHP Version: 5.1.6 OS: Linux
Private report: No CVE-ID: None
 [2006-09-20 13:46 UTC] grzegorz dot nosek at netart dot pl
Description:
------------
WDDX gets confused if you try to serialize a string with UTF-8 characters (like the one below; it contains 'z with dot above', 'o acute', 'l stroke' and 'w', spelled out here in case it gets messed up somehow).

The serialized string will contain <char code='C5'/><char code='BC'/> ... etc., and each of these gets fed into xml_utf8_decode byte by byte (after the hex value is decoded), totally wrecking the output.
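The effect of decoding one byte at a time can be sketched in isolation. The decoder below is a hypothetical stand-in, not PHP's xml_utf8_decode: it handles only 1- and 2-byte sequences and, as an assumption of this sketch, turns anything truncated or malformed into '?'. The names utf8_to_latin1 and decode_bytewise are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a UTF-8 -> ISO-8859-1 decoder (NOT PHP's
 * xml_utf8_decode). Handles 1- and 2-byte sequences only; anything
 * truncated, malformed, or above U+00FF becomes '?'.
 * Returns the number of bytes written to out. */
static size_t utf8_to_latin1(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    size_t i = 0, n = 0;

    while (i < len) {
        unsigned char c = in[i];

        if (c < 0x80) {
            /* plain ASCII byte */
            out[n++] = c;
            i += 1;
        } else if ((c & 0xE0) == 0xC0 && i + 1 < len &&
                   (in[i + 1] & 0xC0) == 0x80) {
            /* complete 2-byte sequence */
            unsigned int cp = ((c & 0x1Fu) << 6) | (in[i + 1] & 0x3Fu);
            out[n++] = (cp <= 0xFF) ? (unsigned char)cp : '?';
            i += 2;
        } else {
            /* lone lead byte, stray continuation byte, or longer sequence */
            out[n++] = '?';
            i += 1;
        }
    }
    return n;
}

/* Mimics the situation in the report: every byte of the input arrives as
 * its own chunk and is decoded separately. */
static size_t decode_bytewise(const unsigned char *in, size_t len,
                              unsigned char *out)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++) {
        n += utf8_to_latin1(in + i, 1, out + n);
    }
    return n;
}
```

With 'o acute' (bytes C3 B3), utf8_to_latin1 on the whole buffer writes the single Latin-1 byte F3, while decode_bytewise writes two '?' bytes because neither byte is a valid sequence on its own.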

Up to this point, it's a duplicate of #38900.

However, PHP 5 has another bug with variable names (e.g. hash keys) containing UTF-8 characters. It seems that the variable name is converted down from UTF-8 to ISO-8859-1, yielding question marks in place of characters outside Latin-1.
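To see which characters of a key are at risk, here is a minimal sketch that counts the code points of a UTF-8 string that cannot be represented in ISO-8859-1 (anything above U+00FF). The helper name count_lost_in_latin1 is hypothetical, and the sketch assumes well-formed UTF-8 input; it is not taken from the PHP source.

```c
#include <stddef.h>

/* Counts code points in a UTF-8 buffer that do not fit in ISO-8859-1,
 * i.e. those above U+00FF. Hypothetical helper for illustration;
 * assumes the input is well-formed UTF-8. */
static size_t count_lost_in_latin1(const unsigned char *s, size_t len)
{
    size_t i = 0, lost = 0;

    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp;
        size_t n, k;

        /* Pick the sequence length and payload bits from the lead byte. */
        if (c < 0x80)      { cp = c;        n = 1; }
        else if (c < 0xE0) { cp = c & 0x1F; n = 2; }
        else if (c < 0xF0) { cp = c & 0x0F; n = 3; }
        else               { cp = c & 0x07; n = 4; }

        /* Fold in six payload bits from each continuation byte. */
        for (k = 1; k < n && i + k < len; k++) {
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp > 0xFF) {
            lost++;
        }
        i += n;
    }
    return lost;
}
```

For the string described in this report (bytes C5 BC, C3 B3, C5 82, 77), the function returns 2: 'z with dot above' (U+017C) and 'l stroke' (U+0142) fall outside Latin-1, while 'o acute' (U+00F3) and 'w' fit.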

Another hackish patch (whitespace-mutilated):

--- a/ext/wddx/wddx.c
+++ b/ext/wddx/wddx.c
@@ -814,10 +814,7 @@ static void php_wddx_push_element(void *
     if (atts) for (i = 0; atts[i]; i++) {
         if (!strcmp(atts[i], EL_NAME) && atts[++i] && atts[i][0]) {
-            char *decoded;
-            int decoded_len;
-            decoded = xml_utf8_decode(atts[i], strlen(atts[i]), &decoded_len, "ISO-8859-1");
-            stack->varname = decoded;
+            stack->varname = estrndup(atts[i], strlen(atts[i]));
             break;
         }
     }
@@ -1057,7 +1054,12 @@ static void php_wddx_process_data(void *
     wddx_stack_top(stack, (void**)&ent);
     switch (Z_TYPE_P(ent)) {
         case ST_STRING:
-            decoded = xml_utf8_decode(s, len, &decoded_len, "ISO-8859-1");
+            if (len > 1) {
+                decoded = xml_utf8_decode(s, len, &decoded_len, "ISO-8859-1");
+            } else {
+                decoded = estrndup(s, len);
+                decoded_len = len;
+            }

Reproduce code:
---------------
See http://bugs.php.net/bug.php?id=38900


 [2008-07-21 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 