Bug #38900 wddx mangles utf8 characters in serialized strings
Submitted: 2006-09-20 13:40 UTC Modified: 2008-07-24 01:00 UTC
Avg. Score: 5.0 ± 0.0
Reproduced: 3 of 3 (100.0%)
Same Version: 1 (33.3%)
Same OS: 2 (66.7%)
From: grzegorz dot nosek at netart dot pl Assigned: andrei
Status: No Feedback Package: WDDX related
PHP Version: 4.4.4 OS: Linux
Private report: No CVE-ID: None

 [2006-09-20 13:40 UTC] grzegorz dot nosek at netart dot pl
wddx gets confused if you try to serialize a string containing UTF-8 characters, like the one in the test below, which contains 'z with dot above', 'o acute', 'l stroke' and 'w' (spelled out here in case the characters get mangled somehow).

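The serialized string contains <char code='C5'/><char code='BC'/> and so on, one element per byte. On deserialization each element is hex-decoded and the resulting single byte is fed into xml_utf8_decode() on its own, which totally wrecks the output: a lone byte such as 0xC5 is only the first half of a UTF-8 sequence and cannot be decoded by itself.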

Hackish patch that fixes this by skipping xml_utf8_decode() for 1-byte chunks and copying them through as-is (whitespace mangled mercilessly, adjust to taste):

--- a/ext/wddx/wddx.c
+++ b/ext/wddx/wddx.c
@@ -1038,8 +1038,13 @@ static void php_wddx_process_data(void *
 	if (!wddx_stack_is_empty(stack) && !stack->done) {
 		wddx_stack_top(stack, (void**)&ent);
 		switch (Z_TYPE_P(ent)) {
-			case ST_STRING:
-				decoded = xml_utf8_decode(s, len, &decoded_len, "ISO-8859-1");
+			case ST_STRING:
+				if (len > 1) {
+					decoded = xml_utf8_decode(s, len, &decoded_len, "ISO-8859-1");
+				} else {
+					decoded = estrndup(s, len);
+					decoded_len = len;
+				}
 				if (Z_STRLEN_P(ent->data) == 0) {
 					Z_STRVAL_P(ent->data) = estrndup(decoded, decoded_len);

Reproduce code:
--TEST--
wddx_deserialize mangles utf8 characters
--SKIPIF--
<?php if (!extension_loaded("wddx")) print "skip"; ?>
--FILE--
<?php
// "\xBF\xF3\xB3w" is z with dot above, o acute, l stroke, w in ISO-8859-2
$zolw = iconv("ISO-8859-2", "UTF-8", "\xBF\xF3\xB3w");
$in = array($zolw => $zolw);
var_dump(array_diff($in, wddx_deserialize(wddx_serialize_value($in))));
?>
--EXPECT--
array(0) {
}

 [2007-07-11 13:08 UTC]
See also bug #40080
 [2007-07-16 08:40 UTC] grzegorz dot nosek at netart dot pl
Yes, it's apparently a duplicate, or at least related. The symptoms are certainly similar.
 [2008-07-24 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Mon May 20 15:01:36 2024 UTC