Bug #38900 wddx mangles utf8 characters in serialized strings
Submitted: 2006-09-20 13:40 UTC
Modified: 2008-07-24 01:00 UTC
Votes:3
Avg. Score:5.0 ± 0.0
Reproduced:3 of 3 (100.0%)
Same Version:1 (33.3%)
Same OS:2 (66.7%)
From: grzegorz dot nosek at netart dot pl
Assigned: andrei
Status: No Feedback
Package: WDDX related
PHP Version: 4.4.4
OS: Linux
Private report: No
CVE-ID: None
 [2006-09-20 13:40 UTC] grzegorz dot nosek at netart dot pl
Description:
------------
wddx gets confused if you serialize a string containing UTF-8 characters and then deserialize it. The string in the reproduce code below consists of 'z with dot above', 'o acute', 'l stroke' and 'w', in case it gets messed up somehow.

The serialized string will contain <char code='C5'/><char code='BC'/> and so on. On deserialization each hex value is decoded to a single byte, and those bytes get fed into xml_utf8_decode one at a time, totally wrecking the output.
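
To illustrate why byte-by-byte decoding cannot work, here is a minimal sketch of a UTF-8 decoder. It is not PHP's xml_utf8_decode; the names decode_one, decode_buffer and decode_bytewise are made up for this example, and the '?' replacement for broken sequences is an assumption. It only shows that a two-byte sequence such as C5 BC decodes to one code point when passed as a whole, but to two replacement characters when each byte is passed separately.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (NOT PHP's xml_utf8_decode): decode one UTF-8
 * sequence from s, which holds len >= 1 bytes. Stores the code point in
 * *cp and returns the number of bytes consumed. A truncated or invalid
 * sequence yields '?' and consumes a single byte. */
size_t decode_one(const unsigned char *s, size_t len, unsigned *cp)
{
    size_t need, i;
    unsigned c = s[0];

    assert(len > 0);
    if (c < 0x80) { *cp = c; return 1; }                 /* plain ASCII */
    else if ((c & 0xE0) == 0xC0) { need = 1; c &= 0x1F; } /* 2-byte lead */
    else if ((c & 0xF0) == 0xE0) { need = 2; c &= 0x0F; } /* 3-byte lead */
    else if ((c & 0xF8) == 0xF0) { need = 3; c &= 0x07; } /* 4-byte lead */
    else { *cp = '?'; return 1; }       /* stray continuation byte */

    if (need >= len) { *cp = '?'; return 1; }  /* sequence is truncated */
    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80) { *cp = '?'; return 1; }
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return need + 1;
}

/* Decode a whole buffer; writes at most max code points, returns the count. */
size_t decode_buffer(const unsigned char *s, size_t len,
                     unsigned *out, size_t max)
{
    size_t n = 0, pos = 0;
    while (pos < len && n < max) {
        pos += decode_one(s + pos, len - pos, &out[n]);
        n++;
    }
    return n;
}

/* Decode every byte as its own one-byte buffer, mimicking what happens
 * when each <char code='..'/> element is decoded on its own. */
size_t decode_bytewise(const unsigned char *s, size_t len,
                       unsigned *out, size_t max)
{
    size_t n = 0, i;
    for (i = 0; i < len && n < max; i++) {
        n += decode_buffer(s + i, 1, out + n, max - n);
    }
    return n;
}
```

With the bytes C5 BC ('z with dot above'), decode_buffer returns a single code point, U+017C, while decode_bytewise returns two '?' characters, because neither a lone lead byte nor a lone continuation byte is a valid sequence.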

Hackish patch which fixes this (whitespace mangled mercilessly, adjust to taste):

--- a/ext/wddx/wddx.c
+++ b/ext/wddx/wddx.c
@@ -1038,8 +1038,13 @@ static void php_wddx_process_data(void *
 	if (!wddx_stack_is_empty(stack) && !stack->done) {
 		wddx_stack_top(stack, (void**)&ent);
 		switch (Z_TYPE_P(ent)) {
-			case ST_STRING:
-				decoded = xml_utf8_decode(s, len, &decoded_len, "ISO-8859-1");
+			case ST_STRING:
+				if (len > 1) {
+					decoded = xml_utf8_decode(s, len, &decoded_len, "ISO-8859-1");
+				} else {
+					decoded = estrndup(s, len);
+					decoded_len = len;
+				}
 				if (Z_STRLEN_P(ent->data) == 0) {
 					Z_STRVAL_P(ent->data) = estrndup(decoded, decoded_len);


Reproduce code:
---------------
--TEST--
wddx_deserialize mangles utf8 characters
--SKIPIF--
<?php if (!extension_loaded("wddx")) print "skip"; ?>
--FILE--
<?php
// "\xBF\xF3\xB3w" is 'z with dot above', 'o acute', 'l stroke', 'w' in ISO-8859-2
$zolw = iconv("ISO-8859-2", "UTF-8", "\xBF\xF3\xB3w");
$in = array ( $zolw => $zolw );
var_dump(array_diff($in, wddx_deserialize(wddx_serialize_value($in))));
?>
--EXPECT--
array(0) {
}



 [2007-07-11 13:08 UTC] jani@php.net
See also bug #40080
 [2007-07-16 08:40 UTC] grzegorz dot nosek at netart dot pl
Yes, it's apparently a duplicate, or at least related. The symptoms are certainly similar.
 [2008-07-24 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 