Bug #38900 wddx mangles utf8 characters in serialized strings
Submitted: 2006-09-20 13:40 UTC Modified: 2008-07-24 01:00 UTC
Votes:3
Avg. Score:5.0 ± 0.0
Reproduced:3 of 3 (100.0%)
Same Version:1 (33.3%)
Same OS:2 (66.7%)
From: grzegorz dot nosek at netart dot pl Assigned: andrei
Status: No Feedback Package: WDDX related
PHP Version: 4.4.4 OS: Linux
Private report: No CVE-ID: None
 [2006-09-20 13:40 UTC] grzegorz dot nosek at netart dot pl
Description:
------------
wddx gets confused when you serialize a string containing UTF-8 characters. The string in the reproduce code below consists of 'z with dot above', 'o acute', 'l stroke' and 'w' (spelled out here in case it gets mangled in transit).

The serialized string will contain <char code='C5'/><char code='BC'/> ... etc. On deserialization, each of these bytes is fed into xml_utf8_decode on its own (after the hex value is decoded), which totally wrecks the output.
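
For reference, 'z with dot above' is U+017C, and its UTF-8 encoding is the byte pair C5 BC, which is where the two elements quoted above come from. A minimal sketch in C of that mapping follows; the helper names sketch_utf8_encode2 and sketch_char_elements are hypothetical and are not part of ext/wddx, and the sketch only illustrates the byte values, not how the extension decides which bytes to emit as <char> elements.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not from ext/wddx): encode a code point in the
 * range U+0080..U+07FF as its two UTF-8 bytes. */
static size_t sketch_utf8_encode2(unsigned int cp, unsigned char out[2])
{
    out[0] = (unsigned char)(0xC0 | (cp >> 6));   /* lead byte: 110xxxxx */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F)); /* continuation: 10xxxxxx */
    return 2;
}

/* Hypothetical helper: write one <char code='XX'/> element per input byte.
 * Returns the number of characters written, or 0 if buf is too small. */
static size_t sketch_char_elements(const unsigned char *s, size_t len,
                                   char *buf, size_t bufsize)
{
    size_t used = 0;

    if (bufsize == 0) {
        return 0;
    }
    buf[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        int n = snprintf(buf + used, bufsize - used,
                         "<char code='%02X'/>", s[i]);
        if (n < 0 || (size_t)n >= bufsize - used) {
            return 0; /* output would have been truncated */
        }
        used += (size_t)n;
    }
    return used;
}
```

Calling sketch_utf8_encode2(0x17C, bytes) and passing the two resulting bytes to sketch_char_elements produces exactly the pair of elements shown in the description.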

Hackish patch which fixes this (whitespace mangled mercilessly, adjust to taste):

--- a/ext/wddx/wddx.c
+++ b/ext/wddx/wddx.c
@@ -1038,8 +1038,13 @@ static void php_wddx_process_data(void *
if (!wddx_stack_is_empty(stack) && !stack->done) {
wddx_stack_top(stack, (void**)&ent);
switch (Z_TYPE_P(ent)) {
- case ST_STRING:
- decoded = xml_utf8_decode(s, len, &decoded_len, "ISO-8859-1");
+ case ST_STRING:
+ if (len > 1) {
+ decoded = xml_utf8_decode(s, len, &decoded_len, "ISO-8859-1");
+ } else {
+ decoded = estrndup(s, len);
+ decoded_len = len;
+ }
if (Z_STRLEN_P(ent->data) == 0) {
Z_STRVAL_P(ent->data) = estrndup(decoded, decoded_len);
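
The failure mode and the effect of the patch above can be sketched outside PHP. The decoder below is a simplified stand-in written for this illustration, not the real xml_utf8_decode, and both function names (sketch_utf8_to_latin1, sketch_process_data) are hypothetical. It assumes that a lead byte arriving without its continuation byte decodes to '?', and that code points above U+00FF have no ISO-8859-1 equivalent.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified UTF-8 -> ISO-8859-1 decoder. This is NOT PHP's
 * xml_utf8_decode; it only mimics the behaviour relevant here. The out
 * buffer must have room for len bytes. Returns the number of bytes written. */
static size_t sketch_utf8_to_latin1(const unsigned char *s, size_t len,
                                    unsigned char *out)
{
    size_t i = 0, n = 0;

    while (i < len) {
        unsigned int c = s[i];

        if (c >= 0xE0) {
            /* 3- and 4-byte sequences never fit into ISO-8859-1 */
            i += (c >= 0xF0) ? 4 : 3;
            c = '?';
        } else if (c >= 0xC0) {
            /* 2-byte sequence: needs one continuation byte */
            c = (i + 1 < len) ? (((c & 0x1F) << 6) | (s[i + 1] & 0x3Fu)) : '?';
            i += 2;
        } else {
            i += 1; /* ASCII or stray continuation byte: passed through */
        }
        out[n++] = (c > 0xFF) ? '?' : (unsigned char)c;
    }
    return n;
}

/* Mirrors the branch added by the patch: with patched != 0, a buffer of at
 * most one byte is copied verbatim instead of being run through the decoder. */
static size_t sketch_process_data(const unsigned char *s, size_t len,
                                  unsigned char *out, int patched)
{
    if (!patched || len > 1) {
        return sketch_utf8_to_latin1(s, len, out);
    }
    memcpy(out, s, len);
    return len;
}
```

With patched set to 0, the single byte C5 comes back as '?'; with patched set to 1 it survives unchanged. Buffers longer than one byte take the decoder path in both cases, which is why the patch is only a workaround for data that arrives one byte at a time.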


Reproduce code:
---------------
--TEST--
wddx_deserialize mangles utf8 characters
--SKIPIF--
<?php if (!extension_loaded("wddx")) print "skip"; ?>
--FILE--
<?php
$zolw = iconv("ISO-8859-2", "UTF-8", "\xBF\xF3\xB3w"); // z with dot above, o acute, l stroke, w (ISO-8859-2 bytes)
$in = array ( $zolw => $zolw );
var_dump(array_diff($in, wddx_deserialize(wddx_serialize_value($in))));
?>
--EXPECT--
array(0) {
}



 [2007-07-11 13:08 UTC] jani@php.net
See also bug #40080
 [2007-07-16 08:40 UTC] grzegorz dot nosek at netart dot pl
Yes, it's apparently a duplicate, or at least related. The symptoms are certainly similar.
 [2008-07-24 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 