Bug #38427 unicode causes xml_parser to misbehave
Submitted: 2006-08-11 11:57 UTC Modified: 2006-08-15 22:49 UTC
From: codje2 at lboro dot ac dot uk Assigned:
Status: Closed Package: XML related
PHP Version: 5.1.4 OS: OS X
Private report: No CVE-ID: None
 [2006-08-11 11:57 UTC] codje2 at lboro dot ac dot uk
Description:
------------
When xml_parse_into_struct() reaches a non-ASCII character (e.g. an
umlaut) inside a cdata array element, it splits the text across
two elements, the second starting with that character.

Reproduce code:
---------------
<?php
header('Content-type: text/html; charset=utf-8');

$input=<<<END
<p>s?me p tag <b>s?me text</b> a bit more ?f the p t?g</p>
END;

print "<html><body><pre><xmp>";

// Either save this example as UTF-8, or enable the following line

//$input=utf8_encode($input);

$parser = xml_parser_create('utf-8');
xml_parse_into_struct($parser, $input, $vals, $index);
print_r($vals);

print "</xmp></pre></body></html>";
?>

Expected result:
----------------
Array
(
    [0] => Array
        (
            [tag] => P
            [type] => open
            [level] => 1
            [value] => s?me p tag 
        )

    [1] => Array
        (
            [tag] => B
            [type] => complete
            [level] => 2
            [value] => s?me text
        )

    [2] => Array
        (
            [tag] => P
            [value] =>  a bit more ?f the p t?g
            [type] => cdata
            [level] => 1
        )

    [3] => Array
        (
            [tag] => P
            [type] => close
            [level] => 1
        )

)

Actual result:
--------------
Array
(
    [0] => Array
        (
            [tag] => P
            [type] => open
            [level] => 1
            [value] => s?me p tag 
        )

    [1] => Array
        (
            [tag] => B
            [type] => complete
            [level] => 2
            [value] => s?me text
        )

    [2] => Array
        (
            [tag] => P
            [value] =>  a bit more 
            [type] => cdata
            [level] => 1
        )

    [3] => Array
        (
            [tag] => P
            [value] => ?f the p t?g
            [type] => cdata
            [level] => 1
        )

    [4] => Array
        (
            [tag] => P
            [type] => close
            [level] => 1
        )

)
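
As a stopgap on affected versions, the split entries shown in the actual result could be re-joined after parsing. The following is only a sketch: merge_adjacent_cdata() is a hypothetical helper, not part of PHP, and it assumes that two consecutive cdata entries with the same tag and level always come from one run of text. The sample array copies the values printed above, including the "?" placeholders.

```php
<?php
// Hypothetical helper (not part of PHP): merge consecutive 'cdata' entries
// that share the same tag and level, and re-index the result.
function merge_adjacent_cdata($vals)
{
    $merged = array();
    foreach ($vals as $entry) {
        $last = count($merged) - 1;
        if ($last >= 0
            && $entry['type'] === 'cdata'
            && $merged[$last]['type'] === 'cdata'
            && $merged[$last]['tag'] === $entry['tag']
            && $merged[$last]['level'] === $entry['level']
        ) {
            // Same run of text: append to the previous entry instead of adding a new one.
            $merged[$last]['value'] .= $entry['value'];
            continue;
        }
        $merged[] = $entry;
    }
    return $merged;
}

// Sample data shaped like the "Actual result" in this report.
$vals = array(
    array('tag' => 'P', 'type' => 'open', 'level' => 1, 'value' => 's?me p tag '),
    array('tag' => 'B', 'type' => 'complete', 'level' => 2, 'value' => 's?me text'),
    array('tag' => 'P', 'value' => ' a bit more ', 'type' => 'cdata', 'level' => 1),
    array('tag' => 'P', 'value' => '?f the p t?g', 'type' => 'cdata', 'level' => 1),
    array('tag' => 'P', 'type' => 'close', 'level' => 1),
);

$fixed = merge_adjacent_cdata($vals);
print_r($fixed);
```

The helper leaves the $index array from xml_parse_into_struct() untouched, so any offsets stored there would no longer match the merged array and would need to be rebuilt separately.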

 [2006-08-15 22:49 UTC] rrichards@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 