Bug #70962 xml_parse_into_struct strips embedded whitespace with XML_OPTION_SKIP_WHITE
Submitted: 2015-11-24 01:25 UTC Modified: 2019-11-04 11:59 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: dwilks at intacct dot com Assigned:
Status: Verified Package: XML related
PHP Version: 7.0.0RC7 OS: OSX 10.9.4
Private report: No CVE-ID: None

 [2015-11-24 01:25 UTC] dwilks at intacct dot com
Description:
------------
XML_OPTION_SKIP_WHITE should skip an element's value only if the entire value is whitespace.  Unfortunately, it looks like expat calls the character data handler separately for each XML entity within a string, so any whitespace between XML entities is treated as a whitespace-only value and dropped.

It looks like the HHVM developers noticed this and "fixed" it.

Test script:
---------------
Also available at https://3v4l.org/4TD9v

<?php

function handleCharacterData($parser, $data)
{
    echo __FUNCTION__ . " - " . $data . "\n";
}

function handleElementStart($parser, $name, $attributes)
{
    echo __FUNCTION__ . " - " . $name . "\n";
}

function handleElementEnd($parser, $name)
{
    echo __FUNCTION__ . " - " . $name . "\n";
}

function parseAndOutput($s)
{
    $p = xml_parser_create();
    xml_parser_set_option($p, XML_OPTION_SKIP_WHITE, 1);
    xml_set_character_data_handler($p, 'handleCharacterData');
    xml_set_element_handler($p, 'handleElementStart', 'handleElementEnd');

    xml_parse_into_struct($p, $s, $values);
    echo $values[0]['value'] . "\n\n";
}

$s = "<a>b\nc</a>";
parseAndOutput($s);

$s = "<a>&lt;b&gt;\n&lt;c&gt;</a>";
parseAndOutput($s);

Expected result:
----------------
handleElementStart - A
handleCharacterData - b
c
handleElementEnd - A
b
c

handleElementStart - A
handleCharacterData - <b>
<c>
handleElementEnd - A
<b>
<c>


Actual result:
--------------
handleElementStart - A
handleCharacterData - b
c
handleElementEnd - A
b
c

handleElementStart - A
handleCharacterData - <
handleCharacterData - b
handleCharacterData - >
handleCharacterData - 

handleCharacterData - <
handleCharacterData - c
handleCharacterData - >
handleElementEnd - A
<b><c>
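
The actual result above shows the character data handler being called once per entity, with the newline arriving as a separate chunk. One possible workaround, as a minimal sketch and not a fix for ext/xml itself, is to buffer the chunks per element and apply the whitespace check only to the concatenated text. The function name collectElementText() and the shape of the returned array are hypothetical and chosen for illustration only.

```php
<?php

// Hypothetical helper: returns a list of ['tag' => ..., 'value' => ...] pairs,
// skipping only elements whose complete text is whitespace.
function collectElementText(string $xml): array
{
    $stack = [];   // one text buffer per currently open element
    $values = [];  // finished elements, in the order they are closed

    $parser = xml_parser_create();

    xml_set_element_handler(
        $parser,
        // Start tag: open a fresh buffer for this element.
        function ($p, $name, $attributes) use (&$stack) {
            $stack[] = '';
        },
        // End tag: decide about whitespace once the whole text is known.
        function ($p, $name) use (&$stack, &$values) {
            $text = array_pop($stack);
            if (trim($text) !== '') {
                $values[] = ['tag' => $name, 'value' => $text];
            }
        }
    );

    // The handler may be called several times per element; append each chunk.
    xml_set_character_data_handler($parser, function ($p, $data) use (&$stack) {
        if ($stack) {
            $stack[count($stack) - 1] .= $data;
        }
    });

    xml_parse($parser, $xml, true);

    return $values;
}
```

Because the chunks are joined before the check, the result does not depend on how the underlying library splits the character data.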


 [2018-10-08 15:16 UTC] cmb@php.net
-Status: Open +Status: Verified -Assigned To: +Assigned To: cmb
 [2018-10-08 15:16 UTC] cmb@php.net
The documentation on XML_OPTION_SKIP_WHITE is not particularly
clear: it doesn't mention that this option is only relevant for
xml_parse_into_struct(), and that only there does “value” have a
clearly defined meaning. Nevertheless, this is clearly a bug.  It
is particularly strange that, if ext/xml is built against
libexpat, this whitespace removal depends on the kind of
whitespace: in that case the first parseAndOutput() call removes
the LF, but would not remove a space.
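
The dependence on the kind of whitespace could be checked with something like the following sketch. The helper name structValue() is hypothetical, and no output is given because the result depends on whether ext/xml is built against libexpat or libxml2.

```php
<?php

// Hypothetical helper: returns the "value" of the first struct entry,
// or null if the entry has no value.
function structValue(string $xml): ?string
{
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
    xml_parse_into_struct($parser, $xml, $values);

    return $values[0]['value'] ?? null;
}

// Same text, once with an embedded LF and once with an embedded space.
foreach (['LF' => "<a>b\nc</a>", 'space' => "<a>b c</a>"] as $label => $xml) {
    echo $label, ': ';
    var_dump(structValue($xml));
}
```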
 [2018-10-08 17:47 UTC] cmb@php.net
-Package: *XML functions +Package: XML related
 [2019-11-04 11:59 UTC] cmb@php.net
-Assigned To: cmb +Assigned To:
 [2019-11-04 11:59 UTC] cmb@php.net
Unassigning myself for time reasons. :(
 