Bug #70962 XML_OPTION_SKIP_WHITE strips embedded whitespace
Submitted: 2015-11-24 01:25 UTC Modified: 2021-09-16 09:57 UTC
Votes:2
Avg. Score:3.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: dwilks at intacct dot com Assigned: cmb
Status: Closed Package: *XML functions
PHP Version: 7.0.0RC7 OS: OSX 10.9.4
Private report: No CVE-ID: None
 [2015-11-24 01:25 UTC] dwilks at intacct dot com
Description:
------------
XML_OPTION_SKIP_WHITE should skip elements only if the entire element is whitespace. Unfortunately, it looks like expat calls the character handler separately for each XML entity within a string, so any whitespace between XML entities is dropped.

It looks like the HHVM developers noticed this and "fixed" it.
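A possible userland workaround, sketched here under the assumption that you can replace xml_parse_into_struct() with your own handlers: buffer every character-data chunk per open element and decide only at the closing tag whether the complete value is whitespace-only. The helper name collectValues() and its [name, value] result format are hypothetical and not part of ext/xml.

```php
<?php
// Hypothetical workaround: collect element values with our own handlers and
// drop a value only when the *whole* value is whitespace, no matter how the
// parser splits the text into chunks.
function collectValues(string $xml): array
{
    $values = [];  // list of [element name, text value]
    $stack = [];   // one text buffer per currently open element

    $p = xml_parser_create();
    xml_set_element_handler(
        $p,
        function ($parser, $name, $attrs) use (&$stack) {
            $stack[] = '';  // start a fresh buffer for this element
        },
        function ($parser, $name) use (&$stack, &$values) {
            $text = array_pop($stack);
            if (trim($text) !== '') {  // skip whitespace-only values only
                $values[] = [$name, $text];
            }
        }
    );
    xml_set_character_data_handler($p, function ($parser, $data) use (&$stack) {
        // Chunks may be split at entity references or newlines; append them all.
        if ($stack !== []) {
            $stack[array_key_last($stack)] .= $data;
        }
    });
    xml_parse($p, $xml, true);
    return $values;
}

print_r(collectValues("<a>&lt;b&gt;\n&lt;c&gt;</a>"));
```

xml_parser_create() enables case folding by default, so element names come back upper-cased ("A"), as in the report's output. Text belonging directly to an element is kept separate from the text of its children, because each element gets its own buffer.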

Test script:
---------------
Also available at https://3v4l.org/4TD9v

<?php

function handleCharacterData($parser, $data)
{
    echo __FUNCTION__ . " - " . $data . "\n";
}

function handleElementStart($parser, $name, $attributes)
{
    echo __FUNCTION__ . " - " . $name . "\n";
}

function handleElementEnd($parser, $name)
{
    echo __FUNCTION__ . " - " . $name . "\n";
}

function parseAndOutput($s)
{
    $p = xml_parser_create();
    xml_parser_set_option($p, XML_OPTION_SKIP_WHITE, 1);
    xml_set_character_data_handler($p, 'handleCharacterData');
    xml_set_element_handler($p, 'handleElementStart', 'handleElementEnd');

    xml_parse_into_struct($p, $s, $values);
    echo $values[0]['value'] . "\n\n";
}

$s = "<a>b\nc</a>";
parseAndOutput($s);

$s = "<a>&lt;b&gt;\n&lt;c&gt;</a>";
parseAndOutput($s);

Expected result:
----------------
handleElementStart - A
handleCharacterData - b
c
handleElementEnd - A
b
c

handleElementStart - A
handleCharacterData - <b>
<c>
handleElementEnd - A
<b>
<c>


Actual result:
--------------
handleElementStart - A
handleCharacterData - b
c
handleElementEnd - A
b
c

handleElementStart - A
handleCharacterData - <
handleCharacterData - b
handleCharacterData - >
handleCharacterData - 

handleCharacterData - <
handleCharacterData - c
handleCharacterData - >
handleElementEnd - A
<b><c>


History

 [2018-10-08 15:16 UTC] cmb@php.net
-Status: Open +Status: Verified -Assigned To: +Assigned To: cmb
 [2018-10-08 15:16 UTC] cmb@php.net
Although the documentation on XML_OPTION_SKIP_WHITE is not
particularly clear (it doesn't mention that this option is only
relevant for xml_parse_into_struct(), and only there does “value”
have a clearly defined meaning), this is clearly a bug.  It is
particularly strange that this whitespace removal depends on the
kind of whitespace if ext/xml is built against libexpat: in that
case the first parseAndOutput() call removes the LF, but would not
remove a space.
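To see how a given build splits element text into character-data chunks (the splitting is what makes the result depend on the kind of whitespace), a small probe like the following may help. characterChunks() is a hypothetical helper. The exact chunk boundaries are an implementation detail of the underlying library, so no particular output is assumed here, only that the chunks concatenate back to the full text.

```php
<?php
// Hypothetical probe: record each chunk passed to the character data handler.
// XML_OPTION_SKIP_WHITE is deliberately left off, so every chunk is reported.
function characterChunks(string $xml): array
{
    $chunks = [];
    $p = xml_parser_create();
    xml_set_character_data_handler($p, function ($parser, $data) use (&$chunks) {
        $chunks[] = $data;
    });
    xml_parse($p, $xml, true);
    return $chunks;
}

// Compare a newline-separated value with a space-separated one.
foreach (["<a>b\nc</a>", "<a>b c</a>"] as $doc) {
    echo json_encode(characterChunks($doc)), "\n";
}
```

If the newline arrives as a chunk of its own while the space does not, that would be consistent with the LF-versus-space difference described above.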
 [2018-10-08 17:47 UTC] cmb@php.net
-Package: *XML functions +Package: XML related
 [2019-11-04 11:59 UTC] cmb@php.net
-Assigned To: cmb +Assigned To:
 [2019-11-04 11:59 UTC] cmb@php.net
Unassigning myself for time reasons. :(
 [2021-09-15 12:38 UTC] cmb@php.net
The following pull request has been associated:

Patch Name: Fix #70962: xml_parse_into_struct strips embedded whitespace with XML…
On GitHub:  https://github.com/php/php-src/pull/7493
Patch:      https://github.com/php/php-src/pull/7493.patch
 [2021-09-15 12:45 UTC] cmb@php.net
-Package: XML related +Package: *XML functions -Assigned To: +Assigned To: cmb
 [2021-09-15 12:45 UTC] cmb@php.net
> XML_OPTION_SKIP_WHITE should skip elements if the entire element
> is whitespace.

This is my understanding of the docs[1] as well, but that would be
too much of a BC break.  We should leave that as is (and improve
the docs), and of course fix the removal of whitespace inside node
values.
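A rough way to check whether a particular PHP build is affected is to compare the first value returned by xml_parse_into_struct() with XML_OPTION_SKIP_WHITE on and off, using the second input from the test script. firstValue() is a hypothetical helper, and the check only covers this one input.

```php
<?php
// Hypothetical check: return the "value" of the first entry produced by
// xml_parse_into_struct(), with XML_OPTION_SKIP_WHITE enabled or disabled.
function firstValue(string $xml, bool $skipWhite): ?string
{
    $p = xml_parser_create();
    xml_parser_set_option($p, XML_OPTION_SKIP_WHITE, $skipWhite ? 1 : 0);
    xml_parse_into_struct($p, $xml, $values);
    return $values[0]['value'] ?? null;
}

$doc = "<a>&lt;b&gt;\n&lt;c&gt;</a>";
// With the option off, the embedded newline is never removed.
$affected = firstValue($doc, true) !== firstValue($doc, false);
echo $affected ? "embedded whitespace is stripped\n" : "embedded whitespace is preserved\n";
```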

[1] <https://www.php.net/manual/en/function.xml-parser-set-option.php>
 [2021-09-16 09:57 UTC] cmb@php.net
-Summary: xml_parse_into_struct strips embedded whitespace with XML_OPTION_SKIP_WHITE +Summary: XML_OPTION_SKIP_WHITE strips embedded whitespace
 [2021-09-16 10:46 UTC] git@php.net
Automatic comment on behalf of Flashwade1990 (author) and cmb69 (committer)
Revision: https://github.com/php/php-src/commit/a9661a5293e98e8c7255663c987297c14a6285ec
Log: Fix #70962: XML_OPTION_SKIP_WHITE strips embedded whitespace
 [2021-09-16 10:46 UTC] git@php.net
-Status: Verified +Status: Closed
 
PHP Copyright © 2001-2021 The PHP Group
All rights reserved.
Last updated: Thu Dec 09 07:03:34 2021 UTC