Bug #70962 XML_OPTION_SKIP_WHITE strips embedded whitespace
Submitted: 2015-11-24 01:25 UTC Modified: 2021-09-16 09:57 UTC
Votes:2
Avg. Score:3.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: dwilks at intacct dot com Assigned: cmb (profile)
Status: Closed Package: *XML functions
PHP Version: 7.0.0RC7 OS: OSX 10.9.4
Private report: No CVE-ID: None
 [2015-11-24 01:25 UTC] dwilks at intacct dot com
Description:
------------
XML_OPTION_SKIP_WHITE should skip elements if the entire element is whitespace.  Unfortunately, it looks like expat calls the character data handler separately for each XML entity within a string, so any whitespace between XML entities is dropped.

Looks like the HHVM guys noticed and "fixed this".

Test script:
---------------
Also available at https://3v4l.org/4TD9v

<?php

function handleCharacterData($parser, $data)
{
    echo __FUNCTION__ . " - " . $data . "\n";
}

function handleElementStart($parser, $name, $attributes)
{
    echo __FUNCTION__ . " - " . $name . "\n";
}

function handleElementEnd($parser, $name)
{
    echo __FUNCTION__ . " - " . $name . "\n";
}

function parseAndOutput($s)
{
    $p = xml_parser_create();
    xml_parser_set_option($p, XML_OPTION_SKIP_WHITE, 1);
    xml_set_character_data_handler($p, 'handleCharacterData');
    xml_set_element_handler($p, 'handleElementStart', 'handleElementEnd');

    xml_parse_into_struct($p, $s, $values);
    echo $values[0]['value'] . "\n\n";
}

$s = "<a>b\nc</a>";
parseAndOutput($s);

$s = "<a>&lt;b&gt;\n&lt;c&gt;</a>";
parseAndOutput($s);

Expected result:
----------------
handleElementStart - A
handleCharacterData - b
c
handleElementEnd - A
b
c

handleElementStart - A
handleCharacterData - <b>
<c>
handleElementEnd - A
<b>
<c>


Actual result:
--------------
handleElementStart - A
handleCharacterData - b
c
handleElementEnd - A
b
c

handleElementStart - A
handleCharacterData - <
handleCharacterData - b
handleCharacterData - >
handleCharacterData - 

handleCharacterData - <
handleCharacterData - c
handleCharacterData - >
handleElementEnd - A
<b><c>
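
The actual result shows the character data handler being called once per chunk rather than once per text node. A minimal userland sketch of how a caller could join those chunks itself, assuming plain xml_parse() with handlers is acceptable instead of xml_parse_into_struct(). CharDataCollector and collectText() are hypothetical names introduced for illustration; they are not part of ext/xml.

```php
<?php
// Hypothetical helper: buffers character data chunks and emits one
// string per run of text between element boundaries.
final class CharDataCollector
{
    private string $buffer = '';

    /** @var string[] joined text runs, in document order */
    public array $texts = [];

    public function start($parser, string $name, array $attributes): void
    {
        $this->flush();
    }

    public function end($parser, string $name): void
    {
        $this->flush();
    }

    public function data($parser, string $data): void
    {
        // May be called several times for one text node; just append.
        $this->buffer .= $data;
    }

    private function flush(): void
    {
        // Drop runs that are entirely whitespace, keep everything else intact.
        if (trim($this->buffer) !== '') {
            $this->texts[] = $this->buffer;
        }
        $this->buffer = '';
    }
}

function collectText(string $xml): array
{
    $collector = new CharDataCollector();
    $parser = xml_parser_create();
    xml_set_element_handler($parser, [$collector, 'start'], [$collector, 'end']);
    xml_set_character_data_handler($parser, [$collector, 'data']);
    xml_parse($parser, $xml, true);
    return $collector->texts;
}

$texts = collectText("<a>&lt;b&gt;\n&lt;c&gt;</a>");
echo json_encode($texts), "\n";
```

Because the whitespace check runs on the joined buffer rather than on each chunk, the LF between the two entities survives, while a text run consisting only of whitespace is still discarded.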


History

 [2018-10-08 15:16 UTC] cmb@php.net
-Status: Open +Status: Verified -Assigned To: +Assigned To: cmb
 [2018-10-08 15:16 UTC] cmb@php.net
Although the documentation on XML_OPTION_SKIP_WHITE is not
particularly clear (it doesn't mention that this option is
only relevant for xml_parse_into_struct(), and only there does
“value” have a clearly defined meaning), this is clearly a bug.  It
is particularly strange that this whitespace removal depends on the
kind of whitespace if ext/xml is built against libexpat: in that
case the first parseAndOutput() removes the LF, but would not
remove a space.
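
A small sketch for checking the LF versus space difference on a given build. firstValue() is a hypothetical helper, and no expected output is given because the result depends on the PHP version and on whether ext/xml is built against libexpat or libxml2.

```php
<?php
// Hypothetical helper: returns the "value" of the first node as reported by
// xml_parse_into_struct() with XML_OPTION_SKIP_WHITE enabled.
function firstValue(string $xml): string
{
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
    xml_parse_into_struct($parser, $xml, $values);
    return $values[0]['value'] ?? '';
}

$withLf    = firstValue("<a>b\nc</a>");
$withSpace = firstValue("<a>b c</a>");

// json_encode() makes the whitespace visible in the output.
echo json_encode($withLf), "\n";
echo json_encode($withSpace), "\n";
```

If the two lines differ in whether the embedded whitespace survived, the build shows the inconsistency described above.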
 [2018-10-08 17:47 UTC] cmb@php.net
-Package: *XML functions +Package: XML related
 [2019-11-04 11:59 UTC] cmb@php.net
-Assigned To: cmb +Assigned To:
 [2019-11-04 11:59 UTC] cmb@php.net
Unassigning myself for time reasons. :(
 [2021-09-15 12:38 UTC] cmb@php.net
The following pull request has been associated:

Patch Name: Fix #70962: xml_parse_into_struct strips embedded whitespace with XML…
On GitHub:  https://github.com/php/php-src/pull/7493
Patch:      https://github.com/php/php-src/pull/7493.patch
 [2021-09-15 12:45 UTC] cmb@php.net
-Package: XML related +Package: *XML functions -Assigned To: +Assigned To: cmb
 [2021-09-15 12:45 UTC] cmb@php.net
> XML_OPTION_SKIP_WHITE should skip elements if the entire element
> is whitespace.

This is my understanding of the docs[1] as well, but that would be
too much of a BC break.  We should leave that as is (and improve
the docs), and of course fix the removal of whitespace inside node
values.
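
The intended semantics can be modelled in a few lines of PHP. This is only a sketch of the behaviour described in this comment, assuming the value is assembled from the chunks passed to the character data handler; it is not the actual C patch, and assembleValue() is a hypothetical name.

```php
<?php
// Model only: join all chunks first, then decide whether the whole value
// is whitespace. Returns null when the value would be skipped.
function assembleValue(array $chunks, bool $skipWhite): ?string
{
    $value = implode('', $chunks);
    if ($skipWhite && trim($value) === '') {
        return null; // whitespace-only value: still skipped (no BC break)
    }
    return $value;   // embedded whitespace is preserved
}

// Chunks as seen in the "Actual result" of the report.
$embedded = assembleValue(['<', 'b', '>', "\n", '<', 'c', '>'], true);
$blank    = assembleValue(["\n", '  '], true);

echo json_encode($embedded), "\n";
echo json_encode($blank), "\n";
```

The point of the model is that the whitespace test applies to the value as a whole, not to each chunk in isolation.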

[1] <https://www.php.net/manual/en/function.xml-parser-set-option.php>
 [2021-09-16 09:57 UTC] cmb@php.net
-Summary: xml_parse_into_struct strips embedded whitespace with XML_OPTION_SKIP_WHITE +Summary: XML_OPTION_SKIP_WHITE strips embedded whitespace
 [2021-09-16 10:46 UTC] git@php.net
Automatic comment on behalf of Flashwade1990 (author) and cmb69 (committer)
Revision: https://github.com/php/php-src/commit/a9661a5293e98e8c7255663c987297c14a6285ec
Log: Fix #70962: XML_OPTION_SKIP_WHITE strips embedded whitespace
 [2021-09-16 10:46 UTC] git@php.net
-Status: Verified +Status: Closed
 