Bug #55531 Vertical tabs ignored by XMLWriter
Submitted: 2011-08-29 15:52 UTC Modified: 2021-06-28 16:58 UTC
Votes:6
Avg. Score:5.0 ± 0.0
Reproduced:4 of 4 (100.0%)
Same Version:2 (50.0%)
Same OS:4 (100.0%)
From: brianlmoon@php.net Assigned: cmb (profile)
Status: Not a bug Package: XML Writer
PHP Version: 5.3.8 OS: Linux
Private report: No CVE-ID: None
 [2011-08-29 15:52 UTC] brianlmoon@php.net
Description:
------------
When text contains vertical tabs, XMLWriter silently ignores them and generates invalid XML. This is not an issue where the text is invalid UTF-8. It is valid UTF-8 data. Vertical tabs are simply not allowed in XML by rule. I would expect XMLWriter to encode it as it would any other character not allowed in XML. I suspect that 

Test script:
---------------
<?php

    $xml = new XmlWriter();
    $xml->openMemory();
    $xml->startDocument('1.0', 'UTF-8');
    $xml->writeElement("test", "This data contains a \vvertical tab");
    $xml->endElement();
    $data = $xml->outputMemory(true);

    $sxml = simplexml_load_string($data);

?>

Expected result:
----------------
Either an error or valid XML.

Actual result:
--------------
Invalid XML is silently created. By contrast, SimpleXMLElement::addChild() emits a warning when a vertical tab is present ("SimpleXMLElement::asXML(): xmlEscapeEntities : char out of range"), and the node is not added.

History

 [2011-08-29 16:01 UTC] pajoye@php.net
In XML 1.0 this character is not valid. Not sure if we should allow it; .NET 2.x
allows them optionally using settings.CheckCharacters = true;.

But that's something libxml deals with; PHP's XML extensions only rely on it to
parse input or generate data.
 [2016-10-17 13:36 UTC] cmb@php.net
-Package: XML related +Package: XML Writer
 [2021-04-01 14:42 UTC] cmb@php.net
While \v is indeed invalid for XML 1.0, it is valid for XML 1.1.
We could entity-encode such characters for 1.0 documents, and that
likely wouldn't cause a BC break, but we could also leave that to
libxml2 (to my knowledge, handling of invalid characters is not
even documented by them).  But if we manually entity-encode, we
should make sure to keep that consistent (at least for XMLWriter,
but likely for all bundled XML extensions as well).  Not sure.

And we should decide how to handle NUL bytes.  Currently, the
strings are truncated, which doesn't appear to be intentional.
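
Until the handling is settled, a caller can check the text before handing it to XMLWriter. The following is only a sketch of such a userland workaround, not something XMLWriter or libxml2 provides: xml10_sanitize() is a hypothetical helper name, and replacing offending characters with U+FFFD is an assumption (stripping them or throwing would be equally defensible). It relies on the Char production of XML 1.0, which permits only #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF, so both \v and NUL bytes are caught.

```php
<?php
// Hypothetical helper, not part of XMLWriter: replaces every code point
// that falls outside the XML 1.0 Char production.
function xml10_sanitize(string $text, string $replacement = "\u{FFFD}"): string
{
    // XML 1.0 Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
    //                | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    $invalid = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';

    $clean = preg_replace($invalid, $replacement, $text);
    if ($clean === null) {
        // preg_replace() returns null on error, e.g. malformed UTF-8 input.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}

$xml = new XMLWriter();
$xml->openMemory();
$xml->startDocument('1.0', 'UTF-8');
// The vertical tab is replaced before XMLWriter ever sees it.
$xml->writeElement('test', xml10_sanitize("This data contains a \vvertical tab"));
$data = $xml->outputMemory(true);
```

Passing an empty string as the second argument strips the characters instead of replacing them.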
 [2021-06-28 16:58 UTC] cmb@php.net
-Status: Open +Status: Not a bug -Assigned To: +Assigned To: cmb
 [2021-06-28 16:58 UTC] cmb@php.net
This issue is fixed as of libxml2 2.9.11:

    Warning: simplexml_load_string(): Entity: line 2: parser error : PCDATA invalid Char value 11

    Warning: simplexml_load_string(): <test>This data contains a vertical tab</test>

    Warning: simplexml_load_string():                            ^
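
To handle this parser error in code rather than as PHP warnings, the libxml error buffer can be used. This is a minimal sketch, assuming the document is built by hand here instead of through XMLWriter; the variable names are illustrative, and the exact messages depend on the installed libxml2 version.

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

// An XML 1.0 document whose text node contains a vertical tab (0x0B).
$doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
     . "<test>This data contains a \vvertical tab</test>";

$sxml = simplexml_load_string($doc);

$messages = [];
if ($sxml === false) {
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError carrying line, column, code and message.
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
}

// Reset the buffer and restore the previous error-handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

echo implode(PHP_EOL, $messages), PHP_EOL;
```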
 