Bug #55531 Vertical tabs ignored by XMLWriter
Submitted: 2011-08-29 15:52 UTC Modified: 2021-06-28 16:58 UTC
Avg. Score:5.0 ± 0.0
Reproduced:4 of 4 (100.0%)
Same Version:2 (50.0%)
Same OS:4 (100.0%)
From: Assigned: cmb
Status: Not a bug Package: XML Writer
PHP Version: 5.3.8 OS: Linux
Private report: No CVE-ID: None


 [2011-08-29 15:52 UTC]
When text contains vertical tabs, XMLWriter silently ignores them and generates invalid XML. This is not a case of invalid UTF-8 input; the data is valid UTF-8. Vertical tabs are simply not allowed in XML by rule. I would expect XMLWriter to encode them as it would any other character not allowed in XML. I suspect that 

Test script:

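    $xml = new XMLWriter();
    $xml->openMemory();  // an output target must be opened before writing
    $xml->startDocument('1.0', 'UTF-8');
    $xml->writeElement("test", "This data contains a \vvertical tab");
    $xml->endDocument();
    $data = $xml->outputMemory(true);

    // Re-parse the generated document to check whether it is well-formed.
    $sxml = simplexml_load_string($data);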


Expected result:
Either an error or valid XML.

Actual result:
Invalid XML is silently created. For example, when a vertical tab is present, SimpleXMLElement::addChild() emits the warning "SimpleXMLElement::asXML(): xmlEscapeEntities : char out of range" and the node is not added.
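
The "either an error or valid XML" behaviour can be approximated in user code by validating text before it reaches XMLWriter. The following is a minimal sketch, not something PHP or libxml2 provides: assertXml10Text() is a hypothetical helper name, PHP 7 or later is assumed for the type declarations, and the character class follows the Char production of XML 1.0.

```php
<?php
// Hypothetical helper (not part of PHP): rejects text containing code points
// outside the XML 1.0 Char production:
// #x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF
function assertXml10Text(string $text): string
{
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    $result = preg_match($pattern, $text, $match, PREG_OFFSET_CAPTURE);

    if ($result === false) {
        // With the /u modifier, preg_match() fails on malformed UTF-8.
        throw new InvalidArgumentException('Text is not valid UTF-8');
    }
    if ($result === 1) {
        throw new InvalidArgumentException(sprintf(
            'Bytes 0x%s at offset %d are not allowed in XML 1.0',
            strtoupper(bin2hex($match[0][0])),
            $match[0][1]
        ));
    }
    return $text;
}

$xml = new XMLWriter();
$xml->openMemory();
$xml->startDocument('1.0', 'UTF-8');
try {
    $xml->writeElement('test', assertXml10Text("This data contains a \vvertical tab"));
} catch (InvalidArgumentException $e) {
    // The element is never written, so no invalid document is produced.
    echo $e->getMessage(), "\n";
}
```

Failing early keeps the decision with the caller, who can then drop, replace, or report the offending text.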


 [2011-08-29 16:01 UTC]
In XML 1.0 this character is not valid. Not sure whether we should allow it; .NET 2.x 
makes this check optional through the settings.CheckCharacters property.

But that's something libxml deals with; PHP's XML extensions only rely on it to 
parse input or generate data.
 [2016-10-17 13:36 UTC]
-Package: XML related +Package: XML Writer
 [2021-04-01 14:42 UTC]
While \v is indeed invalid for XML 1.0, it is valid for XML 1.1.
We could entity-encode such characters for 1.0 documents, and that
likely wouldn't cause a BC break, but we could also leave that to
libxml2 (to my knowledge, handling of invalid characters is not
even documented by them).  But if we manually entity-encode, we
should make sure to keep that consistent (at least for XMLWriter,
but likely for all bundled XML extensions as well).  Not sure.

And we should decide how to handle NUL bytes.  Currently, the
strings are truncated, which doesn't appear to be intentional.
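
Until a policy is settled, callers can sanitize text themselves before writing. The sketch below is one possible approach, not what XMLWriter or libxml2 does: stripInvalidXml10Chars() is a hypothetical helper name, PHP 7 or later is assumed, and every code point outside the XML 1.0 Char production (NUL included) is replaced with U+FFFD so that nothing is silently truncated. Replacement is used rather than a numeric character reference because, in XML 1.0, a reference such as &#11; must itself resolve to a legal character.

```php
<?php
// Hypothetical helper (not part of PHP): replaces code points that XML 1.0
// does not allow, including NUL, with a replacement string (U+FFFD by default).
function stripInvalidXml10Chars(string $text, string $replacement = "\u{FFFD}"): string
{
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    $clean = preg_replace($pattern, $replacement, $text);

    if ($clean === null) {
        // With the /u modifier, preg_replace() returns null on malformed UTF-8.
        throw new InvalidArgumentException('Text is not valid UTF-8');
    }
    return $clean;
}

$xml = new XMLWriter();
$xml->openMemory();
$xml->startDocument('1.0', 'UTF-8');
$xml->writeElement('test', stripInvalidXml10Chars("before\0after \vtab"));
$xml->endDocument();
$data = $xml->outputMemory(true);

// The sanitized document should now load without parser errors.
$doc = simplexml_load_string($data);
```

Passing an empty string as the second argument removes the characters instead of marking where they were.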
 [2021-06-28 16:58 UTC]
-Status: Open +Status: Not a bug -Assigned To: +Assigned To: cmb
 [2021-06-28 16:58 UTC]
This issue is fixed as of libxml2 2.9.11:

    Warning: simplexml_load_string(): Entity: line 2: parser error : PCDATA invalid Char value 11

    Warning: simplexml_load_string(): <test>This data contains a vertical tab</test>

    Warning: simplexml_load_string():                            ^
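
The warnings above go through PHP's normal error handler. Code that needs to react to them programmatically can ask libxml to buffer its errors instead. This is a minimal sketch using libxml_use_internal_errors() and libxml_get_errors(); the input string mirrors the test script, and the exact messages reported by a given libxml2 version are not verified here.

```php
<?php
// Buffer libxml2 errors instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = simplexml_load_string("<test>This data contains a \vvertical tab</test>");

$messages = [];
if ($doc === false) {
    foreach (libxml_get_errors() as $error) {
        // Each $error is a LibXMLError with level, code, line, column and message.
        $messages[] = sprintf(
            'line %d, column %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }
}

// Reset the error buffer and restore the previous error-handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

echo implode("\n", $messages), "\n";
```

Restoring the previous setting at the end keeps the change local, so other code relying on warnings is unaffected.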