If binary data is passed to createTextNode() it can cause invalid XML to be generated.
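$document = new DOMDocument('1.0', 'UTF-8');
$document->formatOutput = true;
$root = $document->createElement('example');
$document->appendChild($root);
// "PK" followed by the control characters ETX, EOT and DC4
$example = $document->createTextNode("PK\x03\x04\x14");
$root->appendChild($example);
echo $document->saveXML();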
<?xml version="1.0" encoding="UTF-8"?>
Looks like your bug report system strips out the binary. The additional characters were ETX, EOT and DC4 (\x03, \x04 and \x14).
I don't think this is a bug. Though DOM 3 doesn't say much, from what I can gather createTextNode should not try to sanitize the text input. And though & " ' < > will be escaped during serialization, according to context, that's all; it's up to the developer not to do things like use control characters.
And the bug system didn't strip them. The characters are there, your browser is just not rendering them as anything.
It is a tricky one for me. The input to the method is arbitrary, as it is externally generated. If everyone who uses createTextNode() is forced to sanitize the input, it kills some of the benefit of createTextNode(), which is that it sanitizes most things, just not low ASCII values. Everyone would have to add code to perform the sanitization. In addition, in this case the call is in a third-party library (PHPUnit, to be precise). I have submitted a bug report to them too, but there may be many other third-party libraries that also use this method. At a minimum, the documentation should point out this problem and maybe point the reader at createCDATASection (though I haven't tried that, so I don't know if it handles this). But given that it already performs sanitization on most of the input, it is arguable that it should handle this case too.
The only thing a developer needs to ensure is that the text contains only characters/byte sequences that are valid for the target XML document. That means no \x00-\x1F (besides whitespace) and that it uses the correct character encoding. I don't think that's an undue burden.
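As a rough illustration of that kind of pre-filtering, here is a minimal sketch. The helper name stripInvalidXmlControlChars() is made up for this example and is not part of the DOM extension. It only removes the \x00-\x1F control characters other than tab, line feed and carriage return; it assumes the input is already valid UTF-8 and does nothing about encoding problems.

```php
<?php
// Hypothetical helper (not a PHP or DOM built-in): removes the control
// characters in \x00-\x1F that XML 1.0 does not allow, while keeping
// tab (\x09), line feed (\x0A) and carriage return (\x0D).
function stripInvalidXmlControlChars(string $text): string
{
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $text);
}

$document = new DOMDocument('1.0', 'UTF-8');
$root = $document->createElement('example');
$document->appendChild($root);

// Same input as in the report: "PK" followed by ETX, EOT and DC4.
$clean = stripInvalidXmlControlChars("PK\x03\x04\x14");
$root->appendChild($document->createTextNode($clean));

$xml = $document->saveXML();
echo $xml;
```

Whether silently dropping the characters is acceptable depends on the caller; replacing them with a visible placeholder would be an equally reasonable choice.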
Meanwhile & ' " < > are escaped as appropriate when the text node is serialized, which is something explicitly mentioned in DOM 2/3.
However createTextNode's documentation could use a little more explanation about what's going on as apparently there's been confusion about whether/how text is escaped. For example, as I mentioned earlier escaping does happen, and it does so at serialization rather than when the text node is created. https://3v4l.org/DQh17
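To make the timing of the escaping concrete, here is a small sketch. It is not the snippet behind the 3v4l link, just an independent illustration with a made-up sample string. It shows that the text node stores the string exactly as given and that the entity escaping only appears in the serialized output.

```php
<?php
$document = new DOMDocument('1.0', 'UTF-8');
$root = $document->createElement('example');
$document->appendChild($root);

// Sample text containing characters that need escaping in XML.
$text = $document->createTextNode('Fish & <chips>');
$root->appendChild($text);

// The node itself still holds the raw, unescaped string.
var_dump($text->nodeValue === 'Fish & <chips>');

// Escaping is applied only when the tree is serialized.
$xml = $document->saveXML($root);
echo $xml, "\n";
```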
Though not handling <= \x1F is unfortunate, I think it would be best to match existing widespread implementations.