Bug #79090 DomDocument in recover mode improperly strips amp after error in XML
Submitted: 2020-01-09 18:01 UTC
Modified: 2020-01-16 08:45 UTC
From: danielkarp at gmail dot com
Assigned:
Status: Wont fix
Package: DOM XML related
PHP Version: 7.4.1
OS: CentOS
Private report: No
CVE-ID: None

 [2020-01-09 18:01 UTC] danielkarp at gmail dot com
Description:
------------
When DOMDocument is used in recover mode to repair malformed XML (mismatched tags), an &amp; entity that appears AFTER the repaired tags (but not one that appears before them) is improperly stripped from the output.
Note: I selected 7.4.1 above, but I have only tested this in 7.4.0. This is a longstanding bug going back to at least 7.2.

You can also see an example here:
https://code.sololearn.com/wAQh3FwVdYGO

Test script:
---------------
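<?php
// Parse in recover mode so libxml tries to repair the malformed markup.
$domDocument = new DOMDocument( '1.0', 'utf-8' );
$domDocument->recover = true;

// <foo> and <bar> are closed in the wrong order; &amp; appears before and after them.
$xml = <<<EOT
<?xml version="1.0" encoding="utf-8"?>
<xml> Amp: &amp;
<foo> baz: <bar>foobar</foo></bar>
No amp? &amp;
</xml>
EOT;
$domDocument->loadXML( $xml );
echo "<br><br>";
// Escape the serialized document so it can be displayed in a browser.
echo htmlspecialchars( $domDocument->saveXML() );
?>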

Expected result:
----------------
<?xml version="1.0" encoding="utf-8"?> <xml> Amp: &amp; <foo> baz: <bar>foobar</bar></foo> No amp? &amp; </xml>

Actual result:
--------------
<?xml version="1.0" encoding="utf-8"?> <xml> Amp: &amp; <foo> baz: <bar>foobar</bar></foo> No amp? </xml>

History

 [2020-01-16 08:45 UTC] sjon@php.net
-Status: Open
+Status: Wont fix
 [2020-01-16 08:45 UTC] sjon@php.net
I can confirm this bug, but it is not caused by PHP. It is apparently a bug in libxml itself, which you can verify by running `xmllint --recover` on the command line. It might be a duplicate of https://bugzilla.gnome.org/show_bug.cgi?id=750762
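
Since the loss happens inside libxml during recovery, one way to at least notice it from PHP is to collect the parser errors and compare how many `&amp;` references went in and came out. The following is only a minimal sketch: `loadWithRecovery` and its array keys are hypothetical names made up for illustration, and counting `&amp;` occurrences is a rough heuristic, not a fix for the underlying bug.

```php
<?php
// Hypothetical helper (not part of PHP or of the original report): parse XML
// in recover mode and report what libxml complained about along the way.
function loadWithRecovery(string $xml): array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument('1.0', 'utf-8');
    $doc->recover = true;
    $doc->loadXML($xml);

    // Each entry is a LibXMLError with line and message properties.
    $errors = [];
    foreach (libxml_get_errors() as $error) {
        $errors[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $output = (string) $doc->saveXML();

    return [
        'xml'    => $output,
        'errors' => $errors,
        // Heuristic: compare entity reference counts before and after parsing.
        'ampIn'  => substr_count($xml, '&amp;'),
        'ampOut' => substr_count($output, '&amp;'),
    ];
}

// Usage with the malformed input from the report.
$result = loadWithRecovery(
    "<xml> Amp: &amp;\n<foo> baz: <bar>foobar</foo></bar>\nNo amp? &amp;\n</xml>"
);
if ($result['errors'] !== [] && $result['ampOut'] < $result['ampIn']) {
    echo "Recovery reported errors and the output has fewer &amp; references than the input.\n";
}
```

If the error list is non-empty, the recovered document should be treated as potentially lossy; whether the second `&amp;` is actually dropped depends on the libxml version PHP is linked against.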
 
PHP Copyright © 2001-2021 The PHP Group
All rights reserved.
Last updated: Mon Oct 18 21:03:39 2021 UTC