Bug #79090 DomDocument in recover mode improperly strips amp after error in XML
Submitted: 2020-01-09 18:01 UTC Modified: 2020-01-16 08:45 UTC
From: danielkarp at gmail dot com Assigned:
Status: Wont fix Package: DOM XML related
PHP Version: 7.4.1 OS: CentOS
Private report: No CVE-ID: None
 [2020-01-09 18:01 UTC] danielkarp at gmail dot com
Description:
------------
I am using DomDocument in recover mode to repair malformed XML (mismatched tags). Any &amp; that appears AFTER the repaired mismatched tags (but not before them) is improperly stripped from the output.
Note: I selected 7.4.1 above, but I have only tested this in 7.4.0. This is a longstanding bug going back to at least 7.2.

You can also see an example here:
https://code.sololearn.com/wAQh3FwVdYGO

Test script:
---------------
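<?php
// Parse in recover mode so the parser repairs the mismatched tags instead of failing.
$domDocument = new DOMDocument( '1.0', 'utf-8' );
$domDocument->recover = true;

// The closing </foo> and </bar> tags are deliberately swapped;
// &amp; appears once before and once after them.
$xml = <<<EOT
<?xml version="1.0" encoding="utf-8"?>
<xml> Amp: &amp;
<foo> baz: <bar>foobar</foo></bar>
No amp? &amp;
</xml>
EOT;
$domDocument->loadXML( $xml );
echo "<br><br>";
// Escape the serialized document so the markup is visible in a browser.
echo htmlspecialchars( $domDocument->saveXML() );
?>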

Expected result:
----------------
<?xml version="1.0" encoding="utf-8"?> <xml> Amp: &amp; <foo> baz: <bar>foobar</bar></foo> No amp? &amp; </xml>

Actual result:
--------------
<?xml version="1.0" encoding="utf-8"?> <xml> Amp: &amp; <foo> baz: <bar>foobar</bar></foo> No amp? </xml>
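
To check the difference between the expected and actual results without reading the output by eye, the following minimal sketch counts the &amp; references before and after parsing and lists the errors the parser recovered from. It assumes the same input as the test script; `countAmp()` is a hypothetical helper introduced here for illustration and is not part of the original report.

```php
<?php
// Hypothetical helper: count literal "&amp;" references in a string.
function countAmp(string $s): int
{
    return substr_count($s, '&amp;');
}

$xml = <<<EOT
<?xml version="1.0" encoding="utf-8"?>
<xml> Amp: &amp;
<foo> baz: <bar>foobar</foo></bar>
No amp? &amp;
</xml>
EOT;

// Collect parser errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument('1.0', 'utf-8');
$doc->recover = true;
$doc->loadXML($xml);

// Show what the parser had to recover from.
$errors = libxml_get_errors();
foreach ($errors as $e) {
    printf("line %d: %s\n", $e->line, trim($e->message));
}
libxml_clear_errors();

// If the two counts differ, a reference was lost during recovery.
printf(
    "&amp; in input: %d, in output: %d\n",
    countAmp($xml),
    countAmp($doc->saveXML())
);
```

The printed counts depend on the libxml version PHP is linked against, so no particular output is assumed here.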

History

 [2020-01-16 08:45 UTC] sjon@php.net
-Status: Open +Status: Wont fix
 [2020-01-16 08:45 UTC] sjon@php.net
I can confirm this bug, but it is not caused by PHP. It is apparently a bug in libxml itself, which you can verify by running `xmllint --recover` on the command line. It might be a duplicate of https://bugzilla.gnome.org/show_bug.cgi?id=750762
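
Since the behavior comes from the underlying library rather than from PHP, it can help to note which libxml version a given PHP build is linked against when comparing results across machines. A minimal sketch, assuming the libxml extension is loaded (`versionLine()` is a hypothetical helper name):

```php
<?php
// Hypothetical helper: describe the PHP build and the libxml version it uses.
function versionLine(): string
{
    // LIBXML_DOTTED_VERSION is defined by PHP's libxml extension.
    return sprintf("PHP %s, libxml %s", PHP_VERSION, LIBXML_DOTTED_VERSION);
}

echo versionLine(), "\n";
```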
 