Bug #67727 Inconsistent loading and saving of DOMDocument
Submitted: 2014-07-31 14:35 UTC Modified: -
Avg. Score:3.8 ± 1.0
Reproduced:11 of 12 (91.7%)
Same Version:4 (36.4%)
Same OS:0 (0.0%)
From: villascape at gmail dot com Assigned:
Status: Open Package: DOM XML related
PHP Version: 5.5.15 OS: Centos 6.5
Private report: No CVE-ID: None

 [2014-07-31 14:35 UTC] villascape at gmail dot com
DOMDocument::loadHTML() and DOMDocument::saveHTML() are not consistent for some input strings, specifically '<body>&nbsp;</body>': each load/save round trip changes the string.

Note that I am using PHP version 5.5.14-1.ius.centos6.x86_64, and not 5.5.15.

Test script:
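$str1 = '<body>&nbsp;</body>';
echo('Initial<pre>'.htmlspecialchars($str1).'</pre>'); //<body>&nbsp;</body>

// First pass: load the original string and serialize <body> back out
$dom = new DOMDocument();   //Default is UTF-8, but iso-8859-1 is available if required
$dom->loadHTML($str1);
$xpath = new DOMXPath($dom);
$body = $dom->getElementsByTagName('body')->item(0);
$str2 = $dom->saveHTML($body);
echo('First Option 1<pre>'.htmlspecialchars($str2).'</pre>'); //<body> </body>

// Second pass: feed the output of the first pass back in
$dom->loadHTML($str2);
$xpath = new DOMXPath($dom);
$body = $dom->getElementsByTagName('body')->item(0);
$str3 = $dom->saveHTML($body);
echo('Second Option 1<pre>'.htmlspecialchars($str3).'</pre>'); //<body>Â </body>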

Expected result:
Initial String
Returned String Pass 1
Returned String Pass 2

Actual result:
Initial String
<body>&nbsp;</body>
Returned String Pass 1
<body> </body>
Returned String Pass 2
<body>Â </body>
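
The stray "Â" is what appears when the UTF-8 bytes of a non-breaking space (C2 A0) are read as two ISO-8859-1 characters and then written out as UTF-8 again. The sketch below reproduces only that byte-level effect with mbstring; it assumes, without confirmation from this report, that the parser falls back to ISO-8859-1 when the input declares no encoding. misreadAsLatin1() is a hypothetical helper, not part of DOMDocument.

```php
<?php
// Assumption: the parser decodes undeclared input as ISO-8859-1.
// misreadAsLatin1() is a hypothetical helper used for illustration only.
function misreadAsLatin1(string $utf8Bytes): string {
    // Treat every input byte as one ISO-8859-1 character and re-encode it as UTF-8.
    return mb_convert_encoding($utf8Bytes, 'UTF-8', 'ISO-8859-1');
}

$nbsp  = "\xc2\xa0";               // U+00A0 encoded as UTF-8
$once  = misreadAsLatin1($nbsp);   // U+00C2 ("Â") followed by U+00A0
$twice = misreadAsLatin1($once);   // a second misreading grows the string again

echo bin2hex($nbsp), "\n";   // c2a0
echo bin2hex($once), "\n";   // c382c2a0
echo bin2hex($twice), "\n";  // c383c282c382c2a0
```

Under that assumption, one misreading yields exactly the "Â" followed by a non-breaking space seen in pass 2.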


 [2019-12-03 18:01 UTC] coleman at civicrm dot org
This is still a problem in PHP 7.2. Non-breaking space characters are garbled by both saveHTML and saveXML. Strangely, the result depends on how the character is represented in the input string passed to loadHTML: a literal Unicode non-breaking space character triggers the bug, but the &nbsp; entity works fine. The trouble is that saveHTML emits the literal Unicode character, so a round trip through DOMDocument and back again will always produce garbled output. Here is a PHPUnit test to demonstrate the problem:


class myTest extends \PHPUnit\Framework\TestCase {
  public function testNbsp() {
    // Wrap the fragment in a document, parse it, and serialize the fragment back out
    $runThrough = function($html) {
      $doc = new DOMDocument();
      $doc->loadHTML("<html><body><div id=\"target\">$html</div></body></html>");

      $newHtml = '';
      foreach ($doc->getElementById('target')->childNodes as $node) {
        $newHtml .= $node->ownerDocument->saveXML($node);
      }
      return $newHtml;
    };

    // "\xc2\xa0" is the UTF-8 encoding of U+00A0 (non-breaking space)
    $original = '<p>Hello '."\xc2\xa0".' world</p>';

    $pass1 = $runThrough($original);
    $pass2 = $runThrough($pass1);

    $this->assertEquals($pass1, $pass2);
  }
}

Note that if we were to extend the test with more iterations, each runThrough will add another extra character to the output.
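
A possible way to sidestep the problem is to declare the encoding inside the markup handed to loadHTML. The following is a minimal sketch, not a confirmed fix for this bug: it assumes the parser honours a `<meta http-equiv="Content-Type">` charset declaration, and roundTripUtf8() is a hypothetical helper name. It uses saveHTML($node) rather than the saveXML($node) call in the test above.

```php
<?php
// Hypothetical helper: round-trips an HTML fragment through DOMDocument
// while declaring the input encoding explicitly.
function roundTripUtf8(string $html): string {
    $doc = new DOMDocument();
    // Assumption: the parser honours this <meta> declaration and decodes the input as UTF-8.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body><div id="target">' . $html . '</div></body></html>'
    );

    // Serialize each child of the wrapper <div> back to markup.
    $out = '';
    foreach ($doc->getElementById('target')->childNodes as $node) {
        $out .= $doc->saveHTML($node);
    }
    return $out;
}

$original = "<p>Hello \xc2\xa0 world</p>";
$pass1 = roundTripUtf8($original);
$pass2 = roundTripUtf8($pass1);

// True when a second round trip leaves the first pass's output unchanged.
var_dump($pass1 === $pass2);
```

If the assumption holds, repeated passes should stop growing the string, because the bytes C2 A0 are decoded as a single U+00A0 character on every load.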