Bug #76285 DOMDocument::formatOutput attribute sometimes ignored
Submitted: 2018-04-29 06:23 UTC Modified: 2018-06-28 09:43 UTC
From: daniel dot hemberger at gmail dot com Assigned: ab (profile)
Status: Closed Package: DOM XML related
PHP Version: Irrelevant OS:
Private report: No CVE-ID: None
 [2018-04-29 06:23 UTC] daniel dot hemberger at gmail dot com
Description:
------------
When calling `DOMDocument::saveHTML ([ DOMNode $node = NULL ] )`, whether the `formatOutput` property of the instance is respected depends on whether the `$node` argument is passed to the method.

We can see where the problem arises in the source code (here: https://github.com/php/php-src/blob/701437a94872853026c97b225f9882ed7fa3164c/ext/dom/document.c). If `nodep == NULL` (i.e. `$node` is NOT passed to `saveHTML`) then the following function call is made in `dom_document_save_html`:

htmlDocDumpMemoryFormat(docp, &mem, &size, format);

But if `nodep != NULL` (i.e. `$node` IS passed to `saveHTML`), it calls:

htmlNodeDump(buf, docp, node);

We can see already that the `format` variable is not passed into `htmlNodeDump`, and thus the output cannot respect the `DOMDocument::formatOutput` attribute.

This should be fairly simple to fix, because libxml2 provides a similar function, `htmlNodeDumpFormatOutput`, which does take an argument controlling the formatting. I think the problem can be solved by replacing the calls to `htmlNodeDump` with `htmlNodeDumpFormatOutput` and passing the appropriate formatting argument (see: http://xmlsoft.org/html/libxml-HTMLtree.html#htmlNodeDumpFormatOutput).

Thank you for your time!

Test script:
---------------
<?php
$dom = new DOMDocument();
$dom->formatOutput = false;
$html = '<div><div><a>test</a></div><div><a>test2</a></div></div>';
$dom->loadHTML($html);
$rootNode = $dom->documentElement;
foreach ($rootNode->firstChild->childNodes as $child) {
  var_dump($dom->saveHTML($child));
  $dom2 = new DOMDocument();
  $dom2->formatOutput = false;
  $child2 = $dom2->importNode($child, true);
  $dom2->appendChild($child2);
  echo "\n";
  var_dump($dom2->saveHTML());
}

Expected result:
----------------
string(57) "<div><div><a>test</a></div><div><a>test2</a></div></div>
"

string(57) "<div><div><a>test</a></div><div><a>test2</a></div></div>
"

Actual result:
--------------
string(59) "<div>
<div><a>test</a></div>
<div><a>test2</a></div>
</div>"

string(57) "<div><div><a>test</a></div><div><a>test2</a></div></div>
"
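
Possible workaround (a sketch, not part of the original report): the second result above suggests that the no-argument form of `saveHTML()` does honour `formatOutput`. On affected versions, a single element could therefore be serialized by copying it into a scratch document first. The helper name `saveNodeHtmlUnformatted` is hypothetical and is not part of the DOM extension; the sketch assumes the node is an element, since other node types may not be valid as a document's only child.

```php
<?php
// Hypothetical helper (NOT part of ext/dom): serialize one element through the
// no-argument saveHTML() path, which the report shows respects formatOutput.
function saveNodeHtmlUnformatted(DOMElement $element): string
{
    $scratch = new DOMDocument();
    $scratch->formatOutput = false;

    // Deep-copy the element into the scratch document; the original tree is untouched.
    $copy = $scratch->importNode($element, true);
    $scratch->appendChild($copy);

    // The no-argument form appends a trailing newline (see the second result
    // in the report), so strip it to match the single-node form.
    return rtrim($scratch->saveHTML(), "\n");
}

$dom = new DOMDocument();
$dom->formatOutput = false;
$dom->loadHTML('<div><div><a>test</a></div><div><a>test2</a></div></div>');

// loadHTML() wraps the fragment in <html><body>, so the outer <div> is
// the first child of <body>.
$outerDiv = $dom->documentElement->firstChild->firstChild;

$serialized = saveNodeHtmlUnformatted($outerDiv);
var_dump($serialized);
```

This costs an extra deep copy per call, so it is only meant as a stopgap until `saveHTML($node)` itself passes the format flag through.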

 [2018-05-01 12:08 UTC] andrew dot nester dot dev at gmail dot com
Thanks for reporting the issue!

I added PR fixing this: https://github.com/php/php-src/pull/3229
 [2018-06-28 09:43 UTC] ab@php.net
-Status: Open
+Status: Closed
-Assigned To:
+Assigned To: ab
 [2018-06-28 09:43 UTC] ab@php.net
The linked PR has been merged.

Thanks.