Bug #76285 DOMDocument::formatOutput attribute sometimes ignored
Submitted: 2018-04-29 06:23 UTC
Modified: 2018-06-28 09:43 UTC
From: daniel dot hemberger at gmail dot com
Assigned: ab
Status: Closed
Package: DOM XML related
PHP Version: Irrelevant
OS:
Private report: No
CVE-ID: None
 [2018-04-29 06:23 UTC] daniel dot hemberger at gmail dot com
Description:
------------
When calling `DOMDocument::saveHTML ([ DOMNode $node = NULL ] )`, the `formatOutput` attribute of the class instance is respected only when the `$node` argument is omitted. When `$node` is passed, the attribute is ignored.

We can see where the problem arises in the source code (here: https://github.com/php/php-src/blob/701437a94872853026c97b225f9882ed7fa3164c/ext/dom/document.c). If `nodep == NULL` (i.e. `$node` is NOT passed to `saveHTML`) then the following function call is made in `dom_document_save_html`:

htmlDocDumpMemoryFormat(docp, &mem, &size, format);

But if `nodep != NULL` (i.e. `$node` IS passed to `saveHTML`), it calls:

htmlNodeDump(buf, docp, node);

We can see already that the `format` variable is not passed into `htmlNodeDump`, and thus the output cannot respect the `DOMDocument::formatOutput` attribute.

This should be fairly simple to fix, because there is a similar function `htmlNodeDumpFormatOutput` which does take an argument specifying the formatting. I think that the problem is solved by replacing the instances of `htmlNodeDump` with `htmlNodeDumpFormatOutput` and passing the appropriate formatting arguments (see: http://xmlsoft.org/html/libxml-HTMLtree.html#htmlNodeDumpFormatOutput).
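Until such a fix is available, one possible workaround is to route the node through the whole-document code path, as the second half of the test script below does. The following is only a sketch: `saveElementHtmlUnformatted` is a hypothetical helper name (not part of ext/dom), and it assumes the node is a `DOMElement` that can be deep-copied into a temporary document.

```php
<?php
// Hypothetical helper (not part of ext/dom): serialize one element through
// the whole-document code path, which does honour formatOutput.
function saveElementHtmlUnformatted(DOMElement $element): string
{
    $tmp = new DOMDocument();
    $tmp->formatOutput = false;
    // Deep-copy the element into the temporary document.
    $tmp->appendChild($tmp->importNode($element, true));
    // saveHTML() without arguments ends its output with a newline; strip it.
    return rtrim($tmp->saveHTML(), "\n");
}

// Usage: serialize the outer <div> from the report's sample markup.
$dom = new DOMDocument();
$dom->loadHTML('<div><div><a>test</a></div><div><a>test2</a></div></div>');
$outerDiv = $dom->documentElement->firstChild->firstChild; // html > body > div
$out = saveElementHtmlUnformatted($outerDiv);
```

The cost is one extra document and a deep copy per call, so this is probably only reasonable for small fragments.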

Thank you for your time!

Test script:
---------------
<?php
$dom = new DOMDocument();
$dom->formatOutput = false;
$html = '<div><div><a>test</a></div><div><a>test2</a></div></div>';
$dom->loadHTML($html);
$rootNode = $dom->documentElement;
foreach ($rootNode->firstChild->childNodes as $child) {
  var_dump($dom->saveHTML($child));
  $dom2 = new DOMDocument();
  $dom2->formatOutput = false;
  $child2 = $dom2->importNode($child, true);
  $dom2->appendChild($child2);
  echo "\n";
  var_dump($dom2->saveHTML());
}

Expected result:
----------------
string(57) "<div><div><a>test</a></div><div><a>test2</a></div></div>
"

string(57) "<div><div><a>test</a></div><div><a>test2</a></div></div>
"

Actual result:
--------------
string(59) "<div>
<div><a>test</a></div>
<div><a>test2</a></div>
</div>"

string(57) "<div><div><a>test</a></div><div><a>test2</a></div></div>
"
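The difference between the two results above comes down to whether newlines appear in the `saveHTML($node)` output even though `formatOutput` is `false`. A rough way to check a given PHP build for this behaviour might look like the following sketch; the function name `nodeDumpIgnoresFormatOutput` is made up for illustration, and the check only looks for inserted newlines rather than comparing the full output.

```php
<?php
// Hypothetical check: returns true when saveHTML($node) inserts newlines
// even though formatOutput is false (the behaviour described in this report).
function nodeDumpIgnoresFormatOutput(): bool
{
    $dom = new DOMDocument();
    $dom->formatOutput = false;
    $dom->loadHTML('<div><div><a>test</a></div><div><a>test2</a></div></div>');
    // loadHTML() wraps the fragment in <html><body>, so descend two levels.
    $outerDiv = $dom->documentElement->firstChild->firstChild;
    // The input markup contains no newlines, so any newline was added on output.
    return strpos($dom->saveHTML($outerDiv), "\n") !== false;
}

echo nodeDumpIgnoresFormatOutput()
    ? "saveHTML(\$node) ignores formatOutput on this build\n"
    : "saveHTML(\$node) respects formatOutput on this build\n";
```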

History

 [2018-05-01 12:08 UTC] andrew dot nester dot dev at gmail dot com
Thanks for reporting the issue!

I added PR fixing this: https://github.com/php/php-src/pull/3229
 [2018-06-28 09:43 UTC] ab@php.net
-Status: Open
+Status: Closed
-Assigned To:
+Assigned To: ab
 [2018-06-28 09:43 UTC] ab@php.net
The linked PR has been merged.

Thanks.