Bug #36790 formatOutput fails when document element has text content
Submitted: 2006-03-19 20:58 UTC
Modified: 2006-03-20 08:44 UTC
From: exaton at free dot fr
Assigned:
Status: Not a bug
Package: DOM XML related
PHP Version: 5CVS-2006-03-19 (snap)
OS: WinXP SP2
Private report: No
CVE-ID: None
 [2006-03-19 20:58 UTC] exaton at free dot fr
Description:
------------
It appears that DOM output formatting, activated with the DOMDocument's formatOutput property, fails as soon as the document's root element ($doc -> documentElement) contains a text node (even alongside other nodes).

Reproduce code:
---------------
function pre_ent_dump($var) {
	printf("<pre>\n%s</pre>\n", htmlentities($var));
}
function pre_var_dump($var) {
	echo "<pre>\n"; var_dump($var); echo "</pre>\n";
}	

$doc = new DOMDocument();
$doc -> load('file.xml');
$doc -> formatOutput = TRUE;

pre_ent_dump($doc -> saveXML()); // (1)

$root = $doc -> documentElement;

pre_var_dump($root -> textContent); // (2)

$child = $doc -> createElement('child');
$root -> appendChild($child);

pre_ent_dump($doc -> saveXML()); // (3)

Actual result:
--------------
[Note : this report is only long because of many examples with slight variations]

PHP Version 5.1.3RC2-dev (2006-03-19 15:30)
(rolled out specially to confirm the issue, also occurring in 5.1.2)
Apache2 2.0.55, WinXP SP2

The reproduce code shows what we are doing: simply loading the contents of an XML file, dumping those contents, getting the root element, dumping the possible text content of itself and its descendants, adding a child to the root element, and dumping the whole document contents again. Everything depends, therefore, on the contents of file.xml prior to execution.

When file.xml is :

<?xml version="1.0" standalone="yes"?>
<root></root>

The contents output at (1) and (3) are :

<?xml version="1.0" standalone="yes"?>
<root/>

<?xml version="1.0" standalone="yes"?>
<root>
  <child/>
</root>

As expected. But when file.xml is :

<?xml version="1.0" standalone="yes"?>
<root> </root>

Then the contents output at (1) and (3) are :

<?xml version="1.0" standalone="yes"?>
<root> </root>

<?xml version="1.0" standalone="yes"?>
<root> <child/></root>

(1) appears to be as expected, but in (3) the new child is no longer on a new line and indented. That absence of formatting does not occur with just any sort of child to the root element however, as we now demonstrate with the following contents for file.xml :

<?xml version="1.0" standalone="yes"?>
<root><a></a></root>

The contents output at (1) and (3) are :

<?xml version="1.0" standalone="yes"?>
<root>
  <a/>
</root>

<?xml version="1.0" standalone="yes"?>
<root>
  <a/>
  <child/>
</root>

Both as expected. If we add a text node to the root element though (we use whitespace but this seems to apply to any sequence of characters) :

<?xml version="1.0" standalone="yes"?>
<root> <a></a></root>

Then the contents output at (1) and (3) are :

<?xml version="1.0" standalone="yes"?>
<root> <a/></root>

<?xml version="1.0" standalone="yes"?>
<root> <a/><child/></root>

Now we realize that *both* outputs are broken (the <a /> in (1) should be on its own line, like above).
Interestingly, the text nodes only cause trouble when they are at the root. When the contents of file.xml are :

<?xml version="1.0" standalone="yes"?>
<root><a>
</a></root>

Then the contents output at (1) and (3) are :

<?xml version="1.0" standalone="yes"?>
<root>
  <a>
</a>
</root>

<?xml version="1.0" standalone="yes"?>
<root>
  <a>
</a>
  <child/>
</root>

As they should be.

I have tried working around by loading with loadXML(file_get_contents('file.xml')), by going to and fro with SimpleXML, by cancelling and setting formatOutput again at different stages... Obviously the problem is not there.

I do not know if or how this is related to bugs 23726 or 35673 ; it seems to me I have more details here, but I won't be surprised if you tell me the problem is at the libxml2 level...

Thanks in advance.

P.S. Side note -- nothing to do here, but the bug report CAPTCHA image does not seem to work under Firefox 1.5 win32 (works fine in IE), nor does it accept the CAPTCHA if I get the source, view the image in IE, and type it in Firefox.

 [2006-03-19 23:32 UTC] rrichards@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

First, the formatting is handled by libxml2, so this is not a DOM issue. Second, you are working with an element containing mixed content, so formatting differs from that of an element containing only child elements or only text content.
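
The "mixed content" point can be checked directly. The following is only a minimal sketch, not code from this report: the helper name rootChildNames is hypothetical, and it relies on DOMDocument keeping whitespace-only text nodes by default (preserveWhiteSpace is TRUE unless changed). It lists the direct children of the root element for two of the inputs used above.

```php
<?php
// Hypothetical helper (not from the report): list the node names of the
// root element's direct children so whitespace-only text nodes become visible.
function rootChildNames(string $xml): array
{
    $doc = new DOMDocument();
    // preserveWhiteSpace defaults to true, so blank text nodes are kept.
    $doc->loadXML($xml);

    $names = [];
    foreach ($doc->documentElement->childNodes as $node) {
        $names[] = $node->nodeName;
    }
    return $names;
}

// Element-only content: the root has a single element child.
print_r(rootChildNames('<root><a></a></root>'));

// Mixed content: a "#text" node sits in front of <a>.
print_r(rootChildNames('<root> <a></a></root>'));
```

In the second call the single space shows up as a "#text" child, which is what makes the root element mixed content from the serializer's point of view.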
 [2006-03-20 08:44 UTC] exaton at free dot fr
Thanks for answering so quickly. I feel, however, that in minimizing the code for the report I may not have presented the case fully. I'm leaving the Bogus status though, as I do understand that this is beyond PHP itself.

There is a real transitivity problem :

// Load XML file that will not cause any formatting problems

$doc = new DOMDocument();
$doc -> load('file.xml');
$doc -> formatOutput = TRUE;

// Add a child to its root element

$root = $doc -> documentElement;
$child = $doc -> createElement('child');
$root -> appendChild($child);

// Check XML that will be saved to file (it is fine, well formatted) and save it

pre_ent_dump($doc -> saveXML()); // (1)
$doc -> save('fileResult.xml');

// Open file saved with good formatting

$docRes = new DOMDocument();
$docRes -> load('fileResult.xml');
$docRes -> formatOutput = TRUE;

// Check XML of that file... Its good formatting meant
// that there were newlines and tabs, which cause problems.
// This formatting is OK for now (surprisingly...)

pre_ent_dump($docRes -> saveXML()); // (2)

// Add another child to its root element

$root = $docRes -> documentElement;
$child = $docRes -> createElement('child');
$root -> appendChild($child);

// Check XML again -- this time it really is broken

pre_ent_dump($docRes -> saveXML()); // (3)

-----

With the non-problematic contents of file.xml being :

<?xml version="1.0" standalone="yes"?>
<root></root>

Then the good contents at (1) that are saved to fileResult.xml are :

<?xml version="1.0" standalone="yes"?>
<root>
  <child/>
</root>

But even though, after re-opening that properly formatted file, the contents at (2) are not yet broken, in (3), after adding a new child, they are :

<?xml version="1.0" standalone="yes"?>
<root>
  <child/>
<child/></root>

So as you see, it is not possible to use output formatting throughout: it would have to be turned off when saving to file and only turned on for debug purposes... Which is fine in a theoretical sense, but in fact the files thus produced are pretty illegible, which kills a good part of the purpose of XML in the first place :-/
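
One possible workaround for the round trip, which was not suggested in this thread, is to drop whitespace-only text nodes when re-loading the saved file by setting the DOMDocument's preserveWhiteSpace property to FALSE before loading. The sketch below is an illustration under that assumption; the helper name appendChildFormatted is hypothetical, and loadXML() on a string stands in for load('fileResult.xml').

```php
<?php
// Hypothetical helper (not from the report): load XML with blank text
// nodes stripped, append one <child/>, and return the formatted result.
function appendChildFormatted(string $xml): string
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false; // must be set before loading
    $doc->formatOutput = true;
    $doc->loadXML($xml);

    $doc->documentElement->appendChild($doc->createElement('child'));
    return $doc->saveXML();
}

// Contents of fileResult.xml as shown at (1) above.
$saved = "<?xml version=\"1.0\" standalone=\"yes\"?>\n<root>\n  <child/>\n</root>\n";

// Both <child/> elements should now come out on their own indented lines.
echo appendChildFormatted($saved);
```

This only suits documents in which whitespace-only text nodes carry no meaning; for genuinely mixed content, stripping them changes the data.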

Do you think it would be worth my pointing this out to someone on the libxml2 team ?

Thanks again.
 