  [2007-07-12 19:24 UTC] borys dot forytarz at gmail dot com
 Description:
------------
There is a problem with DOM and encoding. I have two separate files. The first is a full XHTML document (DTD, head, meta, body and more content) saved in UTF-8. Its meta declaration says UTF-8, and the server sends it as UTF-8 too. The second is a simple file without any DTD, head, meta or body, also saved in UTF-8. The problem is that when I import nodes from the second file using importNode(), the output contains incorrectly encoded characters (the ones that come from the second file). This is strange because, as far as I have read, DOM works in UTF-8, so there should be no such problem.
What is more, I inspected properties such as actualEncoding and they showed UTF-8...
If it's not a bug (but I think it is), how can I fix it? I can't add DTD, head and body elements to the second file.
Reproduce code:
---------------
$this->dom = new DOMDocument('1.0','UTF-8');
$this->dom->encoding = 'UTF-8';
$this->dom->formatOutput = self::$formatOutput;
$this->dom->preserveWhiteSpace = self::$preserveWhiteSpace;
@$this->dom->loadHTMLFile($html);
...
echo $this->dom->saveXML();
The above works well for the complete XHTML file. But when I load an incomplete file (encoded in UTF-8) and import nodes from it into the first document, the characters are not encoded properly.
I tried converting the whole output with iconv() and mb_convert_encoding(), but it makes no difference at all.
Expected result:
----------------
Properly encoded characters from both the complete XHTML file and the second, "poor" file. The second file looks like this:
<content id="something">
   <h1>some string</h1>
</content>
Actual result:
--------------
Improperly encoded characters inside the <content> element.
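
A commonly reported cause of this symptom is that the HTML loader falls back to a non-UTF-8 default (typically ISO-8859-1) when the input carries no charset declaration, which would match a fragment that has no head or meta. The following is a minimal sketch under that assumption, not a confirmed diagnosis of this report. The sample text, the variable names and the target document are hypothetical. It prepends a meta charset declaration to the fragment in memory, so the file on disk stays unchanged, and then imports a node into a second document.

```php
<?php
// Sketch only: sample text and variable names are hypothetical.
// Collect parser warnings (for example about the unknown <content> tag) instead of printing them.
libxml_use_internal_errors(true);

$text = 'zażółć gęślą jaźń';
$fragment = '<content id="something"><h1>' . $text . '</h1></content>';

// Assumption: a meta charset declaration ahead of the markup tells the HTML parser the bytes are UTF-8.
$meta = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">';

$source = new DOMDocument();
$source->loadHTML($meta . $fragment);
$h1 = $source->getElementsByTagName('h1')->item(0);

// Hypothetical target document standing in for the complete XHTML file.
$target = new DOMDocument();
$target->loadXML('<?xml version="1.0" encoding="UTF-8"?><html><body/></html>');
$body = $target->getElementsByTagName('body')->item(0);

// importNode() copies the node into the target document; appendChild() attaches the copy.
$imported = $target->importNode($h1, true);
$body->appendChild($imported);

echo $target->saveXML();
```

If the imported text still comes out garbled with the declaration in place, the cause lies elsewhere, for example in how the file was actually saved.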
there should be:
...
foreach($content->childNodes as $child) {
...
sorry

I have checked the file encodings. mb_detect_encoding() reports that they are ASCII-encoded (!?). So I wrote a simple script to convert them to UTF-8:
<?php
$cont = file_get_contents('login.php.tpl');
$f = fopen('login.php.tpl','w');
echo "\n".mb_detect_encoding('login.php.tpl').' > ';
fwrite($f,mb_convert_encoding($cont,'utf-8'));
echo mb_detect_encoding('login.php.tpl')."\n";
fclose($f);
?>
and the output is:
ASCII > ASCII
(I expected ASCII > UTF-8)
The result of using iconv() instead of mb_convert_encoding() is the same. What's going on?
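
A note on the script above: it passes the literal file name to mb_detect_encoding(), not the file contents, and a name like 'login.php.tpl' is plain ASCII. In addition, text that contains only ASCII characters is already valid UTF-8, so converting it changes no bytes. The following is a minimal sketch of checking the contents instead, assuming the mbstring extension is available. The temporary file and the sample text are hypothetical.

```php
<?php
// Sketch only: the temporary file and sample text are hypothetical.
$path = tempnam(sys_get_temp_dir(), 'tpl');
file_put_contents($path, '<h1>zażółć</h1>');

$contents = file_get_contents($path);

// Detecting on the file NAME inspects only the name string, which is pure ASCII.
$nameResult = mb_detect_encoding('login.php.tpl', ['ASCII', 'UTF-8'], true);

// Detecting on the file CONTENTS inspects the actual bytes (strict mode).
$contentResult = mb_detect_encoding($contents, ['ASCII', 'UTF-8'], true);

// mb_check_encoding() answers the narrower question: are these bytes valid UTF-8?
$isUtf8 = mb_check_encoding($contents, 'UTF-8');

// Pure ASCII text is byte-for-byte identical after conversion to UTF-8.
$asciiUnchanged = mb_convert_encoding('plain text', 'UTF-8', 'ASCII') === 'plain text';

unlink($path);

echo $nameResult . ' / ' . $contentResult . "\n";
```

A file that contains no characters outside ASCII will be reported as ASCII both before and after conversion, which is expected and does not by itself indicate a problem.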