Bug #30975 PHP DOM functions output UTF-8 encoded regardless of input encoding
Submitted: 2004-12-03 13:24 UTC Modified: 2004-12-03 13:36 UTC
From: justin at jwd dot co dot uk Assigned:
Status: Not a bug Package: XML related
PHP Version: 5.0.2 OS: Windows XP
Private report: No CVE-ID: None


 [2004-12-03 13:24 UTC] justin at jwd dot co dot uk
When retrieving sections of text from an HTML page using the new DOM functions, the output is encoded as UTF-8 even though the input is correctly detected as ISO-8859-1. This means extra code is needed to convert the text back to the original charset of the input. Surely the DOM functions should either encode according to the detected input encoding or at least provide some mechanism for setting the output encoding? Or am I being stupid here?

Reproduce code:
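$xhtml= <<<HTML_END
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "">
<html xmlns="">
<head><title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /></head>
<body><p class="test_paragraph">Test&nbsp;Paragraph</p></body>
</html>
HTML_END;

// The load and query steps are missing from this report; the two calls marked "assumed" are a reconstruction.
$in=new DomDocument();
$in->loadHTML($xhtml); // assumed
$xin=new DomXpath($in);

$text=$xin->query("//p[@class='test_paragraph']")->item(0)->nodeValue; // assumed

echo(htmlspecialchars($text)."\n"); // Outputs "Test? Paragraph"

// Convert the UTF-8 string returned by DOM back to the page's charset
$text=iconv("UTF-8", "ISO-8859-1", $text);
echo(htmlspecialchars($text)."\n"); // Outputs "Test Paragraph"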

Expected result:
Test Paragraph
Test Paragraph

Actual result:
Test? Paragraph
Test Paragraph


 [2004-12-03 13:36 UTC]
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation and the instructions
on how to report a bug.

This is indeed expected: all XML extensions in PHP work internally with UTF-8, so that's what it returns.
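
For anyone who needs the text back in the page's original charset, the explicit conversion can be sketched roughly as below. It reuses the paragraph from the report, but the shortened markup and the variable names ($doc, $utf8, $latin1) are illustrative assumptions and not part of the original reproduce code. It also assumes the iconv extension is available.

```php
<?php
// Hypothetical, shortened version of the page from the report.
// &nbsp; is the character U+00A0 (non-breaking space).
$html = '<html><head>'
      . '<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">'
      . '</head><body><p class="test_paragraph">Test&nbsp;Paragraph</p></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
$utf8  = $xpath->query('//p[@class="test_paragraph"]')->item(0)->nodeValue;

// Strings returned by DOM are UTF-8, so U+00A0 arrives as the two bytes C2 A0.
echo bin2hex($utf8), "\n";

// Convert explicitly when the surrounding output is ISO-8859-1:
// the same character becomes the single byte A0.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);
echo bin2hex($latin1), "\n";
```

The "?" in the actual result is consistent with the UTF-8 bytes of the non-breaking space being shown by a page that is interpreted as ISO-8859-1. Converting at the output boundary, as above, keeps every DOM call working on UTF-8 and limits the charset handling to one place.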