Bug #30975 PHP DOM functions output UTF-8 encoded regardless of input encoding
Submitted: 2004-12-03 13:24 UTC Modified: 2004-12-03 13:36 UTC
From: justin at jwd dot co dot uk Assigned:
Status: Not a bug Package: XML related
PHP Version: 5.0.2 OS: Windows XP
Private report: No CVE-ID: None
 [2004-12-03 13:24 UTC] justin at jwd dot co dot uk
When retrieving sections of text from an HTML page using the new DOM functions, the output is encoded using UTF-8 despite the input being correctly detected as encoded ISO-8859-1. This means extra code in order to convert back to the original charset of the input text. Surely the DOM functions should either encode according to the detected input encoding or at least provide some mechanism for setting the output encoding? Or am I being stupid here?

Reproduce code:
$xhtml= <<<HTML_END
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "">
<html xmlns="">
<head><title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /></head>
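<body><p class="test_paragraph">Test&nbsp;Paragraph</p></body>
</html>
HTML_END;

$in=new DomDocument();
$in->loadHTML($xhtml); // load the markup before creating the XPath object
$xin=new DomXpath($in);

// Fetch the text of the test paragraph; DOM returns it as UTF-8
$text=$xin->query("//p[@class='test_paragraph']")->item(0)->nodeValue;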
echo(htmlspecialchars($text)."\n"); // Outputs "Test? Paragraph"

$text=iconv("UTF-8", "ISO-8859-1", $text);
echo(htmlspecialchars($text)."\n"); // Outputs "Test Paragraph"

Expected result:
Test Paragraph
Test Paragraph

Actual result:
Test? Paragraph
Test Paragraph


 [2004-12-03 13:36 UTC]
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at and the instructions on how to report
a bug at

This is indeed expected: all XML extensions in PHP work internally with UTF-8, so that's what they return.
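
For anyone who needs the text back in the page's original charset, here is a minimal sketch of converting at the point of output. It assumes the iconv extension is available; `nodeTextAs()` is a hypothetical helper written for this example, not part of the DOM extension, and the inline markup is a shortened stand-in for the report's test page.

```php
<?php
// Hypothetical helper: return a node's text re-encoded into $charset.
// DOM hands back UTF-8 regardless of the input document's encoding.
function nodeTextAs(DOMNode $node, $charset)
{
    return iconv('UTF-8', $charset, $node->nodeValue);
}

$doc = new DOMDocument();
$doc->loadHTML(
    '<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /></head>'
    . '<body><p class="test_paragraph">Test&nbsp;Paragraph</p></body></html>'
);

$xpath = new DOMXPath($doc);
$p = $xpath->query("//p[@class='test_paragraph']")->item(0);

$utf8   = $p->nodeValue;                 // nbsp is the two bytes 0xC2 0xA0
$latin1 = nodeTextAs($p, 'ISO-8859-1');  // nbsp is the single byte 0xA0

echo strlen($utf8), "\n";   // 15
echo strlen($latin1), "\n"; // 14
```

The one-byte difference in length is the non-breaking space, which is what showed up as "?" when the UTF-8 string was printed on an ISO-8859-1 page.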