Bug #36415 transformToXML will NOT output UTF-8
Submitted: 2006-02-16 12:59 UTC Modified: 2006-02-24 01:00 UTC
Votes:13
Avg. Score:4.6 ± 0.6
Reproduced:11 of 11 (100.0%)
Same Version:4 (36.4%)
Same OS:5 (45.5%)
From: memoimyself at yahoo dot com dot br Assigned:
Status: No Feedback Package: XSLT related
PHP Version: 5.1.2 OS: Windows XP
Private report: No CVE-ID: None
 [2006-02-16 12:59 UTC] memoimyself at yahoo dot com dot br
Description:
------------
An instance of the XSLTProcessor class, via its transformToXML method, is used to transform XML documents using an XSL stylesheet.

The XSL document is in a file that is encoded in UTF-8. 

The PHP script is in a file also encoded in UTF-8.

The XML documents are created at run time from XML strings stored in a MySQL 5 database whose character set is UTF-8 and whose tables all have UTF-8 as their character set as well.

All the XML strings stored in the database are duly encoded in UTF-8.

Prior to data retrieval, a 'SET NAMES "utf8"' query is run to ensure that all i/o operations use the UTF-8 character set.

Upon transformation, the results are output to the client preceded by "header('Content-Type: text/html; charset=UTF-8')" to ensure that the browser uses the correct character set.

The XSL file has the following top-level (child node of the document element, as it should be) element:

<xsl:output encoding="utf-8" method="html"/>

When this code is run on a Windows server (Windows XP, Apache 2.0.55, PHP 5.1.2), the transformation NEVER outputs UTF-8 text (it appears to output ISO-8859-1), even if the 'method' attribute in the above element is changed to 'xml', and even if a 'media-type' attribute is also used.

When run on a Linux server (also running PHP 5.1.2), the transformation runs as expected and outputs proper UTF-8 text to the browser.

Reproduce code:
---------------
PHP code:

// Connect and force UTF-8 for all client/server traffic.
$dbo = new PDO(BD_DSN, BD_USERNAME, BD_PWD);
$dbo->query('SET NAMES "utf8"');
// Fetch the stored XML report for the current user.
$sql = 'SELECT Report FROM reports WHERE Id = '.$dbo->quote(strip_gpc_slashes($_GET['rid'])).' 
AND Author = '.$dbo->quote($_SESSION['user']);
$result = $dbo->query($sql);
$row = $result->fetch(PDO::FETCH_OBJ);
// Build the source document from the XML string held in the database.
$xml = new DOMDocument('1.0', 'UTF-8');
$xml->loadXML($row->Report);
// Load the stylesheet from disk.
$xsl = new DOMDocument('1.0', 'UTF-8');
$xsl->load('/path/to/xsl/file.xsl');
// Transform and send the result to the browser as UTF-8 HTML.
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl);
$output = $proc->transformToXML($xml);
header('Content-Type: text/html; charset=utf-8');
print $output;

Start of XSL document:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2005/11/schema-for-xslt20.xsd">
	<xsl:output encoding="utf-8" method="html"/>
	<xsl:template match="/">...

Expected result:
----------------
All text output to the browser should be proper UTF-8. If the browser's character encoding is set to UTF-8 (which it should be, given the "content-type" header above), all accented characters should be displayed correctly.

Actual result:
--------------
When the code is run on a Windows XP server, the text output to the browser is NOT proper UTF-8 and all accented characters are replaced by weird symbols.

When the code is run on a Linux server (also equipped with PHP 5.1.2), everything works as expected and the output is proper UTF-8.
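
The setup described above can be reduced to a script with no database and no external files. The following is only a sketch: the XML string and the stylesheet are made-up stand-ins for the reporter's data, and the script assumes the dom and xsl extensions are loaded. It dumps the raw result as hex so that no browser or terminal decoding can hide what transformToXML actually produced.

```php
<?php
// Sketch only: the XML and stylesheet below are hypothetical stand-ins for the
// reporter's database content and file.xsl. Requires the dom and xsl extensions.

// "é" is written as explicit UTF-8 bytes, so the encoding of this script file
// does not matter.
$xmlSource = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<root>caf\xC3\xA9</root>";

$xslSource = <<<'XSL'
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output encoding="utf-8" method="html"/>
  <xsl:template match="/">
    <p><xsl:value-of select="/root"/></p>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xml = new DOMDocument();
$xml->loadXML($xmlSource);

$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);
$output = $proc->transformToXml($xml);

// Print the raw bytes as hex. UTF-8 output contains "c3a9" for the accented
// letter; ISO-8859-1 output would contain a lone "e9" instead.
echo bin2hex($output), "\n";
```

Running the same file on both the Windows and the Linux machine and comparing the hex output would show whether the two builds really serialize the result differently.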

 [2006-02-16 13:17 UTC] chregu@php.net
Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves. 

A proper reproducing script starts with <?php and ends with ?>,
is max. 10-20 lines long and does not require any external 
resources such as databases, etc.

If possible, make the script source available online and provide
a URL to it here. Try to avoid embedding huge scripts into the report.

This is not a reproducible script.
We need something we can copy and paste and run.

And please do not open 2 reports for the same problem.
 [2006-02-24 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 [2006-03-21 19:15 UTC] vodka_carambar_lovely_spam at yahoo dot com
I got the same problem with a similar setup. The only difference was the OS (Ubuntu). Same PHP version and Apache version. Double-check that the content of what is going into transformToXML really is valid UTF-8. The slightest encoding slip-up anywhere in the code will screw everything up.

In my case I had to open a file that was supposed to be UTF-8, copy and paste it into another document and "save as" with UTF-8 encoding.
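
One way to carry out that check before the data reaches transformToXML is sketched below. The helper name is_valid_utf8 is hypothetical, and the sketch assumes PHP 7 or later with the mbstring extension; it only tests whether a byte string is well-formed UTF-8, not whether it is the text you intended.

```php
<?php
// Hypothetical helper (not from the original report): returns true when
// $bytes is well-formed UTF-8. Requires the mbstring extension.
function is_valid_utf8(string $bytes): bool
{
    return mb_check_encoding($bytes, 'UTF-8');
}

$good = "<root>\xC3\xA9</root>"; // "é" as the two-byte UTF-8 sequence C3 A9
$bad  = "<root>\xE9</root>";     // "é" as a single ISO-8859-1 byte

var_dump(is_valid_utf8($good)); // bool(true)
var_dump(is_valid_utf8($bad));  // bool(false)
```

Calling such a check on the stylesheet file contents and on each XML string just before loadXML() would point to the input that slipped out of UTF-8.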
 [2008-02-20 16:43 UTC] jona at oismail dot com
Sample PHP Script that reproduces the bug for ISO-8859-15.
If the script is changed to use UTF-8 encoding and the Danish characters are run through utf8_encode(), ? ? ? ? will display fine but ? ? will not.

<?php
// XML Document with Danish Characters
$xml = '<?xml version="1.0" encoding="ISO-8859-15"?>';
$xml .= '<root>? ? ? ? ? ?</root>';
// XSL Stylesheet
$xsl = '<?xml version="1.0" encoding="ISO-8859-15"?>';
$xsl .= '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">';
$xsl .= '<xsl:output method="xml" version="1.0" encoding="ISO-8859-15" indent="yes" media-type="application/xhtml+xml" doctype-public="-//WAPFORUM//DTD XHTML Mobile 1.0//EN" doctype-system="http://www.openmobilealliance.org/DTD/xhtml-mobile10.dtd" omit-xml-declaration="no" />';
$xsl .= '<xsl:template match="/">';
$xsl .= '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="da">';
$xsl .= '<head>';
$xsl .= '<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=ISO-8859-15" />';
$xsl .= '</head>';
$xsl .= '<body>';
$xsl .= '<xsl:value-of select="/root" />';
$xsl .= '</body>';
$xsl .= '</html>';
$xsl .= '</xsl:template>';
$xsl .= '</xsl:stylesheet>';

// Load XSL Stylesheet into memory
$obj_XML = new DOMDocument("1.0", "ISO-8859-15");
$obj_XSLT = new XSLTProcessor();
$obj_XML->loadXML($xsl);
$obj_XSLT->importStylesheet($obj_XML);

// Load XML document into memory
$obj_XML = new DOMDocument("1.0", "ISO-8859-15");
$obj_XML->loadXML($xml);

// Perform transformation
echo $obj_XSLT->transformToXml($obj_XML);
?>
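
A note on the utf8_encode() step mentioned above: that function converts from ISO-8859-1, not ISO-8859-15, and it is deprecated as of PHP 8.2. The thread does not establish whether this is related to the characters that failed to display. The sketch below, which assumes the mbstring extension and uses illustrative byte values that are not taken from the script, only shows how a conversion can name the source encoding explicitly.

```php
<?php
// Sketch only: requires the mbstring extension. The byte values are
// illustrative and are not taken from the original script.

$ae   = "\xE6"; // "æ" in both ISO-8859-1 and ISO-8859-15
$euro = "\xA4"; // "€" in ISO-8859-15, but "¤" in ISO-8859-1

// Naming the real source encoding gives the intended characters.
echo bin2hex(mb_convert_encoding($ae, 'UTF-8', 'ISO-8859-15')), "\n";   // c3a6
echo bin2hex(mb_convert_encoding($euro, 'UTF-8', 'ISO-8859-15')), "\n"; // e282ac

// Treating the same byte as ISO-8859-1 (which is what utf8_encode() assumes)
// yields a different character for 0xA4.
echo bin2hex(mb_convert_encoding($euro, 'UTF-8', 'ISO-8859-1')), "\n";  // c2a4
```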
 