Bug #36407 Encoding in xsl:output has no effect
Submitted: 2006-02-16 02:10 UTC Modified: 2006-02-16 14:30 UTC
From: memoimyself at yahoo dot com dot br Assigned:
Status: Not a bug Package: XSLT related
PHP Version: 5.1.2 OS: Windows XP
Private report: No CVE-ID: None
 [2006-02-16 02:10 UTC] memoimyself at yahoo dot com dot br
Description:
------------
I have an XSL file to transform XML documents; the XSL and all XML files are encoded in UTF-8. My PHP script is in a file also encoded in UTF-8. The data is retrieved from a database (MySQL 5) whose character set is UTF-8 and whose tables all have UTF-8 as their character set as well. All the strings in all the tables are duly encoded in UTF-8. Prior to data retrieval, the query 'SET NAMES "utf8"' is run to ensure that all i/o operations use the same character set (UTF-8). (Hope I've been thorough enough!)

Now, my XSL file has the following top-level element (a child of the document element, as it should be):

<xsl:output encoding="utf-8" method="html"/>

The transformation is performed by the transformToXML method of the XSLTProcessor class.

When the code is run on a Windows server (Win XP, Apache 2.0.55, PHP 5.1.2), the result of the transformation is NEVER UTF-8 (always iso-8859-1), even if I change 'method' to 'xml', and even if I use a 'media-type' attribute.

When run on a Linux server (PHP 5.1.2 as well), everything goes well.

Reproduce code:
---------------
$xml = new DOMDocument('1.0', 'UTF-8');
$xml->loadXML($row->Report);

$xsl = new DOMDocument('1.0', 'UTF-8');
$xsl->load($xsl_file);

$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl);

$report = $proc->transformToXML($xml);

Expected result:
----------------
Output string should be encoded in UTF-8.

Actual result:
--------------
Output string is NOT encoded in UTF-8 and accented characters appear garbled. This problem only occurs under Windows.

History

 [2006-02-16 11:15 UTC] rrichards@php.net
Not enough information was provided for us to be able
to handle this bug. Please re-read the instructions at
http://bugs.php.net/how-to-report.php

If you can provide more information, feel free to add it
to this bug and change the status back to "Open".

Thank you for your interest in PHP.


Need xml doc and stylesheet you are using
 [2006-02-16 13:16 UTC] chregu@php.net
Please do not submit the same bug more than once. An existing
bug report already describes this very problem. Even if you feel
that your issue is somewhat different, the resolution is likely
to be the same. 

Thank you for your interest in PHP.

duped #36415
 [2006-02-16 14:00 UTC] memoimyself at yahoo dot com dot br
This is NOT a bogus report. I explained in as much detail as possible what happens and yet I was reprimanded with "not enough information was provided for us to be able to handle this bug". What other information is there to provide?

This is a script that runs on a test server that cannot be accessed over the internet, so I cannot provide you with a link to the script.

According to your rules, I cannot submit more than 20 lines of code either, so I cannot reproduce a typical XML document or the whole XSL file, or even the complete PHP script, so I tried sending the bits that really matter (in my humble opinion).

If you want to reproduce the problem, try this: 

(a) Transform ANY XML document encoded in UTF-8 with any XSLT stylesheet also encoded in UTF-8 with the tag <xsl:output encoding="utf-8"/>;

(b) Output the result of the transformation, preceded by a content-type header specifying UTF-8 as the encoding, to a browser.

I'll bet you anything you want that the output will NOT be UTF-8 as it should.
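
For reference, steps (a) and (b) could be sketched as a self-contained script along the following lines. This is only an illustration: the inline XML and stylesheet are hypothetical stand-ins, since the reporter's real documents were not posted, and the database step is left out entirely. The accented character is written as explicit UTF-8 bytes so that the script does not depend on the encoding of the file it is saved in.

```php
<?php
// Hypothetical UTF-8 input; "\xC3\xB3" is the UTF-8 byte sequence for "ó".
$xmlSource = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
           . "<report><title>Relat\xC3\xB3rio</title></report>";

// Hypothetical stylesheet carrying the xsl:output element from the report.
$xslSource = <<<'XSL'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output encoding="utf-8" method="html"/>
  <xsl:template match="/report">
    <p><xsl:value-of select="title"/></p>
  </xsl:template>
</xsl:stylesheet>
XSL;

// Step (a): transform a UTF-8 document with a UTF-8 stylesheet.
$xml = new DOMDocument();
$xml->loadXML($xmlSource);

$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);

$report = $proc->transformToXml($xml);

// Step (b): send the result with a content-type header that declares UTF-8.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
echo $report;

// Print the raw bytes as hex to STDERR-free output for inspection:
// a UTF-8 "ó" shows up as c3b3, an ISO-8859-1 "ó" as a lone f3.
echo "\n", bin2hex($report), "\n";
```

Comparing the hex dump on the Windows and Linux machines would show whether the bytes produced by the transformation actually differ, independently of how a browser chooses to render them.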

If you prefer to disregard this report, suit yourself. I'm just trying to help. I guess we'll all be worse off if the bug is not fixed. But what can I do?
 [2006-02-16 14:30 UTC] memoimyself at yahoo dot com dot br
Important detail I failed to include in my previous comment: the string for the XML document has to be retrieved from a MySQL database, preferably version 5, where all character sets (database and tables) are UTF-8.

I considered the possibility that the bug was with PDO, not XSLTProcessor, but when the same XML strings are retrieved from the same table with PDO (which is what is done prior to XSLT transformation in my report) and then output directly to the browser, the output IS proper UTF-8.

A theory is that, somehow, when the string retrieved from the database is imported into the XML file that will later be processed by XSLTProcessor, it gets re-encoded in such a way that the output of the transformation will not be proper UTF-8. That to me is still either an XSLTProcessor or a DOMDocument problem.
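
One way to test this theory would be to check the bytes at each stage: as retrieved from the database, after being loaded into the DOMDocument, and after serialization. The sketch below assumes PHP 7 or later; `isValidUtf8` and `describeStage` are hypothetical helper names, and `$fromDb` is a stand-in for `$row->Report` so that the script runs without a database.

```php
<?php
// Returns true when $bytes is a well-formed UTF-8 sequence.
// The empty pattern with the /u modifier fails to match on invalid UTF-8.
function isValidUtf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

// Formats a one-line summary for a named stage of the pipeline.
function describeStage(string $stage, string $bytes): string
{
    return sprintf('%-10s valid UTF-8: %s', $stage, isValidUtf8($bytes) ? 'yes' : 'no');
}

// Stand-in for $row->Report; "\xC3\xB3" is the UTF-8 encoding of "ó".
$fromDb = "<report><title>Relat\xC3\xB3rio</title></report>";

$doc = new DOMDocument();
$doc->loadXML($fromDb);

// Text read back from the DOM, and the document re-serialized as XML.
$title = $doc->getElementsByTagName('title')->item(0)->textContent;
$serialized = $doc->saveXML();

echo describeStage('database', $fromDb), "\n";
echo describeStage('DOM text', $title), "\n";
echo describeStage('saveXML', $serialized), "\n";
```

If the first line already reports invalid UTF-8 when run against the real database row, the problem lies before DOMDocument is involved (for example in the connection character set) rather than in XSLTProcessor.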
 