Bug #36738 Encoding with accents using htmlentities
Submitted: 2006-03-14 23:07 UTC
Modified: 2006-03-15 14:36 UTC
From: yanick dot rochon at gmail dot com
Assigned:
Status: Not a bug
Package: DOM XML related
PHP Version: 5.1.2
OS: Windows XP
Private report: No
CVE-ID: None
 [2006-03-14 23:07 UTC] yanick dot rochon at gmail dot com
Description:
------------
When using letters with accents (e.g. 'é'), there is no way to use the htmlentities() function or any other form of special-character encoding correctly. Either the browser (mostly IE) refuses to read the XML (FF is just fine) or, if using htmlentities(), the '&' in '&eacute;' becomes '&amp;eacute;', which is not desired.

Reproduce code:
---------------
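$doc = new DOMDocument('1.0','UTF-8');

// Plain text node: the accented character is passed as-is (source file must be UTF-8)
$node1 = $doc->createElement('node');
$node1->appendChild( new DOMText( 'clé' ) );
// Text node built from htmlentities() output, i.e. the literal string 'cl&eacute;'
$node2 = $doc->createElement('node');
$node2->appendChild( new DOMText( htmlentities('clé') ) );

// Attach both nodes to the root element
$root = $doc->createElement('xml');
$root->appendChild( $node1 );
$root->appendChild( $node2 );

$doc->appendChild( $root );

echo $doc->saveXML();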

Expected result:
----------------
<?xml version="1.0" encoding="utf-8" ?>
<xml>
<node>é</node>
<node>&eacute;</node>
</xml>


Actual result:
--------------
<?xml version="1.0" encoding="utf-8" ?>
<xml>
<node>é</node>
<node>&amp;eacute;</node>
</xml>


 [2006-03-15 10:48 UTC] edink@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php
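
For context, a minimal sketch of the behaviour in question, assuming the input string is valid UTF-8 and the document is created with the 'UTF-8' encoding as in the report. A DOM text node holds literal characters, so raw UTF-8 text is serialized unchanged, while the output of htmlentities() is treated as the plain characters "cl&eacute;" and its '&' has to be escaped. The variable names are illustrative only.

```php
<?php
// Sketch only: assumes the input string is valid UTF-8.
$doc = new DOMDocument('1.0', 'UTF-8');
$root = $doc->createElement('xml');
$doc->appendChild($root);

// "\xC3\xA9" is the UTF-8 byte sequence for 'é'.
$text = "cl\xC3\xA9";

// 1) Raw UTF-8 text: serialized unchanged, no entity needed.
$plain = $doc->createElement('node');
$plain->appendChild($doc->createTextNode($text));
$root->appendChild($plain);

// 2) Pre-encoded text: the node content is the literal characters
//    "cl&eacute;", so the serializer has to escape the '&'.
$encoded = $doc->createElement('node');
$encoded->appendChild(
    $doc->createTextNode(htmlentities($text, ENT_QUOTES, 'UTF-8'))
);
$root->appendChild($encoded);

$xml = $doc->saveXML();
echo $xml;
```

The named entity &eacute; is defined by HTML, not by XML itself, which is why passing the UTF-8 text directly is usually the simpler route.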


 [2006-03-15 14:36 UTC] yanick dot rochon at gmail dot com
Well, I'm sorry if you guys think that this is not a bug, but it is still a problem to me. I've read many forums talking about this issue, and the only solutions they came up with were to play with character encodings... What I fail to see is why htmlentities('<clé>'); would output '&lt;cl&eacute;&gt;' while new DOMText( '<clé>' ); would output '&lt;clé&gt;'. Since there's no option to prevent the text from being escaped, I had to fall back to raw XML editing (strings instead of DOM), making the whole DOM package obsolete to me. Could there be a mechanism to prevent saveXML from escaping the text nodes, so people can actually control what is being escaped and what is not?

If this is just not possible, please let me know, because I don't see why it couldn't be implemented.

Thanks.

P.s. I did read the rules for posting bugs and reports. And I was pretty sure I did it right.
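
As a hedged illustration of the control asked for above, here are two possible approaches using existing DOM and string functions; neither is suggested in this thread, and the variable names are hypothetical. The first decodes already-encoded input back to UTF-8 with html_entity_decode() before it goes into a text node. The second uses DOMDocument::createEntityReference() to emit a literal &eacute; reference. Note that the second output is only well-formed XML if the entity is declared in a DTD, which this sketch does not do.

```php
<?php
// Sketch only: two ways to influence what ends up in the serialized XML.
$doc = new DOMDocument('1.0', 'UTF-8');
$root = $doc->createElement('xml');
$doc->appendChild($root);

// Option A: turn HTML entities back into UTF-8 text before building the node.
$decoded = html_entity_decode('cl&eacute;', ENT_QUOTES, 'UTF-8');
$nodeA = $doc->createElement('node');
$nodeA->appendChild($doc->createTextNode($decoded));
$root->appendChild($nodeA);

// Option B: build the content from a text node plus an entity reference node.
// The reference is written out as "&eacute;" without escaping the '&'.
// Assumption: the consumer knows the entity (e.g. via a DTD); XML alone does not.
$nodeB = $doc->createElement('node');
$nodeB->appendChild($doc->createTextNode('cl'));
$nodeB->appendChild($doc->createEntityReference('eacute'));
$root->appendChild($nodeB);

$xml = $doc->saveXML();
echo $xml;
```

Option A keeps the document self-contained, so it is probably the safer default when the consumer is a generic XML parser.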
 