Bug #40017 htmlentities double escaping
Submitted: 2007-01-04 03:46 UTC
Modified: 2007-01-04 10:20 UTC
From: thedewi at hotmail dot com
Assigned:
Status: Not a bug
Package: Strings related
PHP Version: 5.2.0
OS: Debian with bpo backport
Private report: No
CVE-ID: None

 [2007-01-04 03:46 UTC] thedewi at hotmail dot com
Description:
------------
For some UTF-8 characters, htmlentities() returns...

&amp;#nnnn;

... when it should return ...

&#nnnn;

The problem does not seem to occur in the punctuation or currency subranges, but many other subranges are affected, including drawing, Greek, Cyrillic, and Hebrew.

It has been partially fixed in the past:

http://bugs.php.net/bug.php?id=11175

There has also been at least one other bug report, filed "Bogus" without a very detailed explanation:

http://bugs.php.net/bug.php?id=27691

Presumably the above was a misunderstanding. If you get the urge to mark this bug as "Bogus", please explain why, because this behaviour seems to produce unpredictable and ambiguous output (by which I mean that applying a workaround will give false positives).

Reproduce code:
---------------
<?php
	print 'Form:<form action="?" method="post"><input type="text" name="foo" size="80" value="' . htmlentities(stripslashes($_POST['foo']), ENT_QUOTES, 'UTF-8') . '"/></form>Thanks.';
?>


Expected result:
----------------
When I enter text into the input field, it should render a correctly represented and escaped value in the resulting markup:

???&#8486;&#8719;&#9617;&#9786;&#968;&#1069;&#1513; &#1603; 

Note: this bug report form itself seems to suffer from a similar problem, so keep in mind that the above are Unicode characters, not HTML escapes.

Actual result:
--------------

&raquo;?&yen;&amp;#8486;&amp;#8719;&amp;#9617;&amp;#9786;&amp;#968;&amp;#1069;&amp;#1513; &amp;#1603; 

As you can see, the first few translations were correct, but the more exotic ones failed.
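
One way to narrow this down is to inspect the value the script actually receives before htmlentities() is called. The sketch below is only an illustration: containsNumericReferences() is a hypothetical helper, not part of the report, and the two sample values are assumptions standing in for what $_POST['foo'] might hold for U+2126.

```php
<?php
// Hypothetical helper: reports whether a submitted value already contains
// numeric character references such as "&#8486;" or "&#x2126;".
function containsNumericReferences($value)
{
    return preg_match('/&#(?:[0-9]+|x[0-9a-fA-F]+);/', $value) === 1;
}

// Assumed sample inputs for the same character, U+2126 (OHM SIGN).
$asReference = '&#8486;';       // the character arrived as a reference
$asUtf8      = "\xE2\x84\xA6";  // the character arrived as raw UTF-8 bytes

var_dump(containsNumericReferences($asReference)); // bool(true)
var_dump(containsNumericReferences($asUtf8));      // bool(false)

// bin2hex() shows the raw bytes, which makes the difference visible in logs.
echo bin2hex($asUtf8), "\n"; // e284a6
```

A check like this cannot tell a reference generated on the way in from one the user typed literally, which is the false-positive problem mentioned in the description.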



 [2007-01-04 10:20 UTC] tony2001@php.net
>For some UTF-8 characters, htmlentities() returns...
>&amp;#nnnn; ... when it should return ...&#nnnn;

As long as htmlentities() receives "&#nnnn;" as its argument, the result would be "&amp;#nnnn;".
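
A minimal sketch of that behaviour, assuming the script receives the character already encoded as a numeric reference (the sample string is an assumption, not taken from the report). The fourth argument, double_encode, exists only in PHP 5.2.3 and later, so it is not available on the 5.2.0 version this report was filed against.

```php
<?php
// Assumed input: U+2126 arriving as a numeric character reference.
$submitted = '&#8486;';

// Default behaviour: every "&" is escaped, including the one that starts
// the existing reference.
$escaped = htmlentities($submitted, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n"; // &amp;#8486;

// With double_encode set to false (PHP >= 5.2.3), existing entities and
// references are left untouched.
$preserved = htmlentities($submitted, ENT_QUOTES, 'UTF-8', false);
echo $preserved, "\n"; // &#8486;
```

Turning double_encode off also leaves alone any reference a user typed on purpose, so whether it is appropriate depends on how the form input should be interpreted.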
