Bug #40017 htmlentities double escaping
Submitted: 2007-01-04 03:46 UTC Modified: 2007-01-04 10:20 UTC
From: thedewi at hotmail dot com Assigned:
Status: Not a bug Package: Strings related
PHP Version: 5.2.0 OS: Debian with bpo backport
Private report: No CVE-ID: None
 [2007-01-04 03:46 UTC] thedewi at hotmail dot com
Description:
------------
For some UTF-8 characters, htmlentities() returns...

&amp;#nnnn;

... when it should return ...

&#nnnn;

The problem does not seem to occur in the punctuation or currency subranges, but many other subranges are affected, including drawing characters, Greek, Cyrillic and Hebrew.

It has been partially fixed in the past:

http://bugs.php.net/bug.php?id=11175

There has also been at least one other bug report, closed as "Bogus" without a detailed explanation:

http://bugs.php.net/bug.php?id=27691

Presumably the above was a misunderstanding, but if you are inclined to mark this bug as "Bogus", please explain why, because this behaviour produces unpredictable and ambiguous results: any workaround that turns "&amp;#nnnn;" back into "&#nnnn;" will also do so for references the user actually typed, giving false positives.
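The "false positives" can be made concrete with the double_encode parameter, the fourth argument of htmlentities(). It was added in PHP 5.2.3, so it is not available in the 5.2.0 build this report was filed against. Setting it to false leaves existing entities alone, but the function cannot tell a reference the browser generated from one the user typed. This is a minimal sketch; the two input variables are hypothetical stand-ins for what might arrive in $_POST['foo'], and the "browser substituted a reference" case is an assumption about how such input arises.

```php
<?php
// Hypothetical inputs that both arrive as the same 6-byte string "&#968;":
$fromBrowser = '&#968;'; // assumed: browser substituted a reference for psi (U+03C8)
$typedByUser = '&#968;'; // user literally typed the characters & # 9 6 8 ;

// double_encode = false (4th argument, PHP >= 5.2.3) leaves existing entities untouched.
$a = htmlentities($fromBrowser, ENT_QUOTES, 'UTF-8', false);
$b = htmlentities($typedByUser, ENT_QUOTES, 'UTF-8', false);

var_dump($a);        // string(6) "&#968;"
var_dump($a === $b); // bool(true): both render as psi, so the user's literal text is lost
```

Because the bytes are identical, no escaping option can distinguish the two cases after the fact; the ambiguity has to be avoided before the data is submitted.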

Reproduce code:
---------------
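<?php
	// stripslashes() undoes magic_quotes_gpc escaping, which this code assumes is enabled
	$foo = isset($_POST['foo']) ? stripslashes($_POST['foo']) : '';
	print 'Form:<form action="?" method="post"><input type="text" name="foo" size="80" value="' . htmlentities($foo, ENT_QUOTES, 'UTF-8') . '"/></form>Thanks.';
?>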


Expected result:
----------------
When I enter text into the input field, it should render a correctly represented and escaped value in the resulting markup:

???&#8486;&#8719;&#9617;&#9786;&#968;&#1069;&#1513; &#1603; 

Note: this bug report form itself seems to suffer from a similar problem, so keep in mind that the above are Unicode characters, not HTML escapes.

Actual result:
--------------

&raquo;?&yen;&amp;#8486;&amp;#8719;&amp;#9617;&amp;#9786;&amp;#968;&amp;#1069;&amp;#1513; &amp;#1603; 

As you can see, the first few translations were correct, but the more exotic ones failed.
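One way to check whether htmlentities() itself mishandles the "more exotic" characters, or whether the input already contained numeric references, is to call it directly with no browser in between. This sketch is not part of the original report: it feeds raw UTF-8 bytes for three characters from the expected result (psi U+03C8, n-ary product U+220F, ohm sign U+2126) and, for comparison, the literal string "&#968;". The byte escapes keep the result independent of the script file's own encoding.

```php
<?php
// Raw UTF-8 byte sequences, written as escapes so the source file encoding doesn't matter.
$psi  = "\xCF\x88";     // U+03C8 GREEK SMALL LETTER PSI (&#968;)
$prod = "\xE2\x88\x8F"; // U+220F N-ARY PRODUCT (&#8719;)
$ohm  = "\xE2\x84\xA6"; // U+2126 OHM SIGN (&#8486;)

// Characters with an HTML 4.01 named entity are converted to that entity.
var_dump(htmlentities($psi, ENT_QUOTES, 'UTF-8'));  // string(5) "&psi;"
var_dump(htmlentities($prod, ENT_QUOTES, 'UTF-8')); // string(6) "&prod;"

// U+2126 has no HTML 4.01 named entity, so its bytes are returned unchanged.
var_dump(htmlentities($ohm, ENT_QUOTES, 'UTF-8') === $ohm); // bool(true)

// A string that already contains a numeric reference has its '&' escaped.
var_dump(htmlentities('&#968;', ENT_QUOTES, 'UTF-8')); // string(10) "&amp;#968;"
```

If the raw-byte calls behave as commented, the "&amp;#nnnn;" output in the actual result points at the form input rather than at htmlentities().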



 [2007-01-04 10:20 UTC] tony2001@php.net
>For some UTF-8 characters, htmlentities() returns...
>&amp;#nnnn; ... when it should return ...&#nnnn;

As long as htmlentities() receives "&#nnnn;" as its argument, the result would be "&amp;#nnnn;".
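The report does not show how the form page declared its character encoding. Assuming it was served without a UTF-8 charset, a browser submitting the form replaces characters it cannot represent in the page's encoding with &#nnnn; references, which is how such strings would reach htmlentities() in the first place. A minimal sketch of the usual remedy, a variant of the reproduce code above rather than anything taken from the report, declares UTF-8 both in the Content-Type header and on the form via accept-charset, so the browser sends raw UTF-8 bytes. The render_form() helper is hypothetical.

```php
<?php
// Tell the browser this page is UTF-8, so form input is submitted as UTF-8 bytes
// rather than as &#nnnn; references for characters outside the page's charset.
header('Content-Type: text/html; charset=UTF-8');

// Hypothetical helper: builds the same form as the reproduce code, plus accept-charset.
function render_form($value)
{
    $escaped = htmlentities($value, ENT_QUOTES, 'UTF-8');
    return 'Form:<form action="?" method="post" accept-charset="UTF-8">'
        . '<input type="text" name="foo" size="80" value="' . $escaped . '"/>'
        . '</form>Thanks.';
}

// If magic_quotes_gpc is enabled (PHP < 5.4), apply stripslashes() here as the original code does.
$foo = isset($_POST['foo']) ? $_POST['foo'] : '';
print render_form($foo);
```

With the page declared as UTF-8, typing psi into the field should come back as &psi; in the markup, while a user who literally types &#968; still gets &amp;#968;, which is the correct escaping for that text.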

 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Mar 28 21:01:27 2024 UTC