Bug #25707 html_entity_decode over-decodes &amp;lt;
Submitted: 2003-09-30 14:52 UTC Modified: 2003-10-02 02:59 UTC
From: Bjorn dot Victor at it dot uu dot se Assigned: moriyoshi
Status: Closed Package: Strings related
PHP Version: 4.3.3 OS: Solaris 8
Private report: No CVE-ID:
 [2003-09-30 14:52 UTC] Bjorn dot Victor at it dot uu dot se
html_entity_decode("&amp;quot;") returns '"', while the expected value would be "&quot;".  The same (wrong) behaviour occurs for "&amp;" followed by "lt;", "gt;", etc.

Another example is html_entity_decode(htmlentities("&lt;")) which returns "<" rather than "&lt;" as expected.

As a result, html_entity_decode cannot be used as the inverse of htmlentities.

The function (php_unescape_html_entities in ext/standard/html.c) replaces each entity in basic_entities with its corresponding character, but starts by replacing "&amp;" with "&", the resulting string being "&quot;", which is then replaced by '"'.

php_unescape_html_entities in ext/standard/html.c traverses the basic_entities from the wrong end; it must replace "&amp;" *last*, not *first*.
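
The effect of the traversal order can be reproduced in plain PHP. This is only a sketch: decode_amp_first() and decode_amp_last() are hypothetical names, not functions from ext/standard/html.c, and the table lists just four entities rather than the full basic_entities array. It relies on str_replace() applying array replacements one after another, each on the result of the previous one.

```php
<?php
// Hypothetical helpers for illustration only; not the code in ext/standard/html.c.
// The entity list is a four-entry subset chosen for the example.

function decode_amp_first($s)
{
    // "&amp;" is replaced first, so "&amp;lt;" turns into "&lt;",
    // which the later "&lt;" replacement then decodes a second time.
    return str_replace(
        array('&amp;', '&quot;', '&lt;', '&gt;'),
        array('&', '"', '<', '>'),
        $s
    );
}

function decode_amp_last($s)
{
    // "&amp;" is replaced last, so the "&" it produces is never rescanned.
    return str_replace(
        array('&quot;', '&lt;', '&gt;', '&amp;'),
        array('"', '<', '>', '&'),
        $s
    );
}

echo decode_amp_first('&amp;lt;'), "\n"; // prints <
echo decode_amp_last('&amp;lt;'), "\n";  // prints &lt;
```

With "&amp;" handled last, each entity in the input is decoded exactly once.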

Reproduce code:
print html_entity_decode("&amp;quot;&amp;lt;&amp;gt;");

Expected result:
&quot;&lt;&gt;

Actual result:
"<>

 [2003-09-30 15:00 UTC]
(the 2nd optional parameter..)

 [2003-10-01 03:31 UTC] Bjorn dot Victor at it dot uu dot se
Sorry, this is not an RTFM error, and has nothing to do with the optional parameters of the function. I have changed the summary to refer to "lt", to avoid confusion with ENT_QUOTES etc - believe me, I tried this before looking at the source and figuring out what the error really was.

The current code works like this: iterate over the 6 "basic_entities", replace the entity with its character in the string.  "&amp;" is the first item in basic_entities, which is good when you're doing htmlentities (the reverse operation).

Given a string "&amp;lt;", it will first become "&lt;", and then (because "&lt;" is handled after "&amp;"), "<".

Consider doing "&amp;" last, e.g. by traversing basic_entities backwards:
"&amp;lt;" then becomes "&lt;", which is the expected result.
 [2003-10-01 17:31 UTC]
html_entity_decode(htmlentities("&lt;")) returns "<", but IMHO it should return the original "&lt;". 

The unhtmlentities() function given on works like it should (in my eyes).
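
The round trip described in this comment can be checked with a small sketch. The helper decode_single_pass() is a hypothetical name, not the unhtmlentities() function referred to above, and it handles only four entities. It swaps in a different technique from the table traversal discussed in this report: strtr() with an array scans the input once and does not revisit text it has already substituted.

```php
<?php
// Hypothetical helper for illustration; a single-pass decoder over a
// four-entry subset of entities, not the fix that went into html.c.
function decode_single_pass($s)
{
    // strtr() with an array replaces in one pass over the input, so the "&"
    // produced from "&amp;" cannot combine with the following "lt;".
    return strtr($s, array(
        '&amp;'  => '&',
        '&quot;' => '"',
        '&lt;'   => '<',
        '&gt;'   => '>',
    ));
}

$original = '&lt;';
$encoded  = htmlspecialchars($original); // "&amp;lt;"

// Decoding the encoded string should give back the original text.
var_dump(decode_single_pass($encoded) === $original); // bool(true)
```

Because nothing is scanned twice, the order of the entries in the table does not matter here.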
 [2003-10-02 02:59 UTC]
This bug has been fixed in CVS.

In case this was a PHP problem, snapshots of the sources are packaged
every three hours; this change will be in the next snapshot. You can
grab the snapshot at
In case this was a documentation problem, the fix will show up soon at

In case this was a website problem, the change will show
up on the site and on the mirror sites in short time.
Thank you for the report, and for helping us make PHP better.

The fix will be in 4.3.4-rc2.

Last updated: Wed Nov 25 10:02:22 2015 UTC