Request #77769 html_entity_decode does not decode all HTML5 entities
Submitted: 2019-03-19 20:10 UTC Modified: 2019-03-19 21:05 UTC
Avg. Score:3.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: cananian at wikimedia dot org Assigned:
Status: Open Package: Strings related
PHP Version: 7.3.3 OS: n/a
Private report: No CVE-ID: None

 [2019-03-19 20:10 UTC] cananian at wikimedia dot org
The latest HTML5 specs contain a number of "semicolon-less" entities which are decoded in most circumstances.  See the list at (just the ones which don't end in a semicolon).

These are decoded *except* when found in an attribute and the character immediately after the entity is an equals sign or an ASCII alphanumeric; see

I propose two new option flags for html_entity_decode:

ENT_HTML5_NOATTRIBUTE -- decodes all the semicolon-less entities in addition to the other HTML5 entities
ENT_HTML5_ATTRIBUTE -- decodes semicolon-less entities except when they are followed by an equals sign or ASCII alphanumeric

This would allow authors to easily decode these legacy semicolon-less entities in the same way a browser would.
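To make the difference between the two proposed modes concrete, here is a minimal userland sketch of the rule described above. It is not how html_entity_decode works today, and it is not a proposed implementation: the function name decode_legacy_entities and its boolean $inAttribute parameter are hypothetical stand-ins for the two proposed flags, and the entity table is a deliberately tiny subset chosen so that no name is a prefix of a longer HTML5 entity name. A real implementation would need the full list of semicolon-less entities and longest-match lookup.

```php
<?php
// Hypothetical helper illustrating the proposed semantics; not part of PHP.
// $inAttribute = false approximates the proposed ENT_HTML5_NOATTRIBUTE,
// $inAttribute = true  approximates the proposed ENT_HTML5_ATTRIBUTE.
function decode_legacy_entities(string $s, bool $inAttribute): string
{
    // Assumption: a small illustrative subset of the legacy entities that
    // HTML5 allows without a trailing semicolon.
    $legacy = [
        'amp'  => '&',
        'quot' => '"',
        'nbsp' => "\u{00A0}",
        'reg'  => "\u{00AE}",
    ];
    $names = implode('|', array_keys($legacy));

    // Group 1: entity name, group 2: optional semicolon,
    // group 3: the following character if it is "=" or ASCII alphanumeric
    // (captured in a lookahead, so it is not consumed).
    $pattern = "/&($names)(;?)(?=([=A-Za-z0-9]?))/";

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($legacy, $inAttribute): string {
            $hasSemicolon = $m[2] === ';';
            $next = $m[3] ?? '';
            // Attribute mode: a semicolon-less entity followed by "=" or an
            // ASCII alphanumeric is left untouched.
            if ($inAttribute && !$hasSemicolon && $next !== '') {
                return $m[0];
            }
            return $legacy[$m[1]];
        },
        $s
    );
}

var_dump(decode_legacy_entities('&ampfoo', false));
var_dump(decode_legacy_entities('&ampfoo', true));
var_dump(decode_legacy_entities('?a=1&amp=2', true));
```

Under these assumptions the first call decodes the entity the way a browser does in text content, while the second and third leave the input unchanged, which is the behaviour that keeps query strings inside attributes such as href intact.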

Test script:
$ psysh 
Psy Shell v0.9.9 (PHP 7.3.2-3 — cli) by Justin Hileman
>>> html_entity_decode('&ampfoo', ENT_HTML5)
=> "&ampfoo"

In a browser web console:
> document.body.innerHTML


 [2019-03-19 21:05 UTC]
They are decoded for graceful handling. &ampfoo is still a parse error.
Last updated: Fri Apr 26 15:01:25 2019 UTC