Bug #52712 html_entity_decode does not support all standard entities
Submitted: 2010-08-27 06:01 UTC Modified: 2010-08-27 06:17 UTC
From: matias dot perrone at gmail dot com Assigned:
Status: Not a bug Package: Strings related
PHP Version: 5.2.14 OS: Windows 7
Private report: No CVE-ID: None
 [2010-08-27 06:01 UTC] matias dot perrone at gmail dot com
Description:
------------
html_entity_decode() does not decode all of the HTML entities listed
in http://www.w3.org/TR/html4/sgml/entities.html


Test script:
---------------
<?php
$sEntities = '&rsquo; &lsquo; &ldquo; &rdquo; &euro; &circ;';
echo "Start: " . $sEntities . "\n";
$sEntities = html_entity_decode($sEntities, ENT_QUOTES, "ISO-8859-1");
echo "Result: " . $sEntities . "\n";

Expected result:
----------------
Start: &rsquo; &lsquo; &ldquo; &rdquo; &euro; &circ;
Result: ’ ‘ “ ” € ˆ


Actual result:
--------------
Start: &rsquo; &lsquo; &ldquo; &rdquo; &euro; &circ;
Result: &rsquo; &lsquo; &ldquo; &rdquo; &euro; &circ;

History

 [2010-08-27 06:17 UTC] aharvey@php.net
-Status: Open +Status: Bogus
 [2010-08-27 06:17 UTC] aharvey@php.net
html_entity_decode() can only decode entities that exist in the given
character set. None of your example entities occur in ISO-8859-1,
therefore they have to be left as entities. To see this in action: if
you change the character set to ISO-8859-15, the &euro; entity does get
correctly decoded, since ISO-8859-15 added the € character to
ISO-8859-1.
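The charset dependence described above can be seen with a minimal sketch that decodes the same &euro; entity against both single-byte charsets. The variable names are illustrative only; the one assumption is that ISO-8859-15 stores the euro sign at byte 0xA4, which is where that standard places it.

```php
<?php
// Sketch: decode one named entity against two single-byte charsets.
$entity = '&euro;';

// ISO-8859-1 has no euro sign, so the entity is left as-is.
$latin1 = html_entity_decode($entity, ENT_QUOTES, 'ISO-8859-1');

// ISO-8859-15 does have a euro sign (byte 0xA4), so the entity is decoded.
$latin9 = html_entity_decode($entity, ENT_QUOTES, 'ISO-8859-15');

echo 'ISO-8859-1:  ', $latin1, "\n";           // still the literal entity text
echo 'ISO-8859-15: ', bin2hex($latin9), "\n";  // hex dump of the decoded byte
```

The hex dump is used because the raw 0xA4 byte would not display correctly on a UTF-8 terminal.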

You'd be much better off using a Unicode character set like UTF-8,
since that can represent all of the characters defined by HTML
entities.
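As a sketch of the suggested fix, here is the reporter's script with only the charset argument changed to UTF-8. This assumes the output goes to a UTF-8 terminal or to a page served as UTF-8. Every entity in the string is then decoded, because UTF-8 can encode all of the characters in the HTML 4.01 entity list.

```php
<?php
// Sketch: the reporter's entities, decoded into UTF-8 instead of ISO-8859-1.
$entities = '&rsquo; &lsquo; &ldquo; &rdquo; &euro; &circ;';

// UTF-8 can represent every HTML 4.01 entity, so none are left undecoded.
$decoded = html_entity_decode($entities, ENT_QUOTES, 'UTF-8');

echo 'Start:  ', $entities, "\n";
echo 'Result: ', $decoded, "\n";
```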

Not a bug; closing.
 
PHP Copyright © 2001-2020 The PHP Group
All rights reserved.
Last updated: Thu Feb 27 08:01:27 2020 UTC