Bug #29119 html_entity_decode misbehaves with UTF-8
Submitted: 2004-07-13 16:53 UTC
Modified: 2004-07-19 19:54 UTC
From: peter at desk dot nl
Assigned: moriyoshi
Status: Closed
Package: Strings related
PHP Version: 5.0.0RC3
OS: Linux 2.6.5
Private report: No
CVE-ID: None
 [2004-07-13 16:53 UTC] peter at desk dot nl
Description:
------------
when decoding named entities with html_entity_decode, the resulting characters are sometimes incorrect. e.g., "&euro;" becomes the UTF-8 representation of the "&frasl;" entity (fraction slash). not all entities are translated incorrectly, though. numeric entities work correctly.

tested with both php-5.0.0RC3 and the proposed -RC4 from snaps.php.net/~andi/

config :

./configure  --with-apxs=/usr/bin/apxs --with-mysql=/usr/local/mysql --with-gd --enable-safe-mode --with-dom=/usr --enable-ftp --with-zlib --with-xsl --with-xmlrpc --enable-cli --enable-bcmath --with-iconv --with-jpeg-dir=/usr --with-png-dir=/usr --with-xpm-dir=/usr/X11R6



Reproduce code:
---------------
// all tests are viewed with browser encoding forced on UTF-8.
<?php
  print html_entity_decode("&euro;",ENT_NOQUOTES,"UTF-8");
  print "<br />";
  print htmlentities(html_entity_decode("&euro;",ENT_NOQUOTES,"UTF-8"),ENT_NOQUOTES,"UTF-8");
?>

// other test also available on 
// http://fire.desk.nl/decode/index.php
// (just enter &euro; and submit)
// and http://fire.desk.nl/decode/index.phps (source)

Expected result:
----------------
a euro-sign
a euro-sign


Actual result:
--------------
a fraction slash
a euro-sign
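
For anyone checking the output at the byte level rather than in a browser: the euro sign is U+20AC and the fraction slash is U+2044, so the two results differ only in their last two UTF-8 bytes. The following is a minimal C sketch of that encoding. The helper name utf8_encode_bmp is hypothetical (it is not a PHP function), and it assumes code points below U+10000 with no surrogate checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode a code point below U+10000 as UTF-8.
 * Writes up to 3 bytes into out and returns how many were written.
 * Surrogates (U+D800..U+DFFF) are not rejected in this sketch. */
static size_t utf8_encode_bmp(unsigned cp, unsigned char out[3])
{
    assert(cp < 0x10000u);
    if (cp < 0x80u) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800u) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0u | (cp >> 6));
        out[1] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 2;
    }
    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0u | (cp >> 12));
    out[1] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
    out[2] = (unsigned char)(0x80u | (cp & 0x3Fu));
    return 3;
}
```

With this helper, U+20AC (euro sign) encodes to the bytes E2 82 AC and U+2044 (fraction slash) to E2 81 84, which gives a way to tell the expected output from the actual one in a hex dump.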


 [2004-07-19 14:26 UTC] peter at desk dot nl
after looking at the sources, this patch seems to fix it:

fire:/home/peter/php-5.0.0/ext/standard# diff -u html.c.org html.c
--- html.c.org  2004-07-19 14:24:08.000000000 +0200
+++ html.c      2004-07-19 14:13:50.000000000 +0200
@@ -158,7 +158,7 @@
        "thinsp", NULL, NULL, "zwnj", "zwj", "lrm", "rlm",
        NULL, NULL, NULL, "ndash", "mdash", NULL, NULL, NULL,
        "lsquo", "rsquo", "sbquo", NULL, "ldquo", "rdquo", "bdquo",
-       "dagger", "Dagger",     "bull", NULL, NULL, NULL, "hellip",
+       NULL,"dagger", "Dagger",        "bull", NULL, NULL, NULL, "hellip",
        NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, "permil", NULL,
        "prime", "Prime", NULL, NULL, NULL, NULL, NULL, "lsaquo", "rsaquo",
        NULL, NULL, NULL, "oline", NULL, NULL, NULL, NULL, NULL,
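
The effect of the one-line patch can be illustrated with a cut-down table. This is only a sketch, not the code in ext/standard/html.c: it assumes, as the patch context suggests, that each slot in the table stands for one consecutive code point, so a missing NULL placeholder shifts every later name down by one. The names BASE, COUNT, table_before, table_after, and lookup are made up for illustration. The sketch covers the shifted entries only and does not model how "&euro;" ended up decoding to the fraction slash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical positional table: slot i names code point BASE + i. */
#define BASE 0x2018u  /* U+2018, left single quotation mark */
#define COUNT(t) (sizeof(t) / sizeof((t)[0]))

/* Before the patch: the empty slot for U+201F is missing. */
static const char *const table_before[] = {
    "lsquo", "rsquo", "sbquo", NULL, "ldquo", "rdquo", "bdquo",
    "dagger", "Dagger", "bull"
};

/* After the patch: a NULL holds the place of U+201F. */
static const char *const table_after[] = {
    "lsquo", "rsquo", "sbquo", NULL, "ldquo", "rdquo", "bdquo",
    NULL, "dagger", "Dagger", "bull"
};

/* Return the code point a name maps to, or 0 if the name is not listed. */
static unsigned lookup(const char *const *table, size_t len, const char *name)
{
    assert(table != NULL && name != NULL);
    for (size_t i = 0; i < len; i++) {
        if (table[i] != NULL && strcmp(table[i], name) == 0)
            return BASE + (unsigned)i;
    }
    return 0;
}
```

In this sketch, lookup(table_before, COUNT(table_before), "dagger") yields 0x201F, one below the dagger's real code point U+2020, while the same call on table_after yields 0x2020. Names listed before the missing slot, such as "bdquo", map to the same code point in both tables.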
 [2004-07-19 15:17 UTC] peter at desk dot nl
nb: the online tests at http://fire.desk.nl/ mentioned above are no longer relevant, as that machine now runs the patched version.
 [2004-07-19 19:54 UTC] moriyoshi@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 