Bug #29119 html_entity_decode misbehaves with UTF-8
Submitted: 2004-07-13 16:53 UTC Modified: 2004-07-19 19:54 UTC
From: peter at desk dot nl Assigned: moriyoshi (profile)
Status: Closed Package: Strings related
PHP Version: 5.0.0RC3 OS: Linux 2.6.5
Private report: No CVE-ID: None
 [2004-07-13 16:53 UTC] peter at desk dot nl
Description:
------------
When decoding named entities with html_entity_decode, the resulting characters are sometimes incorrect. E.g., "&euro;" becomes the UTF-8 representation of the &frasl; entity (fraction slash). Not all entities are incorrectly translated, though. Numerical entities work correctly.

Tested with both php-5.0.0RC3 and the proposed -RC4 from snaps.php.net/~andi/

Configure line:

./configure  --with-apxs=/usr/bin/apxs --with-mysql=/usr/local/mysql --with-gd --enable-safe-mode --with-dom=/usr --enable-ftp --with-zlib --with-xsl --with-xmlrpc --enable-cli --enable-bcmath --with-iconv --with-jpeg-dir=/usr --with-png-dir=/usr --with-xpm-dir=/usr/X11R6



Reproduce code:
---------------
<?php
  // All tests are viewed with the browser encoding forced to UTF-8.
  // Decode the named entity directly.
  print html_entity_decode("&euro;", ENT_NOQUOTES, "UTF-8");
  print "<br />";
  // Round trip: decode, then re-encode with htmlentities().
  print htmlentities(html_entity_decode("&euro;", ENT_NOQUOTES, "UTF-8"), ENT_NOQUOTES, "UTF-8");
?>

// Another test is available at
// http://fire.desk.nl/decode/index.php
// (just enter &euro; and submit),
// with source at http://fire.desk.nl/decode/index.phps

Expected result:
----------------
a euro-sign
a euro-sign


Actual result:
--------------
a fraction slash
a euro-sign
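
For anyone checking the raw output rather than the rendered glyph, the two characters can be told apart by their UTF-8 bytes. The following is a minimal sketch in C, assuming the euro sign is U+20AC and the fraction slash is U+2044; the helper name utf8_encode_bmp is hypothetical and is not taken from PHP's sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from PHP's html.c): encode one code point from
 * the Basic Multilingual Plane as UTF-8 and return the number of bytes
 * written. Surrogate code points are not rejected in this sketch. */
size_t utf8_encode_bmp(unsigned int cp, unsigned char out[3])
{
    assert(out != NULL);
    assert(cp <= 0xFFFFu);

    if (cp < 0x80u) {                 /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800u) {                /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0u | (cp >> 6));
        out[1] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 2;
    }
    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0u | (cp >> 12));
    out[1] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
    out[2] = (unsigned char)(0x80u | (cp & 0x3Fu));
    return 3;
}
```

Under these assumptions, U+20AC encodes to the bytes E2 82 AC and U+2044 encodes to E2 81 84, so a hex dump of the script output shows which of the two characters was actually produced.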


 [2004-07-19 14:26 UTC] peter at desk dot nl
After looking at the sources, this patch seems to fix it:

fire:/home/peter/php-5.0.0/ext/standard# diff -u html.c.org html.c
--- html.c.org  2004-07-19 14:24:08.000000000 +0200
+++ html.c      2004-07-19 14:13:50.000000000 +0200
@@ -158,7 +158,7 @@
        "thinsp", NULL, NULL, "zwnj", "zwj", "lrm", "rlm",
        NULL, NULL, NULL, "ndash", "mdash", NULL, NULL, NULL,
        "lsquo", "rsquo", "sbquo", NULL, "ldquo", "rdquo", "bdquo",
-       "dagger", "Dagger",     "bull", NULL, NULL, NULL, "hellip",
+       NULL,"dagger", "Dagger",        "bull", NULL, NULL, NULL, "hellip",
        NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, "permil", NULL,
        "prime", "Prime", NULL, NULL, NULL, NULL, NULL, "lsaquo", "rsaquo",
        NULL, NULL, NULL, "oline", NULL, NULL, NULL, NULL, NULL,
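
The diff suggests that the table is indexed by code point, with one slot per consecutive code point and NULL where no named entity exists. The following is a minimal sketch in C of why one missing placeholder matters, assuming that layout; the slice, its base of U+201C, and the function name entity_to_codepoint are illustrative assumptions, not code from html.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative slice only, modeled on the diff above. Slot i is assumed
 * to stand for code point SLICE_BASE + i. */
#define SLICE_BASE 0x201Cu

/* Before the patch: no placeholder for U+201F, so later names shift down. */
const char *const slice_unpatched[] = {
    "ldquo", "rdquo", "bdquo", "dagger", "Dagger", "bull"
};

/* After the patch: NULL marks U+201F, which has no named entity. */
const char *const slice_patched[] = {
    "ldquo", "rdquo", "bdquo", NULL, "dagger", "Dagger", "bull"
};

/* Return the code point mapped to `name`, or 0 if the slice lacks it. */
unsigned int entity_to_codepoint(const char *const *table, size_t count,
                                 const char *name)
{
    assert(table != NULL && name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (table[i] != NULL && strcmp(table[i], name) == 0) {
            return SLICE_BASE + (unsigned int)i;
        }
    }
    return 0;
}
```

In this sketch, "dagger" resolves to U+2020 in the patched slice but to U+201F in the unpatched one, while "bdquo" and everything before it are unaffected. Every name after the missing slot is off by one code point. How that shift leads to &euro; in particular decoding as a fraction slash depends on parts of html.c that the diff does not show.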
 [2004-07-19 15:17 UTC] peter at desk dot nl
NB: the online tests at http://fire.desk.nl/ mentioned above are no longer relevant, as that machine now runs the patched version.
 [2004-07-19 19:54 UTC] moriyoshi@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Dec 03 17:01:29 2024 UTC