Bug #30020 htmlentities error
Submitted: 2004-09-08 04:43 UTC Modified: 2004-09-08 09:17 UTC
From: xing at mac dot com Assigned:
Status: Not a bug Package: Strings related
PHP Version: 4.3.9RC2 OS: Linux
Private report: No CVE-ID: None
 [2004-09-08 04:43 UTC] xing at mac dot com
Description:
------------
htmlentities function appears to eat "?<" but correctly handle "? <".

Reproduce code:
---------------
$str = "Le Caf?< Caf? <"; 

$str = html_entity_decode((htmlentities($str,ENT_QUOTES,"UTF-8"));

Expected result:
----------------
"Le Caf?< Caf? <"

Actual result:
--------------
"Le Caf< Caf? <"

 [2004-09-08 07:46 UTC] derick@php.net
Can you e-mail me the scripts (in a zip file or something)? I am afraid the bug system doesn't handle UTF-8 well.
 [2004-09-08 08:29 UTC] xing at mac dot com
There should be no hidden UTF-8 characters in the string: it is all ASCII compliant, including the French accented e.

I did however have an error in the code where an extra "(" was added. Here is the complete and correct code, which you can just copy-paste from any browser.

------
<?
$str = "Le Caf?< Caf? <"; 

$str = html_entity_decode(htmlentities($str,ENT_QUOTES,"UTF-8"));
echo $str;
?>
 [2004-09-08 08:35 UTC] derick@php.net
ASCII is 7-bit, and accented characters are not in the 7-bit range, so I may be seeing them in a different charset than you are. Please send me the script.
 [2004-09-08 08:51 UTC] xing at mac dot com
gzipped code sent to derick@php.net.
 [2004-09-08 09:01 UTC] derick@php.net
This is not a bug at all, you're doing something wrong here ;-)

The first part of your code is:
<?
$str = "Le Café< Café <";

$str = htmlentities($str,ENT_QUOTES,"UTF-8");

This will try to interpret $str as a UTF-8 string and add HTML entities for all UTF-8 characters that it finds in it.
Because é< is not a valid multi-byte UTF-8 sequence, the é is simply skipped.

I don't know why the second é isn't stripped out though.
Anyway, your code is wrong: don't try to use an ISO-8859-1 string as UTF-8.
 [2004-09-08 09:03 UTC] derick@php.net
Replace the é with ? in my explanation; my browser used UTF-8 to post the message ;-)
 [2004-09-08 09:17 UTC] xing at mac dot com
Thanks a bunch for the clarification. =)
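
For anyone landing on this report later, here is a minimal sketch of the two usual ways to avoid the mismatch described above: either pass htmlentities() the charset the bytes are actually in, or convert the string to UTF-8 before encoding it as UTF-8. The variable names are only illustrative, the "\xE9" escape is an assumption standing in for the single-byte ISO-8859-1 "é" from the original script, and the iconv extension is assumed to be available. How invalid UTF-8 input is treated may differ between PHP versions, so the sketch only covers the valid-input case.

```php
<?php
// "\xE9" is the single-byte ISO-8859-1 encoding of "é". It is written as an
// escape so the example does not depend on how this file itself is saved.
$latin1 = "Le Caf\xE9< Caf\xE9 <";

// Option 1: tell htmlentities() which charset the bytes are really in.
$encodedLatin1 = htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1');

// Option 2: convert the bytes to UTF-8 first, then encode as UTF-8.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);
$encodedUtf8 = htmlentities($utf8, ENT_QUOTES, 'UTF-8');

echo $encodedLatin1, "\n"; // Le Caf&eacute;&lt; Caf&eacute; &lt;
echo $encodedUtf8, "\n";   // Le Caf&eacute;&lt; Caf&eacute; &lt;

// Decoding with the matching charset restores the original bytes.
$roundTrip = html_entity_decode($encodedLatin1, ENT_QUOTES, 'ISO-8859-1');
var_dump($roundTrip === $latin1); // bool(true)
```

Both options produce the same entity-encoded text; the point is only that the charset argument has to match the encoding of the input string.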
 