Bug #30020 htmlentities error
Submitted: 2004-09-08 04:43 UTC Modified: 2004-09-08 09:17 UTC
From: xing at mac dot com Assigned:
Status: Not a bug Package: Strings related
PHP Version: 4.3.9RC2 OS: Linux
Private report: No CVE-ID: None
 [2004-09-08 04:43 UTC] xing at mac dot com
Description:
------------
htmlentities function appears to eat "?<" but correctly handle "? <".

Reproduce code:
---------------
$str = "Le Caf?< Caf? <"; 

$str = html_entity_decode((htmlentities($str,ENT_QUOTES,"UTF-8"));

Expected result:
----------------
"Le Caf?< Caf? <"

Actual result:
--------------
"Le Caf< Caf? <"

 [2004-09-08 07:46 UTC] derick@php.net
Can you e-mail me the scripts (in a zip file or something)? I am afraid the bug system doesn't handle UTF-8 well.
 [2004-09-08 08:29 UTC] xing at mac dot com
There should be no hidden UTF-8 characters in the string: it is all ASCII-compliant, including the French accented e.

I did, however, have an error in the code where an extra "(" was added. Here is the complete and correct code, which you can copy and paste from any browser.

------
<?
$str = "Le Caf?< Caf? <"; 

$str = html_entity_decode(htmlentities($str,ENT_QUOTES,"UTF-8"));
echo $str;
?>
 [2004-09-08 08:35 UTC] derick@php.net
ASCII is 7-bit, and accented characters are not in the 7-bit range, so I could be seeing them in a different charset. Please send me the script.
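To illustrate the 7-bit point, here is a minimal sketch that checks whether a string contains bytes outside the ASCII range. It assumes the accented e is stored as the single ISO-8859-1 byte 0xE9, which is written as an escape so the example does not depend on how this page is encoded. The variable names are made up for the illustration.

```php
<?php
// Assumption: "Café" stored as ISO-8859-1, so the é is the single byte 0xE9.
$word = "Caf\xE9";

// ASCII covers only byte values 0 to 127. Without the /u modifier,
// preg_match() works on raw bytes, so this looks for any byte >= 0x80.
$hasHighBytes = preg_match('/[\x80-\xFF]/', $word) === 1;

// Hex dump of the bytes and the numeric value of the last byte.
$hex = bin2hex($word);
$lastByte = ord($word[3]);

var_dump($hasHighBytes, $hex, $lastByte);
```

A string for which this check is true is not plain ASCII, so its meaning depends on which charset the reader assumes.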
 [2004-09-08 08:51 UTC] xing at mac dot com
gzipped code sent to derick@php.net.
 [2004-09-08 09:01 UTC] derick@php.net
This is not a bug at all, you're doing something wrong here ;-)

The first part of your code is:
<?
$str = "Le Café< Café <";

$str = htmlentities($str,ENT_QUOTES,"UTF-8");

This will try to interpret $str as a UTF-8 string and add HTML entities for all UTF-8 characters that it finds in it.
Because é< is not a valid combination as a multi-byte UTF-8 character, the é is simply skipped.

I don't know why the second é isn't stripped out, though.
Anyway, your code is wrong: don't try to use an ISO-8859-1 string as UTF-8.
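As a hedged sketch of the advice above, the snippet below shows two ways to keep the declared charset and the actual bytes in agreement. It assumes the input is ISO-8859-1, with the é written as the byte escape \xE9, and that the mbstring extension is available for mb_check_encoding() and mb_convert_encoding(). The variable names are illustrative only.

```php
<?php
// Assumption: the reported string, encoded as ISO-8859-1 (é = byte 0xE9).
$latin1 = "Le Caf\xE9< Caf\xE9 <";

// 0xE9 is the lead byte of a three-byte UTF-8 sequence and must be followed
// by two continuation bytes (0x80 to 0xBF). "<" (0x3C) and " " (0x20) are
// not continuation bytes, so the string is not valid UTF-8.
$isValidUtf8 = mb_check_encoding($latin1, 'UTF-8');

// Option 1: tell htmlentities() and html_entity_decode() the real encoding.
$encodedLatin1 = htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1');
$roundTrip = html_entity_decode($encodedLatin1, ENT_QUOTES, 'ISO-8859-1');

// Option 2: convert the bytes to UTF-8 first, then use "UTF-8" throughout.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$encodedUtf8 = htmlentities($utf8, ENT_QUOTES, 'UTF-8');

var_dump($isValidUtf8, $encodedLatin1, $roundTrip === $latin1, $encodedUtf8);
```

Either option keeps the accented character intact; which one fits depends on the encoding the rest of the application expects.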
 [2004-09-08 09:03 UTC] derick@php.net
Replace the é with ? in my explanation; my browser used UTF-8 to post the message ;-)
 [2004-09-08 09:17 UTC] xing at mac dot com
Thanks a bunch for the clarification. =)
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu May 02 22:01:30 2024 UTC