Bug #49814 htmlentities/htmlspecialchars accept partial multibyte sequences still
Submitted: 2009-10-08 14:15 UTC Modified: 2009-10-13 15:13 UTC
Votes:8
Avg. Score:4.4 ± 0.9
Reproduced:3 of 5 (60.0%)
Same Version:2 (66.7%)
Same OS:2 (66.7%)
From: hello at iwamot dot com Assigned:
Status: Closed Package: Strings related
PHP Version: 5.3.2-dev OS: *
Private report: No CVE-ID: None
 [2009-10-08 14:15 UTC] hello at iwamot dot com
Description:
------------
The PHP 5 ChangeLog says "Fixed htmlentities/htmlspecialchars not to accept partial multibyte sequences."
http://www.php.net/ChangeLog-5.php#5.2.5

But this has not actually been fixed. Please correct the ChangeLog entry, or review my patch.
http://iwamot.com/misc/html.c.patch.20091008

Reproduce code:
---------------
<?php
// Shift_JIS
echo htmlspecialchars("\x80",  ENT_QUOTES, 'Shift_JIS') . "!\n";
echo htmlspecialchars("\x81/", ENT_QUOTES, 'Shift_JIS') . "!\n";
// EUC-JP
echo htmlspecialchars("\x80",  ENT_QUOTES, 'EUC-JP')    . "!\n";
echo htmlspecialchars("\xA1/", ENT_QUOTES, 'EUC-JP')    . "!\n";

Expected result:
----------------
Either an empty string is returned (which is what my patch does):

!
!
!
!

or the invalid bytes are stripped:

!
/!
!
/!

Actual result:
--------------
_!
_/!
_!
_/!

("_" means an invalid byte)

 [2009-10-09 11:50 UTC] mcdmaster at auone dot jp
Sorry, but isn't this issue the same as bug #49785?
 [2009-10-09 16:46 UTC] hello at iwamot dot com
Yes it is. Many thanks for your time and help!
 [2009-10-11 07:16 UTC] hello at iwamot dot com
First of all, thank you for fixing bug #49785. However, it seems to me that htmlentities/htmlspecialchars must not accept [\x80 - \x8d] when EUC-JP is specified. If I'm right, I hope this will be fixed. Otherwise, please close this report. Thanks.
 [2009-10-13 15:13 UTC] hello at iwamot dot com
I received a message from Moriyoshi. According to him, htmlentities/htmlspecialchars must accept [\x80 - \x8d] under EUC-JP, because those bytes are not lead bytes and so never start a multibyte sequence. Application developers may therefore use them as some sort of control codes.

I agree with him, and I am closing this report. Thank you all for your kindness.
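
To illustrate the distinction drawn above, here is a minimal sketch of how EUC-JP bytes could be classified. It is not the code in PHP's html.c and not the submitted patch; the function names eucjp_byte_class() and eucjp_has_broken_sequence() are hypothetical, and the byte ranges follow the usual EUC-JP layout (0x8E and 0x8F as single-shift prefixes, 0xA1-0xFE as the first byte of a two-byte character).

```php
<?php
// Hypothetical helper: classify one byte value under EUC-JP.
function eucjp_byte_class(int $byte): string
{
    if ($byte <= 0x7F) {
        return 'ascii';
    }
    if ($byte === 0x8E || $byte === 0x8F) {
        return 'lead'; // SS2 / SS3 single-shift prefixes
    }
    if ($byte >= 0xA1 && $byte <= 0xFE) {
        return 'lead'; // first byte of a two-byte character
    }
    return 'other';    // e.g. 0x80-0x8D: never starts a multibyte sequence
}

// Hypothetical helper: true if a lead byte is not followed by the
// trail byte(s) it requires, i.e. the string holds a partial sequence.
function eucjp_has_broken_sequence(string $s): bool
{
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if (eucjp_byte_class($b) !== 'lead') {
            continue; // ASCII and non-lead high bytes pass through
        }
        $need = ($b === 0x8F) ? 2 : 1;          // SS3 takes two trail bytes
        $max  = ($b === 0x8E) ? 0xDF : 0xFE;    // SS2 trail: half-width kana
        for ($k = 1; $k <= $need; $k++) {
            if ($i + $k >= $len) {
                return true;                    // string ends mid-character
            }
            $t = ord($s[$i + $k]);
            if ($t < 0xA1 || $t > $max) {
                return true;                    // invalid trail byte
            }
        }
        $i += $need; // skip the trail bytes just validated
    }
    return false;
}

var_dump(eucjp_has_broken_sequence("\x80"));   // no lead byte involved
var_dump(eucjp_has_broken_sequence("\xA1/"));  // lead byte followed by "/"
```

Under this sketch "\x80" is not treated as a partial sequence, while "\xA1/" is, which matches the conclusion reached in this report.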
 