Bug #49814 htmlentities/htmlspecialchars accept partial multibyte sequences still
Submitted: 2009-10-08 14:15 UTC Modified: 2009-10-13 15:13 UTC
Votes:8
Avg. Score:4.4 ± 0.9
Reproduced:3 of 5 (60.0%)
Same Version:2 (66.7%)
Same OS:2 (66.7%)
From: hello at iwamot dot com Assigned:
Status: Closed Package: Strings related
PHP Version: 5.3.2-dev OS: *
Private report: No CVE-ID: None
 [2009-10-08 14:15 UTC] hello at iwamot dot com
Description:
------------
The PHP 5 ChangeLog says "Fixed htmlentities/htmlspecialchars not to accept partial multibyte sequences."
http://www.php.net/ChangeLog-5.php#5.2.5

But in reality it has not been fixed. Please correct the log, or review my patch.
http://iwamot.com/misc/html.c.patch.20091008

Reproduce code:
---------------
// Shift_JIS
echo htmlspecialchars("\x80",  ENT_QUOTES, 'Shift_JIS') . "!\n";
echo htmlspecialchars("\x81/", ENT_QUOTES, 'Shift_JIS') . "!\n";
// EUC-JP
echo htmlspecialchars("\x80",  ENT_QUOTES, 'EUC-JP')    . "!\n";
echo htmlspecialchars("\xA1/", ENT_QUOTES, 'EUC-JP')    . "!\n";

Expected result:
----------------
Either an empty string is returned (as my patch does):

!
!
!
!

or the invalid bytes are stripped:

!
/!
!
/!

Actual result:
--------------
_!
_/!
_!
_/!

("_" means an invalid byte)
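
Because the invalid bytes are unprintable, a hex dump makes the output easier to compare. The following is only a minimal sketch of the reproduce code above: `hex_dump` is a hypothetical helper (not a PHP built-in), and what it prints depends on the PHP version in use.

```php
<?php
// Hypothetical helper (not part of PHP): render each byte as two hex digits.
function hex_dump($bytes)
{
    return implode(' ', str_split(bin2hex($bytes), 2));
}

// Same inputs as the reproduce code above.
$cases = array(
    array("\x80",  'Shift_JIS'),
    array("\x81/", 'Shift_JIS'),
    array("\x80",  'EUC-JP'),
    array("\xA1/", 'EUC-JP'),
);

foreach ($cases as $case) {
    $input   = $case[0];
    $charset = $case[1];
    $output  = htmlspecialchars($input, ENT_QUOTES, $charset);
    // An empty "out" means the whole string was rejected.
    printf("%-9s in=%-5s out=%s\n", $charset, hex_dump($input), hex_dump($output));
}
```

If an input byte shows up unchanged in the "out" column, it was passed through rather than rejected.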

 [2009-10-09 11:50 UTC] mcdmaster at auone dot jp
Sorry but this issue is the same as bug #49785, isn't it?
 [2009-10-09 16:46 UTC] hello at iwamot dot com
Yes it is. Many thanks for your time and help!
 [2009-10-11 07:16 UTC] hello at iwamot dot com
First of all, thank you for fixing bug #49785. However, it seems to me that htmlentities/htmlspecialchars must not accept [\x80 - \x8d] when EUC-JP is specified. If I'm right, I hope this will be fixed. Otherwise, please close this report. Thanks.
 [2009-10-13 15:13 UTC] hello at iwamot dot com
I received a message from Moriyoshi. According to him, htmlentities/htmlspecialchars must accept [\x80 - \x8d], because those bytes are not lead bytes in EUC-JP. Application developers may therefore be using them as some sort of control codes.

I agree with him, and I am closing this report. Thank you all for your kindness.
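
Moriyoshi's point can be illustrated with a small byte classifier. This is only a sketch, assuming the usual EUC-JP layout in which 0x8E, 0x8F and 0xA1-0xFE are the only bytes that begin a multibyte character; `eucjp_is_lead_byte` is a hypothetical name, not a PHP function, and it is not taken from html.c.

```php
<?php
// Hypothetical helper: true if $byte (0-255) starts a multibyte EUC-JP character.
// Assumed layout: 0x8E (SS2), 0x8F (SS3), 0xA1-0xFE (first byte of a JIS X 0208 pair).
function eucjp_is_lead_byte($byte)
{
    return $byte === 0x8E || $byte === 0x8F || ($byte >= 0xA1 && $byte <= 0xFE);
}

// 0x80 and 0x8D fall outside every lead-byte range; 0x8E, 0x8F and 0xA1 do not.
foreach (array(0x80, 0x8D, 0x8E, 0x8F, 0xA1) as $byte) {
    printf("0x%02X %s\n", $byte, eucjp_is_lead_byte($byte) ? 'lead byte' : 'not a lead byte');
}
```

Under that assumption, a lone \x80 is a complete single byte rather than a truncated sequence, whereas \xA1 followed by "/" is a lead byte without a valid trail byte.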
 