Bug #49814 htmlentities/htmlspecialchars accept partial multibyte sequences still
Submitted: 2009-10-08 14:15 UTC Modified: 2009-10-13 15:13 UTC
Avg. Score:4.4 ± 0.9
Reproduced:3 of 5 (60.0%)
Same Version:2 (66.7%)
Same OS:2 (66.7%)
From: hello at iwamot dot com Assigned:
Status: Closed Package: Strings related
PHP Version: 5.3.2-dev OS: *
Private report: No CVE-ID: None

 [2009-10-08 14:15 UTC] hello at iwamot dot com
The PHP 5 ChangeLog says "Fixed htmlentities/htmlspecialchars not to accept partial multibyte sequences."

But it has not been fixed in reality. Please correct the log, or investigate my patch.

Reproduce code:
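// Shift_JIS: a lone 0x80, then lead byte 0x81 followed by "/" (not a valid trail byte)
echo htmlspecialchars("\x80",  ENT_QUOTES, 'Shift_JIS') . "!\n";
echo htmlspecialchars("\x81/", ENT_QUOTES, 'Shift_JIS') . "!\n";
// EUC-JP: a lone 0x80, then lead byte 0xA1 followed by "/" (not a valid trail byte)
echo htmlspecialchars("\x80",  ENT_QUOTES, 'EUC-JP')    . "!\n";
echo htmlspecialchars("\xA1/", ENT_QUOTES, 'EUC-JP')    . "!\n";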

Expected result:
returning an empty string (as my patch does):


or sanitizing:


Actual result:

("_" means an invalid byte)
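
One way to sidestep the behaviour reported above is to validate the string before escaping it. The following is a minimal sketch, assuming the mbstring extension is loaded; `escape_if_valid` is a hypothetical helper name, not part of PHP.

```php
<?php
// Hypothetical helper (not part of PHP): escape $s only when it is
// well-formed in $encoding; return null for malformed input.
// Assumes the mbstring extension is available.
function escape_if_valid(string $s, string $encoding): ?string
{
    // mb_check_encoding() reports whether the byte stream is valid
    // for the given encoding.
    if (!mb_check_encoding($s, $encoding)) {
        return null;
    }
    return htmlspecialchars($s, ENT_QUOTES, $encoding);
}
```

With this guard, a truncated sequence such as "\x81/" in Shift_JIS or "\xA1/" in EUC-JP is rejected before it reaches htmlspecialchars. Whether to return null, an empty string, or a sanitized string is a design choice left to the application.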


 [2009-10-09 11:50 UTC] mcdmaster at auone dot jp
Sorry but this issue is the same as bug #49785, isn't it?
 [2009-10-09 16:46 UTC] hello at iwamot dot com
Yes it is. Many thanks for your time and help!
 [2009-10-11 07:16 UTC] hello at iwamot dot com
First of all, thank you for fixing bug #49785. But it seems to me that htmlentities/htmlspecialchars must not accept [\x80 - \x8d] when EUC-JP is specified. If I'm right, I hope this will be fixed. Otherwise, please close this report. Thanks.
 [2009-10-13 15:13 UTC] hello at iwamot dot com
I received a message from Moriyoshi. According to him, htmlentities/htmlspecialchars must accept [\x80 - \x8d], because those bytes are not lead bytes in EUC-JP, so application developers may use them as some sort of control codes.

I agree with him, and close this report. Thank you all for your kindness.
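
The lead-byte argument can be illustrated with a short sketch. It assumes the usual EUC-JP layout, in which a multibyte sequence starts with 0x8E (SS2), 0x8F (SS3), or a byte in 0xA1-0xFE; `is_eucjp_lead_byte` is a hypothetical helper written for illustration and is not how PHP itself implements the check.

```php
<?php
// Hypothetical helper: true if $byte can start a multibyte sequence
// in EUC-JP (assumed layout, see the note above).
function is_eucjp_lead_byte(int $byte): bool
{
    return $byte === 0x8E                        // SS2: half-width katakana
        || $byte === 0x8F                        // SS3: JIS X 0212
        || ($byte >= 0xA1 && $byte <= 0xFE);     // JIS X 0208 first byte
}

// Collect every byte in 0x80-0x8D that the helper treats as a lead byte.
$leadBytesInRange = array_filter(range(0x80, 0x8D), 'is_eucjp_lead_byte');
```

Under these assumptions `$leadBytesInRange` is empty: no byte in 0x80-0x8D opens a multibyte sequence, so such a byte on its own is not a partial sequence.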