Bug #27505 htmlentities() fails to escape BIG5 characters correctly
Submitted: 2004-03-05 03:43 UTC Modified: 2004-03-06 13:27 UTC
From: ywliu at hotmail dot com Assigned:
Status: Closed Package: *Languages/Translation
PHP Version: 4.3.4 OS: linux
Private report: No CVE-ID: None
 [2004-03-05 03:43 UTC] ywliu at hotmail dot com
In ext/standard/html.c, htmlentities() fails to identify BIG5 Chinese characters correctly: the byte ranges it uses to detect two-byte sequences are too narrow.

I have checked CVS version 1.87, and the bug is still there.

Reproduce code:
In html.c, look for this piece of code:

case cs_big5:
case cs_gb2312:
case cs_big5hkscs:
	/* check if this is the first of a 2-byte sequence */
	if (this_char >= 0xa1 && this_char <= 0xf9) {
		/* peek at the next char */
		unsigned char next_char = str[pos];
		if ((next_char >= 0x40 && next_char <= 0x73) || (next_char >= 0xa1 && next_char <= 0xfe)) {

Expected result:
In fact, the first (lead) byte should be in the range 0xa1-0xfe, and the second (trail) byte should be in the range 0x40-0x7e or 0xa1-0xfe.

(Source: Ken Lunde, "Understanding Japanese Information Processing", O'Reilly, page 88.)

Actual result:
So the two checks should be:

	if (this_char >= 0xa1 && this_char <= 0xfe) {
		/* peek at the next char */
		unsigned char next_char = str[pos];
		if ((next_char >= 0x40 && next_char <= 0x7e) || (next_char >= 0xa1 && next_char <= 0xfe)) {


 [2004-03-06 13:27 UTC]
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
Thank you for the report, and for helping us make PHP better.

PHP Copyright © 2001-2023 The PHP Group
All rights reserved.
Last updated: Sat Dec 02 00:01:27 2023 UTC