Request #47798 html_entity_decode() not covering Z/z with caron for Windows-1252
Submitted: 2009-03-27 01:05 UTC Modified: 2010-10-11 02:33 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: f4ckm5 at web dot de Assigned: cataphract (profile)
Status: Closed Package: Strings related
PHP Version: 5.2.9 OS: Windows
Private report: No CVE-ID: None
 [2009-03-27 01:05 UTC] f4ckm5 at web dot de
Description:
------------
html_entity_decode() should decode "LATIN CAPITAL LETTER Z WITH CARON" (&Zcaron; &#381; &#x17D;) to byte 142 (0x8E) for Windows-1252.
html_entity_decode() should decode "LATIN SMALL LETTER Z WITH CARON" (&zcaron; &#382; &#x17E;) to byte 158 (0x9E) for Windows-1252.

htmlentities() should encode the respective characters to &#381; and &#382; (&zcaron; and &Zcaron; are not well supported by most browsers).

Reproduce code:
---------------
var_dump(html_entity_decode("&Zcaron;", ENT_QUOTES, "Windows-1252"));
var_dump(html_entity_decode("&#381;", ENT_QUOTES, "Windows-1252"));
var_dump(html_entity_decode("&#x17D;", ENT_QUOTES, "Windows-1252"));
var_dump(html_entity_decode("&zcaron;", ENT_QUOTES, "Windows-1252"));
var_dump(html_entity_decode("&#382;", ENT_QUOTES, "Windows-1252"));
var_dump(html_entity_decode("&#x17E;", ENT_QUOTES, "Windows-1252"));
var_dump(htmlentities(chr(142), ENT_QUOTES, "Windows-1252", true));
var_dump(htmlentities(chr(158), ENT_QUOTES, "Windows-1252", true));

Expected result:
----------------
string(1) "?"
string(1) "?"
string(1) "?"
string(1) "?"
string(1) "?"
string(1) "?"
string(6) "&#381;"
string(6) "&#382;"

Actual result:
--------------
string(8) "&Zcaron;"
string(6) "&#381;"
string(7) "&#x17D;"
string(8) "&zcaron;"
string(6) "&#382;"
string(7) "&#x17E;"
string(1) "?"
string(1) "?"


History

 [2010-10-11 02:33 UTC] cataphract@php.net
-Status: Open
+Status: Closed
-Package: Feature/Change Request
+Package: *General Issues
-Assigned To:
+Assigned To: cataphract
 [2010-10-11 02:33 UTC] cataphract@php.net
This has been partially fixed in trunk. The result is:

string(8) "&Zcaron;"
string(1) "�"
string(1) "�"
string(8) "&zcaron;"
string(1) "�"
string(1) "�"
string(1) "�"
string(1) "�" 

&zcaron; and &Zcaron; are not supported because I can't find them in the entity sets for HTML or XHTML:

See http://www.w3.org/TR/2002/REC-xhtml1-20020801/dtds.html#h-A2 (XHTML)

HTML 4.01:
http://www.w3.org/TR/html4/HTMLlat1.ent
http://www.w3.org/TR/html4/HTMLsymbol.ent
http://www.w3.org/TR/html4/HTMLspecial.ent
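
For anyone who needs the requested behaviour without relying on the Windows-1252 tables, one possible workaround is to go through UTF-8. The following is only a sketch under some assumptions: the mbstring extension is loaded, the input to the decoder is plain ASCII or UTF-8, and the two helper names are hypothetical (they are not part of PHP). It handles the numeric forms only; the named forms &Zcaron; and &zcaron; are not covered, for the reason given above.

```php
<?php
// Hypothetical helpers for illustration; neither is part of PHP.

// Decode entities in an ASCII or UTF-8 string, then transcode the
// result to Windows-1252.
function decode_entities_to_cp1252(string $html): string
{
    // Decoding with UTF-8 as the target sidesteps the single-byte tables.
    $utf8 = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

    // Requires the mbstring extension.
    return mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');
}

// Encode every non-ASCII character of a Windows-1252 string as a
// decimal numeric entity. ASCII characters such as & and < are left alone.
function encode_cp1252_numeric(string $bytes): string
{
    $utf8 = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');

    // Conversion map: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}

echo bin2hex(decode_entities_to_cp1252('&#381;')), "\n";   // 8e
echo bin2hex(decode_entities_to_cp1252('&#x17E;')), "\n";  // 9e
echo encode_cp1252_numeric(chr(142)), "\n";                // &#381;
echo encode_cp1252_numeric(chr(158)), "\n";                // &#382;
```

Note that encode_cp1252_numeric() does not escape HTML special characters; if that is needed, htmlspecialchars() would have to be applied to the UTF-8 string first.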
 [2010-10-11 02:33 UTC] cataphract@php.net
-Package: *General Issues
+Package: Strings related
 