Bug #29609 htmlentities() failure
Submitted: 2004-08-11 11:40 UTC
Modified: 2004-08-11 14:38 UTC
From: jon at hiveminds dot net
Assigned:
Status: Not a bug
Package: Strings related
PHP Version: 5.0.0
OS: Windows 2000 / SP4
Private report: No
CVE-ID: None
 [2004-08-11 11:40 UTC] jon at hiveminds dot net
Description:
------------
This string function fails on the "š" character (and other Latin-2 characters).

Test:

echo htmlentities("Ketšua");

Returns: Ketšua

Reproduce code:
---------------
Affected code:

$value = htmlentities($value);

preg_match_all("/&([^;]*);/", $value, $matches);
$parts = preg_split("/&|;/", $value, -1, PREG_SPLIT_NO_EMPTY);

foreach ($parts as $part) {
    // Entity names become entity references; everything else becomes text.
    $td->appendChild(
        in_array($part, $matches[1])
            ? $doc->createEntityReference($part)
            : $doc->createTextNode($part)
    );
}

Point of failure: the string "Ketšua"

Expected result:
----------------
Character should be converted to &Scaron; (upper) / &scaron; (lower) per the HTML spec (see the special-1.ent listing).

Failing this, should it not be possible to create a TextNode containing this character when specifying a Latin-2 or Unicode charset?

Actual result:
--------------
Function returns the literal "š" character with no conversion.

It is not possible to create a TextNode containing this character using DOMDocument::createTextNode().

I have written a workaround that replaces the Latin-2 characters with their entity equivalents using str_replace(), but PHP still issues several warnings when the output is sent to the browser with DOMDocument::saveHTML():

Warning: output conversion failed due to conv error in O:\webs\mysqli\oop-multi-select-with-dom.php on line 172

Warning: Bytes: 0x92 0x49 0x76 0x6F in O:\webs\mysqli\oop-multi-select-with-dom.php on line 172

Warning: xmlOutputBufferWrite: encoder error in O:\webs\mysqli\oop-multi-select-with-dom.php on line 172

The correct entity (&scaron;) does get sent, but I have to suppress the warnings with an "@", which I don't like doing.
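
One possible way around the entity round trip, sketched under the assumption that the input really is ISO-8859-2: the DOM extension works with UTF-8 strings, so the value can be converted with iconv() and handed straight to createTextNode(). The sample string and variable names below are illustrative and are not taken from the original script.

```php
<?php
// Assumption: the source string is ISO-8859-2, where byte 0xB9 is "š".
$raw = "Ket\xB9ua";

// The DOM extension expects UTF-8, so convert before touching the document.
$value = iconv('ISO-8859-2', 'UTF-8', $raw);

$doc = new DOMDocument('1.0', 'UTF-8');
$td  = $doc->createElement('td');
$doc->appendChild($td);

// The text goes in as a plain text node; no htmlentities() or
// createEntityReference() call is needed for it.
$td->appendChild($doc->createTextNode($value));

// The node now holds the character as UTF-8 (bytes C5 A1 for U+0161).
var_dump($td->textContent === "Ket\xC5\xA1ua");
```

If the data is actually in some other single-byte encoding, the first argument to iconv() would have to name that encoding instead.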

 [2004-08-11 13:26 UTC] derick@php.net
This charset is not supported, as you can see here:
http://php.net/htmlentities
 [2004-08-11 14:16 UTC] jon at hiveminds dot net
This is a Unicode character as well as being present in ISO-8859-15 (aka Latin-9 -- see http://www.columbia.edu/kermit/latin9.html), and the PHP Manual says that UTF-8 and ISO-8859-15 are both supposed to be supported by this function. 

However, even when UTF-8 or ISO-8859-15 is specified, this function still fails to make the expected conversion, and createEntityReference() still returns multiple warnings when passed the corresponding entity name even though it shouldn't.
 [2004-08-11 14:38 UTC] derick@php.net
Of course if you pass it as an iso-8859-2 character...

In -2 it's at a different position than in -15

See: http://www.eki.ee/letter/chardata.cgi?cp=8859-2&cp1=8859-15

(It's position A9 in -2 and A6 in -15)
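
The position difference can be illustrated with a small sketch. It assumes raw single-byte input and uses iconv() to decode each byte with the charset it was written in before calling htmlentities() with a charset that function supports. The variable names are hypothetical.

```php
<?php
// Byte values as given above: A9 in ISO-8859-2, A6 in ISO-8859-15.
$latin2 = "\xA9"; // Š in ISO-8859-2
$latin9 = "\xA6"; // Š in ISO-8859-15

// Decode each byte with the charset it was actually written in.
$fromLatin2 = iconv('ISO-8859-2', 'UTF-8', $latin2);
$fromLatin9 = iconv('ISO-8859-15', 'UTF-8', $latin9);

// Both results are the same character, U+0160 (UTF-8 bytes C5 A0).
var_dump($fromLatin2 === $fromLatin9);

// With the input in UTF-8 and the charset stated explicitly,
// htmlentities() can map the character to its named entity.
$entity = htmlentities($fromLatin2, ENT_QUOTES, 'UTF-8');
echo $entity, "\n";
```

Passing the ISO-8859-2 byte while naming ISO-8859-15 as the charset would make the function read A9 as a different character, which matches the behaviour described in the report.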
 