Bug #29609 htmlentities() failure
Submitted: 2004-08-11 11:40 UTC
Modified: 2004-08-11 14:38 UTC
From: jon at hiveminds dot net
Assigned:
Status: Not a bug
Package: Strings related
PHP Version: 5.0.0
OS: Windows 2000 / SP4
Private report: No
CVE-ID: None
 [2004-08-11 11:40 UTC] jon at hiveminds dot net
Description:
------------
This string function fails on the "š" character (and other Latin-2 characters).

Test:

echo htmlentities("Ketšua");

Returns: Ketšua

Reproduce code:
---------------
Affected code:

$value = htmlentities($value);

// Collect the entity names (the text between "&" and ";").
preg_match_all("/&([^;]*);/", $value, $matches);
// Split the string into entity names and plain-text runs.
$parts = preg_split("/&|;/", $value, -1, PREG_SPLIT_NO_EMPTY);

foreach ($parts as $part) {
    $td->appendChild(
        in_array($part, $matches[1])
            ? $doc->createEntityReference($part)
            : $doc->createTextNode($part)
    );
}

Point of failure: the string "Ketšua"

Expected result:
----------------
Character should be converted to &Scaron; (upper) / &scaron; (lower) per HTML spec (see special-1.ent listing).

Failing this, should it not be possible to create a TextNode containing this character when specifying a Latin-2 or Unicode charset?

Actual result:
--------------
Function returns the literal "š" character with no conversion.

It is not possible to create a TextNode containing this character using DOMDocument::createTextNode().

I have written a workaround which replaces the Lat-2 chars with their entity equivalents using str_replace(), but even so PHP still issues several warnings when sending output to the browser using DOMDocument::saveHTML():

Warning: output conversion failed due to conv error in O:\webs\mysqli\oop-multi-select-with-dom.php on line 172

Warning: Bytes: 0x92 0x49 0x76 0x6F in O:\webs\mysqli\oop-multi-select-with-dom.php on line 172

Warning: xmlOutputBufferWrite: encoder error in O:\webs\mysqli\oop-multi-select-with-dom.php on line 172

The correct entity (&scaron;) does get sent, but I have to suppress the error with an "@", which I don't like doing.
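
The DOM extension works with UTF-8 strings internally, and the 0x92 byte in the warning above is not valid UTF-8 on its own. That suggests, though the report does not confirm it, that non-UTF-8 bytes were reaching the document. A minimal sketch of converting before building the node, assuming the mbstring extension is loaded and that the source data is ISO-8859-2 (the sample string and variable names are illustrative):

```php
<?php
// Assumption: the input is ISO-8859-2; replace with the real source encoding.
$raw  = "Ket\xB9ua"; // 0xB9 is "š" in ISO-8859-2

// Convert to UTF-8, which is what the DOM methods expect.
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-2');

$doc = new DOMDocument('1.0', 'UTF-8');
$td  = $doc->createElement('td');
$td->appendChild($doc->createTextNode($utf8));
$doc->appendChild($td);

// Serialize; no "@" suppression should be needed for valid UTF-8 input.
echo $doc->saveHTML();
```

With the text already in UTF-8, the entity-splitting workaround may not be needed at all, since the text node can hold the character directly.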

 [2004-08-11 13:26 UTC] derick@php.net
This charset is not supported, as you can see here:
http://php.net/htmlentities
 [2004-08-11 14:16 UTC] jon at hiveminds dot net
This is a Unicode character as well as being present in ISO-8859-15 (aka Latin-9 -- see http://www.columbia.edu/kermit/latin9.html), and the PHP Manual says that UTF-8 and ISO-8859-15 are both supposed to be supported by this function. 

However, even when UTF-8 or ISO-8859-15 is specified, this function still fails to make the expected conversion, and createEntityReference() still returns multiple warnings when passed the corresponding entity name even though it shouldn't.
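
Naming a charset only tells the function how to interpret the bytes; it does not change the bytes themselves. A quick way to see what is actually being passed is to dump the raw bytes and check them against the declared encoding. The sketch below assumes the mbstring extension is available; the sample strings are illustrative and not taken from the report.

```php
<?php
// "Ketšua" as raw bytes in three encodings (š = U+0161).
$samples = [
    'UTF-8'       => "Ket\xC5\xA1ua",
    'ISO-8859-2'  => "Ket\xB9ua",
    'ISO-8859-15' => "Ket\xA8ua",
];

foreach ($samples as $label => $bytes) {
    // Print the hex dump and whether the bytes form valid UTF-8.
    printf(
        "%-11s hex=%s valid UTF-8: %s\n",
        $label,
        bin2hex($bytes),
        mb_check_encoding($bytes, 'UTF-8') ? 'yes' : 'no'
    );
}
```

Only the first sample is valid UTF-8, so passing either of the other two while declaring UTF-8 cannot produce the expected entity.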
 [2004-08-11 14:38 UTC] derick@php.net
Of course it fails if you pass it as an iso-8859-2 character...

In -2 it's at a different position than in -15

See: http://www.eki.ee/letter/chardata.cgi?cp=8859-2&cp1=8859-15

(It's position A9 in -2 and A6 in -15)
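
The position mismatch can be checked directly. The following is a sketch for a current PHP build with mbstring loaded, not something taken from the thread; the variable names are illustrative. It confirms that the two byte positions name the same letter, then converts an ISO-8859-2 string to UTF-8 so that the charset passed to htmlentities() matches the data.

```php
<?php
// "Š" sits at 0xA9 in ISO-8859-2 and at 0xA6 in ISO-8859-15.
$fromLatin2 = mb_convert_encoding("\xA9", 'UTF-8', 'ISO-8859-2');
$fromLatin9 = mb_convert_encoding("\xA6", 'UTF-8', 'ISO-8859-15');
var_dump($fromLatin2 === $fromLatin9); // same character once decoded correctly

// "Ketšua" as ISO-8859-2 bytes (lowercase š is 0xB9 in that charset).
$latin2 = "Ket\xB9ua";

// Convert to UTF-8 first, then declare UTF-8 so data and charset agree.
$utf8 = mb_convert_encoding($latin2, 'UTF-8', 'ISO-8859-2');
echo htmlentities($utf8, ENT_QUOTES, 'UTF-8'), "\n"; // Ket&scaron;ua
```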