Bug #18521 htmlentities with charset iso-8859-7 anomalies
Submitted: 2002-07-23 23:29 UTC Modified: 2002-09-26 11:11 UTC
From: spud at nothingness dot org Assigned:
Status: Closed Package: Strings related
PHP Version: 4.2.2 OS: Linux (RedHat 7.2)
Private report: No CVE-ID: None
 [2002-07-23 23:29 UTC] spud at nothingness dot org
PHP compiled with:
./configure --with-config-file-path=/usr/local --with-mysql --with-gd --with-gettext=/usr/bin --with-jpeg-dir --with-png-dir --with-tiff-dir --with-ttf --with--enable-bcmath --enable-inline-optimization --enable-sysvsem --enable-sysvshm --enable-trans-sid --enable-shared-pdflib --with-regex=system --with-zlib --with-curl=/usr/include --enable-sockets --with-apxs=/usr/sbin/apxs

I have been attempting to internationalize my PHP application (specifically to support Greek). Data is stored in MySQL, and displayed on the page using

nl2br(htmlentities($body))

For internationalization, I changed this to

nl2br(htmlentities($body,ENT_COMPAT,'ISO-8859-7'))

...but the Greek text appears on the page as accented Latin 1 characters, rather than Greek. (This line is the only "short script" necessary to reproduce the problem, as long as the $body variable contains Greek text).

If I use htmlspecialchars() instead, still indicating the charset, the text appears fine (in Greek). If I simply use 

nl2br($body)

it also appears in Greek (but won't, obviously, escape any special characters). So it appears to be directly related to htmlentities().

The web page itself contains a META header indicating
content="text/html; charset=ISO-8859-7"

...so the page should know what charset to expect, and shouldn't be part of the problem.

This bug was also present in PHP 4.2.1.

 [2002-07-24 19:26 UTC] sniper@php.net
Provide a short but complete example script for us to test with. (the sample text also!)

 [2002-07-24 21:29 UTC] spud at nothingness dot org
Attempting to insert a sample script here, but since the problem results from Greek text input, I don't know whether or not the Greek I paste in here will be submitted intact. If not, the script works as a functioning submission form if you can input Greek characters. See http://alt.baltimoreimc.org/test.php for an example.

<?
$body = "??????????? ???????? ???????? ?????????? ????????????? ????? ??? ???????, ???? ??? ?????????? ?? ??????????? ?? ?? ????-??????? ???????, ???????? ??????? ?????, ??? ?? ?????? ?????? ??? ??? ??????????? ???? ???? ???? ?? ??????.? ?? ??????, ???????????? ?????????? ?????????? ??? ??????????? ???????, ??????????? ?? ?????????? ???? ??????????? ????? ?????????? ????????? ??? ?????????????.";
if ($body) {
	echo ('<hr>RAW (should be clean)');
	echo ('<hr>');
	echo ($body);
	echo ('<hr>HTMLENTITIES (should appear as accented Latin 1)');
	echo ('<hr>');
	echo (htmlentities($body,ENT_COMPAT,'ISO-8859-7'));
	echo ('<hr>HTMLSPECIALCHARS (should be clean)');
	echo ('<hr>');
	echo (htmlspecialchars($body,ENT_COMPAT,'ISO-8859-7'));
	echo ('<hr>');
} else {
?>
<form action="<? echo $PHP_SELF; ?>" method="post" accept-charset="iso-8859-1,iso-8859-7">
	<textarea name="body" rows="4" cols="50"></textarea>
	<br>
	<input type="submit" name="submit" value="submit">
</form>
<?
}
?>
 [2002-09-26 11:11 UTC] wez@php.net
htmlentities does not support iso-8859-7.
As a workaround, use mb_convert_encoding to translate the
string to utf-8 and then apply htmlentities to the result.

htmlentities should emit a warning if you request an unsupported charset/encoding instead of silently falling back on latin-1.
Fixed in CVS.
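
A minimal sketch of the workaround described above, assuming the mbstring extension is loaded and a PHP version with scalar type declarations (7.0 or later). The helper name escape_iso8859_7 is hypothetical and only serves the illustration; it is not part of PHP.

```php
<?php
// Hypothetical helper (name is an assumption, not a PHP built-in).
// Takes a string encoded as ISO-8859-7 and returns HTML-escaped UTF-8.
function escape_iso8859_7(string $body): string
{
    // Step 1: translate the ISO-8859-7 bytes to UTF-8 (needs mbstring).
    $utf8 = mb_convert_encoding($body, 'UTF-8', 'ISO-8859-7');

    // Step 2: htmlentities() supports UTF-8, so apply it to the converted
    // string, then turn newlines into <br> as in the original report.
    return nl2br(htmlentities($utf8, ENT_COMPAT, 'UTF-8'));
}

// "\xE1" and "\xE2" are the ISO-8859-7 bytes for Greek alpha and beta.
echo escape_iso8859_7("\xE1 < \xE2");
```

Note that the result is UTF-8, not ISO-8859-7. Characters that have no named entity in the HTML 4 table (accented Greek letters, for example) are passed through as raw UTF-8 bytes, so the page would most likely need to declare charset=UTF-8 rather than ISO-8859-7 for them to display correctly.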
 