Bug #18521 htmlentities with charset iso-8859-7 anomalies
Submitted: 2002-07-23 23:29 UTC Modified: 2002-09-26 11:11 UTC
From: spud at nothingness dot org Assigned:
Status: Closed Package: Strings related
PHP Version: 4.2.2 OS: Linux (RedHat 7.2)
Private report: No CVE-ID: None

 [2002-07-23 23:29 UTC] spud at nothingness dot org
PHP compiled with:
./configure --with-config-file-path=/usr/local --with-mysql --with-gd --with-gettext=/usr/bin --with-jpeg-dir --with-png-dir --with-tiff-dir --with-ttf --with--enable-bcmath --enable-inline-optimization --enable-sysvsem --enable-sysvshm --enable-trans-sid --enable-shared-pdflib --with-regex=system --with-zlib --with-curl=/usr/include --enable-sockets --with-apxs=/usr/sbin/apxs

I have been attempting to internationalize my PHP application (specifically to support Greek). Data is stored in MySQL, and displayed on the page using

nl2br(htmlentities($body))

For internationalization, I changed this to

nl2br(htmlentities($body,ENT_COMPAT,'ISO-8859-7'))

...but the Greek text appears on the page as accented Latin 1 characters, rather than Greek. (This line is the only "short script" necessary to reproduce the problem, as long as the $body variable contains Greek text).

If I use htmlspecialchars() instead, still indicating the charset, the text appears fine (in Greek). If I simply use 

nl2br($body)

it also appears in Greek (but won't, obviously, escape any special characters). So it appears to be directly related to htmlentities().

The web page itself contains a META header indicating
content="text/html; charset=ISO-8859-7"

...so the page should know what charset to expect, and shouldn't be part of the problem.

This bug was also present in PHP 4.2.1.

 [2002-07-24 19:26 UTC] sniper@php.net
Provide a short but complete example script for us to test with. (the sample text also!)

 [2002-07-24 21:29 UTC] spud at nothingness dot org
Attempting to insert a sample script here, but since the problem results from Greek text input, I don't know whether or not the Greek I paste in here will be submitted intact. If not, the script works as a functioning submission form if you can input Greek characters. See http://alt.baltimoreimc.org/test.php for an example.

<?
$body = "??????????? ???????? ???????? ?????????? ????????????? ????? ??? ???????, ???? ??? ?????????? ?? ??????????? ?? ?? ????-??????? ???????, ???????? ??????? ?????, ??? ?? ?????? ?????? ??? ??? ??????????? ???? ???? ???? ?? ??????.? ?? ??????, ???????????? ?????????? ?????????? ??? ??????????? ???????, ??????????? ?? ?????????? ???? ??????????? ????? ?????????? ????????? ??? ?????????????.";
if ($body) {
	echo ('<hr>RAW (should be clean)');
	echo ('<hr>');
	echo ($body);
	echo ('<hr>HTMLENTITIES (should appear as accented Latin 1)');
	echo ('<hr>');
	echo (htmlentities($body,ENT_COMPAT,'ISO-8859-7'));
	echo ('<hr>HTMLSPECIALCHARS (should be clean)');
	echo ('<hr>');
	echo (htmlspecialchars($body,ENT_COMPAT,'ISO-8859-7'));
	echo ('<hr>');
} else {
?>
<form action="<? echo $PHP_SELF; ?>" method="post" accept-charset="iso-8859-1,iso-8859-7">
	<textarea name="body" rows="4" cols="50"></textarea>
	<br>
	<input type="submit" name="submit" value="submit">
</form>
<?
}
?>
 [2002-09-26 11:11 UTC] wez@php.net
htmlentities does not support iso-8859-7.
As a workaround, use mb_convert_encoding to translate the
string to utf-8 and then apply htmlentities to the result.

htmlentities should emit a warning if you request an unsupported charset/encoding instead of silently falling back on latin-1.
Fixed in CVS.
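
The workaround described above can be sketched roughly as follows. This is only an illustration, not code from the fix itself: the helper name escape_greek_for_html() and the sample bytes are made up for the example, and it assumes the mbstring extension is loaded and that $body really holds ISO-8859-7 bytes.

```php
<?php
// Hypothetical helper (name is an assumption, not part of PHP):
// convert ISO-8859-7 input to UTF-8 first, then escape it.
function escape_greek_for_html($body)
{
    // mb_convert_encoding(string, to_encoding, from_encoding)
    $utf8 = mb_convert_encoding($body, 'UTF-8', 'ISO-8859-7');

    // htmlentities() does understand UTF-8, so it no longer
    // treats the Greek bytes as Latin-1 characters.
    return nl2br(htmlentities($utf8, ENT_COMPAT, 'UTF-8'));
}

// Sample input: the bytes for "Αθήνα" in ISO-8859-7, plus markup to escape.
echo escape_greek_for_html("\xC1\xE8\xDE\xED\xE1 <b>");
```

Note that any Greek character without a named entity is passed through as UTF-8 bytes, so the page would presumably need to declare charset=UTF-8 rather than ISO-8859-7 for this to display correctly.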