Bug #18521 htmlentities with charset iso-8859-7 anomalies
Submitted: 2002-07-23 23:29 UTC
Modified: 2002-09-26 11:11 UTC
From: spud at nothingness dot org
Assigned:
Status: Closed
Package: Strings related
PHP Version: 4.2.2
OS: Linux (RedHat 7.2)
Private report: No
CVE-ID:
 [2002-07-23 23:29 UTC] spud at nothingness dot org
PHP compiled with:
./configure --with-config-file-path=/usr/local --with-mysql --with-gd --with-gettext=/usr/bin --with-jpeg-dir --with-png-dir --with-tiff-dir --with-ttf --with--enable-bcmath --enable-inline-optimization --enable-sysvsem --enable-sysvshm --enable-trans-sid --enable-shared-pdflib --with-regex=system --with-zlib --with-curl=/usr/include --enable-sockets --with-apxs=/usr/sbin/apxs

I have been attempting to internationalize my PHP application (specifically to support Greek). Data is stored in MySQL, and displayed on the page using

nl2br(htmlentities($body))

For internationalization, I changed this to

nl2br(htmlentities($body,ENT_COMPAT,'ISO-8859-7'))

...but the Greek text appears on the page as accented Latin-1 characters rather than Greek. (This line is the only "short script" necessary to reproduce the problem, as long as the $body variable contains Greek text.)

If I use htmlspecialchars() instead, still indicating the charset, the text appears fine (in Greek). If I simply use 

nl2br($body)

it also appears in Greek (but obviously won't escape any special characters). So it appears to be directly related to htmlentities().

The web page itself contains a META header indicating
content="text/html; charset=ISO-8859-7"

...so the page should know what charset to expect, and shouldn't be part of the problem.

This bug was also present in PHP 4.2.1.

 [2002-07-24 19:26 UTC] sniper@php.net
Provide a short but complete example script for us to test with. (the sample text also!)

 [2002-07-24 21:29 UTC] spud at nothingness dot org

 [2002-09-26 11:11 UTC] wez@php.net
htmlentities does not support iso-8859-7.
As a workaround, use mb_convert_encoding to translate the
string to utf-8 and then apply htmlentities to the result.

htmlentities should emit a warning if you request an unsupported charset/encoding instead of silently falling back on latin-1.
Fixed in CVS.
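
The workaround described above can be sketched roughly as follows. This is only an illustration, not code from the report: the helper name escape_greek is hypothetical, and it assumes the mbstring extension is loaded and that $body really holds ISO-8859-7 bytes.

```php
<?php
// Hypothetical helper (the name is not from the report): escape
// ISO-8859-7 text for HTML via an intermediate UTF-8 conversion.
function escape_greek($body)
{
    // Step 1: convert ISO-8859-7 bytes to UTF-8 (needs the mbstring extension).
    $utf8 = mb_convert_encoding($body, 'UTF-8', 'ISO-8859-7');

    // Step 2: escape with a charset that htmlentities does support,
    // then turn newlines into <br /> as in the original report.
    return nl2br(htmlentities($utf8, ENT_COMPAT, 'UTF-8'));
}

// "\xE1" and "\xE2" are alpha and beta in ISO-8859-7.
echo escape_greek("\xE1 < \xE2"), "\n"; // prints: &alpha; &lt; &beta;
```

Note that the result is UTF-8. Characters that have no named entity (accented Greek letters, as far as I know) are passed through as UTF-8 bytes, so the page would probably need to declare charset=UTF-8 rather than ISO-8859-7 for them to display correctly.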
 