From PHP Manual: ... htmlentities... At present, the ISO-8859-1 character set is used.
I feel there is a strong need for htmlentities to support charsets other than ISO 8859-1. Here in Poland, for instance, we use ISO 8859-2. Many Western/USA freeware programs use htmlentities (as they of course should), and proper display of our diacritic characters is then impossible. We have to modify the sources, either removing htmlentities or defining our own functions. It's a waste of time and resources. May I ask the PHP developers to consider this issue?
BTW: you can use recode("ISO8859-2..h4",$text) for this
purpose. See GNU recode extension and recode docs for more
For most of the ISO 8859-2 charset, there are no standard HTML entities, and the current behavior of htmlentities() is to leave such characters unconverted.
The real problem is that the current behavior of htmlentities() when passed an unknown charset is to use the ISO 8859-1 mapping. It shouldn't try to use a charset mapping in that case.
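To make the point above concrete, here is a minimal sketch (an illustration, not part of the original report) of what happens when an ISO 8859-2 byte is run through the ISO 8859-1 mapping. The byte 0xB1 is "ą" in ISO 8859-2 but the plus-minus sign in ISO 8859-1, so it comes out as the wrong entity. The variable names are made up for the example.

```php
<?php
// 0xB1 is "ą" in ISO 8859-2, but "±" in ISO 8859-1.
$polishByte = "\xB1";

// Forcing the ISO 8859-1 mapping turns the byte into the wrong entity.
$wrong = htmlentities($polishByte, ENT_QUOTES, 'ISO-8859-1');

echo $wrong, "\n"; // &plusmn;
```

A browser then shows "±" where the user typed "ą", which matches the kind of mis-display described in this report.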
Sorry, I am not a PHP programmer... I just found that Polish characters in form entries (in software that is not mine) are converted to HTML entities (and displayed incorrectly); I traced the problem to the htmlentities() function.
I am using Apache/2.0.48 (Unix) mod_ssl/2.0.48 OpenSSL/0.9.6b PHP/4.3.5-dev, it has been a few years since 2000; maybe the programmer should use another thing for checking the input... ;)
Thank you for your bug report. This issue has already been fixed
in the latest released version of PHP, which you can download at
htmlentities supports this via the third optional charset argument since 4.1.0
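A rough sketch of how the third argument might be used for Polish text today. This assumes the iconv extension is available and that UTF-8 output is acceptable. As far as I know, ISO 8859-2 itself is not among the charsets htmlentities() recognizes, so the sketch converts to UTF-8 first. The variable names are hypothetical.

```php
<?php
// "ą" as a single ISO 8859-2 byte, e.g. as received from an old form.
$latin2 = "\xB1";

// Convert to UTF-8 first (assumes the iconv extension is enabled).
$utf8 = iconv('ISO-8859-2', 'UTF-8', $latin2);

// Pass the charset explicitly as the third argument.
// Markup characters are escaped; "ą" has no HTML 4 named entity,
// so it is left as-is.
$escaped = htmlentities('<b>' . $utf8 . '</b>', ENT_QUOTES, 'UTF-8');

echo $escaped, "\n";
```

The page then has to be served as UTF-8 for the unconverted characters to display correctly.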