Bug #60714 htmlspecialchars() ignore default_charset value
Submitted: 2012-01-11 15:38 UTC
Modified: 2012-01-12 21:05 UTC
From: mahatma at bspu dot unibel dot by
Assigned:
Status: Not a bug
Package: *Languages/Translation
PHP Version: 5.4.0RC5
OS: linux
Private report: No
CVE-ID: None
 [2012-01-11 15:38 UTC] mahatma at bspu dot unibel dot by
Description:
------------
Since the default charset changed, I have a compatibility problem: htmlspecialchars() now strips cp1251 (or any non-Unicode) national characters, and there is no way to set a different default charset. But default_charset looks like it was provided for exactly this kind of purpose (and, ideally, for charset= detection in HTML too).

I suggest that htmlspecialchars() (and, IMHO, htmlentities()) simply take its default charset from default_charset.


 [2012-01-11 22:50 UTC] cataphract@php.net
Yes, there's a BC break in 5.4 in this respect. You can make htmlspecialchars use default_charset, but you'd still have to change all the calls to use the empty string as charset.

I'm leaving this open for now, as this change was made without any discussion that I remember.
 [2012-01-12 08:47 UTC] rasmus@php.net
There was discussion. Simply changing the not-specified case to use the 
default_charset setting would break a lot of code. However, since it is a useful 
feature, it is supported through the empty string case as documented at 
http://php.net/htmlspecialchars
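
For illustration, the empty-string case might look like the sketch below. It assumes the site is served as windows-1251 and that nothing more specific (such as a script or internal encoding) is configured; the sample string is made up for the example.

```php
<?php
// Assumption: the application's pages are served as windows-1251.
ini_set('default_charset', 'windows-1251');

// "<Привет>" encoded as CP1251 bytes (sample data for illustration).
$cp1251 = "<\xCF\xF0\xE8\xE2\xE5\xF2>";

// Passing '' as the charset asks htmlspecialchars() to work the charset
// out itself, which resolves to default_charset when nothing more
// specific is configured.
$escaped = htmlspecialchars($cp1251, ENT_QUOTES, '');

// Hex dump, so the result is readable regardless of terminal encoding.
echo bin2hex($escaped), "\n";
```

Every existing call still has to be edited to pass '' explicitly, which is the cost mentioned earlier in this thread.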
 [2012-01-12 11:10 UTC] cataphract@php.net
The problem is the change of the default from ISO-8859-1 to UTF-8. The ISO-8859-1 default was good for CP 1251 because the characters that htmlspecialchars encodes are represented the same way in both (using htmlentities, however, would be a problem). The break in 5.4 is that CP 1251 byte streams are, in general, not valid UTF-8 byte streams, even though the encoded characters are represented the same way there too. This causes htmlspecialchars to error out and return an empty string.
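
A minimal sketch of the difference described above, using a made-up CP1251 sample string and passing the two charsets explicitly so that the result does not depend on the PHP version's default:

```php
<?php
// "<Привет>" as CP1251 bytes (sample data for illustration).
$cp1251 = "<\xCF\xF0\xE8\xE2\xE5\xF2>";

// Validated as UTF-8: 0xCF starts a two-byte sequence, but 0xF0 is not
// a continuation byte, so the whole input is rejected.
$asUtf8 = htmlspecialchars($cp1251, ENT_QUOTES, 'UTF-8');

// Treated as ISO-8859-1: every byte is a valid character, so only the
// markup characters are replaced and the other bytes pass through.
$asLatin1 = htmlspecialchars($cp1251, ENT_QUOTES, 'ISO-8859-1');

var_dump($asUtf8 === '');
var_dump($asLatin1 === "&lt;\xCF\xF0\xE8\xE2\xE5\xF2&gt;");
```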
 [2012-01-12 17:43 UTC] rasmus@php.net
Yes, I understand why it breaks sites using cp1251. However, it was sort of dumb 
luck that they weren't broken before due to iso-8859-1 and cp1251 getting along. 
htmlspecialchars has direct support for cp1251, so if you are passing that in you 
should tell it, or set the default_charset and tell htmlspecialchars to use the 
default by passing in '' as the charset. On the whole I firmly believe we have 
made more sites secure than we have broken with this change.
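
One way to "tell it" without repeating the charset at every call site is a small wrapper, sketched below. The names h() and APP_CHARSET are hypothetical and not part of PHP; the sketch assumes all application strings are CP1251.

```php
<?php
// Hypothetical names for illustration: APP_CHARSET and h() are not
// part of PHP. Assumption: all application strings are CP1251.
const APP_CHARSET = 'cp1251';

// Escape text for HTML output, always naming the charset explicitly.
function h($text)
{
    return htmlspecialchars($text, ENT_QUOTES, APP_CHARSET);
}

// CP1251 bytes around an ampersand; only the ampersand is rewritten.
echo bin2hex(h("\xCF & \xF0")), "\n";
```

With a wrapper like this, a later move to UTF-8 would mean changing one constant rather than every call.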
 [2012-01-12 21:05 UTC] cataphract@php.net
-Status: Open
+Status: Bogus
 [2012-01-12 21:05 UTC] cataphract@php.net
Closing.
 