Bug #60714 htmlspecialchars() ignore default_charset value
Submitted: 2012-01-11 15:38 UTC Modified: 2012-01-12 21:05 UTC
From: mahatma at bspu dot unibel dot by Assigned:
Status: Not a bug Package: *Languages/Translation
PHP Version: 5.4.0RC5 OS: linux
Private report: No CVE-ID: None
 [2012-01-11 15:38 UTC] mahatma at bspu dot unibel dot by
Description:
------------
Since the default charset changed, I have a compatibility problem: htmlspecialchars() has started to strip cp1251 (or any other non-Unicode) national characters, and there is no way to change the default charset it uses. However, default_charset seems to exist for exactly this purpose (and, ideally, so would detection of the charset= declaration in the HTML).

I suggest simply taking the default charset for htmlspecialchars() (and, IMHO, htmlentities()) from default_charset.


History
 [2012-01-11 22:50 UTC] cataphract@php.net
Yes, there's a BC (backward-compatibility) break in 5.4 in this respect. You can make htmlspecialchars() use default_charset, but you would still have to change every call to pass the empty string as the charset.

I'm leaving this open for now, as I don't remember any discussion of this change.
 [2012-01-12 08:47 UTC] rasmus@php.net
There was discussion. Simply changing the case where no charset is specified to use the default_charset setting would break a lot of code. However, since it is a useful feature, it is supported through the empty-string case, as documented at http://php.net/htmlspecialchars
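A rough model of the charset choice described above, as a sketch under stated assumptions: `resolve_charset` is a hypothetical name, `None` stands for an omitted charset argument, and `default_charset` is assumed to be set to a non-empty value. The real PHP 5.4 empty-string path also checks the script encoding (Zend multibyte) before default_charset and the locale after it; this sketch skips both.

```python
from typing import Optional

# Hypothetical model of how a PHP 5.4 htmlspecialchars() call picks its charset.
# Simplified: script-encoding and locale detection for "" are not modelled,
# and default_charset is assumed to be non-empty.
PHP54_DEFAULT = "UTF-8"  # used when the charset argument is omitted in 5.4


def resolve_charset(charset: Optional[str], default_charset: str) -> str:
    """Return the charset a 5.4-style htmlspecialchars() call would use.

    charset=None  -> argument omitted: fixed UTF-8 default, ini setting ignored
    charset=""    -> empty string: fall back to the default_charset ini value
    anything else -> used exactly as given
    """
    if charset is None:
        return PHP54_DEFAULT
    if charset == "":
        return default_charset
    return charset


if __name__ == "__main__":
    ini = "cp1251"  # e.g. default_charset = "cp1251" in php.ini
    print(resolve_charset(None, ini))      # UTF-8
    print(resolve_charset("", ini))        # cp1251
    print(resolve_charset("cp1251", ini))  # cp1251
```

This is why the ini setting alone does not help: only calls that pass '' ever reach default_charset.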
 [2012-01-12 11:10 UTC] cataphract@php.net
The problem is changing the default from ISO-8859-1 to UTF-8. The ISO-8859-1 default was good for CP1251 because the characters htmlspecialchars() escapes (&, ", <, >) are represented the same way in both (using htmlentities() would be a problem, however, since it would turn the CP1251 bytes into entities for the corresponding ISO-8859-1 characters). The break in 5.4 is that the default was changed from ISO-8859-1 to UTF-8, and CP1251 byte streams are, in general, not valid UTF-8 byte streams, even though the escaped characters are represented the same way. This causes htmlspecialchars() to reject the input and return an empty string.
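The byte-level point can be checked directly. This sketch uses Python's codecs as a stand-in for the validity check PHP performs; `is_valid` is a hypothetical helper, and the Russian word is an arbitrary sample.

```python
# Why ISO-8859-1 tolerated CP1251 input and UTF-8 does not.
# Python's codecs stand in here for PHP's internal validity check.

cyrillic = "Привет"               # "Hello" in Russian (arbitrary sample)
raw = cyrillic.encode("cp1251")   # bytes a cp1251 site would pass in


def is_valid(data: bytes, encoding: str) -> bool:
    """True if data is a well-formed byte stream in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# ISO-8859-1 assigns a character to every byte value, so any input is valid.
print(is_valid(raw, "iso-8859-1"))  # True
# 0xCF starts a two-byte UTF-8 sequence, but the next byte (0xF0) is not a
# continuation byte (0x80-0xBF), so the stream is rejected.
print(is_valid(raw, "utf-8"))       # False

# The characters htmlspecialchars() escapes are ASCII, i.e. the same single
# bytes in ISO-8859-1, CP1251 and UTF-8.
for ch in '&<>"':
    assert ch.encode("iso-8859-1") == ch.encode("cp1251") == ch.encode("utf-8")
```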
 [2012-01-12 17:43 UTC] rasmus@php.net
Yes, I understand why it breaks sites using cp1251. However, it was sort of dumb luck that they weren't broken before, due to iso-8859-1 and cp1251 getting along. htmlspecialchars() has direct support for cp1251, so if you are passing that in, you should tell it; or set default_charset and tell htmlspecialchars() to use the default by passing in '' as the charset. On the whole, I firmly believe we have made more sites secure than we have broken with this change.
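To see the failure and the explicit-charset fix side by side, here is a hypothetical Python emulation of htmlspecialchars() with its 5.4 default flags (ENT_COMPAT), assuming only ASCII-compatible charsets. `htmlspecialchars_sketch` is an invented name; it is not PHP's implementation.

```python
# Hypothetical emulation of htmlspecialchars() with PHP 5.4 default flags
# (ENT_COMPAT): escape & " < > and return an empty result when the input is
# not valid in the requested charset. Only ASCII-compatible charsets
# (UTF-8, ISO-8859-1, cp1251, ...) are handled by this sketch.

ESCAPES = {b"&": b"&amp;", b'"': b"&quot;", b"<": b"&lt;", b">": b"&gt;"}


def htmlspecialchars_sketch(data: bytes, charset: str = "UTF-8") -> bytes:
    try:
        data.decode(charset)  # validity check only; result is discarded
    except UnicodeDecodeError:
        return b""  # 5.4 returns "" for invalid input without ENT_IGNORE
    # In ASCII-compatible charsets these four bytes never occur inside a
    # multibyte sequence, so byte-wise replacement is safe.
    out = bytearray()
    for b in data:
        ch = bytes([b])
        out += ESCAPES.get(ch, ch)
    return bytes(out)


page = "<b>Привет</b>".encode("cp1251")
print(htmlspecialchars_sketch(page))            # b''
print(htmlspecialchars_sketch(page, "cp1251"))  # b'&lt;b&gt;\xcf\xf0\xe8\xe2\xe5\xf2&lt;/b&gt;'
```

Passing "ISO-8859-1" gives the same output as "cp1251" here, which is the "dumb luck" that kept such sites working before 5.4.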
 [2012-01-12 21:05 UTC] cataphract@php.net
-Status: Open +Status: Bogus
 [2012-01-12 21:05 UTC] cataphract@php.net
Closing.
 
PHP Copyright © 2001-2025 The PHP Group
All rights reserved.
Last updated: Sun Jul 06 22:01:34 2025 UTC