Doc Bug #61451 Document that the default character set change may break existing code
Submitted: 2012-03-20 11:18 UTC Modified: 2017-01-28 11:58 UTC
Votes:9
Avg. Score:4.8 ± 0.6
Reproduced:8 of 8 (100.0%)
Same Version:5 (62.5%)
Same OS:5 (62.5%)
From: perske at uni-muenster dot de Assigned:
Status: Verified Package: Documentation problem
PHP Version: 5.4.0 OS: n/a
Private report: No CVE-ID: None

 

 [2012-03-20 11:18 UTC] perske at uni-muenster dot de
Description:
------------
---
From manual page: http://www.php.net/migration54.incompatible
---
The change of the default value of the $encoding parameter of html_entity_decode(), htmlentities() and htmlspecialchars() may break existing code. (It does for me!) Thus it should be mentioned on the "Backward Incompatible Changes" page. (There is a remark on the "Other changes" page, but that remark omits html_entity_decode().)

I propose adding this text as a list item:

"The default value of the $encoding parameter of html_entity_decode(), htmlspecialchars(), and htmlentities() has been changed from 'ISO-8859-1' to 'UTF-8'."




Test script:
---------------
n/a

Expected result:
----------------
n/a

Actual result:
--------------
n/a

 [2014-07-23 21:15 UTC] ky dot patterson at adlinkr dot com
This remains a serious problem.
This change needs to be highlighted as an important BC break for anyone upgrading from a release older than 5.4 to 5.4, 5.5, or 5.6.

I will explain why it is so severe, because most other descriptions I've seen of this problem somewhat miss the point:

Previously the default encoding was ISO-8859-1.
Now the default encoding is UTF-8.

There is no such thing as invalid ISO-8859 -- any byte stream looks like latin1.

There is very definitely such a thing as invalid UTF-8 -- for example, almost any ISO-8859 text that contains bytes outside pure 7-bit ASCII.
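
As a rough illustration of the asymmetry described above, the sketch below tests a few strings for well-formed UTF-8 using PCRE's /u modifier. The helper name is_valid_utf8() and the sample strings are made up for this example; they are not part of PHP or of the original report.

```php
<?php
// Hypothetical helper (not part of PHP): true when $s is well-formed UTF-8.
// preg_match() with the /u modifier returns false on malformed UTF-8 input.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

$ascii  = "cafe";          // 7-bit ASCII: the same bytes in both encodings
$latin1 = "caf\xE9";       // "café" in ISO-8859-1 (single byte 0xE9)
$utf8   = "caf\xC3\xA9";   // "café" in UTF-8 (bytes 0xC3 0xA9)

var_dump(is_valid_utf8($ascii));   // bool(true)
var_dump(is_valid_utf8($latin1));  // bool(false)
var_dump(is_valid_utf8($utf8));    // bool(true)
```

There is no equivalent check for ISO-8859-1, since every byte value maps to some character there.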

htmlspecialchars() and company will silently reject the input string if it is invalid for the given encoding (unless ENT_IGNORE or ENT_SUBSTITUTE is passed in the flags).
They just return an empty string; they don't even issue a warning.

I didn't know this. No one who habitually uses ISO-8859 would know this, because it doesn't happen with ISO-8859.

So if you have a lot of code that looks like this:
<?php
echo htmlspecialchars($input);
?>
and $input is 8-bit ISO-8859, then you are going to have a problem when you upgrade to 5.4 or 5.5: you're going to have no output.
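
A minimal sketch of that failure, assuming a made-up sample string. The flags are passed explicitly because the default flags differ between PHP versions; only the encoding argument changes between the two calls.

```php
<?php
$input = "caf\xE9 & co";   // ISO-8859-1 bytes; a lone 0xE9 is not valid UTF-8

// Same input and flags, different encoding argument.
$asUtf8   = htmlspecialchars($input, ENT_COMPAT, 'UTF-8');
$asLatin1 = htmlspecialchars($input, ENT_COMPAT, 'ISO-8859-1');

var_dump($asUtf8);    // string(0) "" : the input was rejected, no warning issued
var_dump($asLatin1);  // "&" is escaped to "&amp;", byte 0xE9 is left untouched
```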

Worse, if $input is usually 7-bit ASCII and is only occasionally 8-bit ISO-8859-1 -- a common situation in North America -- then you are in for a subtle but very severe problem.
It won't be obvious during your pre-upgrade tests, and you will end up having to do a mass find & replace on your codebase in a hurry after the upgrade.

This isn't a hypothetical case. I just upgraded from 5.3 to 5.5 and got bitten by this in production code.
I did my due diligence prior to upgrading: I thoroughly reviewed all of the upgrading notes for 5.4 and 5.5, and even for 5.6 which was still in alpha.
I even reviewed the changelog -- which by the way makes no mention whatsoever of this change.
Even a single hint about this change would have saved me.


My suggestion, for the many others who will have to upgrade from 5.2 or 5.3 at some point, is:

1) Add a note to the Backwards Incompatible Changes page for 5.4
http://ca2.php.net/manual/en/migration54.incompatible.php
Something like this:

"The default encoding used by htmlentities() and htmlspecialchars() has changed from ISO-8859-1 to UTF-8. This means that calling htmlspecialchars($string) will return an empty string if $string is not valid UTF-8. If your input is not UTF-8, change your code to explicitly specify an encoding, e.g. htmlspecialchars($string, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1')"

2) Add a similar note to the Changed Functions page for 5.4
http://ca2.php.net/manual/en/migration54.parameters.php

3) Add a note to the Changelog page
http://php.net/ChangeLog-5.php#5.4.0 (or whichever version the change actually happened at)

4) Add a note to the Changed Functions page for 5.6
http://ca2.php.net/manual/en/migration56.changed-functions.php
In 5.6 htmlentities and htmlspecialchars have been changed again, to honour the default_charset INI directive.
There should be a note about this, to reflect the fact that 5.6 can be configured to behave like 5.3, whereas 5.4 and 5.5 cannot.
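
The two workarounds mentioned in the suggestions above could look roughly like this. It is only a sketch: the constant LEGACY_CHARSET and the wrapper h() are hypothetical names chosen for illustration, and the second part assumes PHP 5.6 or later, where the default encoding of htmlspecialchars() follows the default_charset INI setting.

```php
<?php
// Hypothetical names: LEGACY_CHARSET and h() are illustrative only.
const LEGACY_CHARSET = 'ISO-8859-1';

// Escape with an explicit encoding so the result does not depend on
// which default encoding the running PHP version happens to use.
function h($string)
{
    return htmlspecialchars($string, ENT_COMPAT | ENT_HTML401, LEGACY_CHARSET);
}

$escaped = h("caf\xE9 <b>");

// PHP 5.6 and later only: with no encoding argument, htmlspecialchars()
// falls back to default_charset, so existing calls can stay unchanged.
ini_set('default_charset', LEGACY_CHARSET);
$viaIni = htmlspecialchars("caf\xE9 <b>");

var_dump($escaped === $viaIni);
```

A single wrapper keeps the encoding decision in one place, which makes a later move to UTF-8 a one-line change rather than another mass find & replace.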
 [2015-01-15 21:12 UTC] djonline at djonline dot ru
Please also fix this bug by taking the default charset from setlocale(), as other functions do.
Currently setlocale() does not change the default charset used by these functions, so this is a major incompatibility with valid existing 5.3 code.
 [2017-01-28 11:58 UTC] cmb@php.net
-Status: Open +Status: Verified
 
Last updated: Sat Oct 20 01:01:25 2018 UTC