Request #66264 htmlspecialchars_decode() does not take an encoding argument.
Submitted: 2013-12-11 14:41 UTC Modified: 2018-05-06 22:52 UTC
From: lyle dot mantooth at primasupply dot com Assigned:
Status: Open Package: Strings related
PHP Version: 5.4.22 OS: Linux Mint
Private report: No CVE-ID: None
 [2013-12-11 14:41 UTC] lyle dot mantooth at primasupply dot com
Description:
------------
---
From manual page: http://www.php.net/function.htmlspecialchars-decode
---
Of all the functions that handle converting HTML entities to and from their character equivalents, htmlspecialchars_decode() is the only one that doesn't seem to care about the string encoding.

* htmlspecialchars(), $encoding parameter added in PHP 4.1.0.
* htmlentities(), $encoding parameter added in PHP 4.1.0.
* html_entity_decode(), function added in PHP 4.3.0, presumably with $encoding parameter from the beginning.
* htmlspecialchars_decode(), function added in PHP 5.1.0, but no $encoding parameter.

This may be only a documentation problem, but if it isn't, then it's a real problem working with this API without even knowing what the default encoding is. Did it change in PHP 5.4.0 for this function, like it did for the others? I still don't know.
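
To make the asymmetry concrete, here is a minimal sketch of the calls side by side. The sample string and variable names are made up for illustration; only the function signatures come from the manual.

```php
<?php
// UTF-8 bytes for "Café <b>& co</b>" (illustrative sample input).
$raw = "Caf\xC3\xA9 <b>& co</b>";

// Encoding-aware: the third argument names the string's encoding.
$escaped  = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
$decodedA = html_entity_decode($escaped, ENT_QUOTES, 'UTF-8');

// htmlspecialchars_decode() only takes the string and the flags;
// there is no third parameter to pass an encoding to.
$decodedB = htmlspecialchars_decode($escaped, ENT_QUOTES);
```

For this input both decode calls return the original string, so the gap is in the signature rather than in the result shown here.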


 [2013-12-11 18:10 UTC] aharvey@php.net
-Type: Documentation Problem
+Type: Feature/Change Request
-Package: Documentation problem
+Package: Strings related
 [2013-12-11 18:10 UTC] aharvey@php.net
Transmogrifying into a feature request: htmlspecialchars_decode() doesn't currently support a character set parameter (although the internal php_unescape_html_entities() function it calls does).
 [2018-05-06 20:46 UTC] cmb@php.net
Since all supported character encodings[1] are ASCII compatible
(i.e. they are supersets of ASCII), and all "htmlspecialchars" are
ASCII characters, the actual encoding doesn't really matter for
htmlspecialchars_decode().  So adding an $encoding parameter would
be useless, maybe except for checking whether the given string is
valid for the given encoding.

[1] <https://github.com/php/php-src/blob/PHP-7.2.5/ext/standard/html_tables.h#L44-L78>
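
A small sketch of that argument, assuming two hand-written byte strings for the same text (one ISO-8859-1, one UTF-8). The variable names are illustrative only. The special characters are ASCII in both encodings, so the non-ASCII bytes should pass through untouched:

```php
<?php
// "café" followed by escaped markup, once per encoding (illustrative input).
$latin1 = "caf\xE9 &lt;b&gt; &amp; &quot;x&quot;";      // é is the single byte E9
$utf8   = "caf\xC3\xA9 &lt;b&gt; &amp; &quot;x&quot;";  // é is the two bytes C3 A9

// No encoding is passed, and none is needed for the ASCII-only entities.
$outLatin1 = htmlspecialchars_decode($latin1);
$outUtf8   = htmlspecialchars_decode($utf8);

// The bytes for é are preserved as-is in each result.
$sameLatin1 = ($outLatin1 === "caf\xE9 <b> & \"x\"");
$sameUtf8   = ($outUtf8 === "caf\xC3\xA9 <b> & \"x\"");
```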
 [2018-05-06 22:52 UTC] yohgaki@php.net
The best practice is to normalize and validate encoding in the application's input processing code, i.e. normalize and validate encoding as early as possible. In an MVC model, controllers should take care of encoding normalization and validation.

Encoding validation by this function is still useful as multilayer protection, but it is rather optional.

Instead of promoting bad practices (validating encoding too late), it may be better to promote encoding validation that follows the Fail Fast principle.
 [2018-05-07 03:31 UTC] spam2 at rhsoft dot net
@yohgaki@php.net: you are completely off-topic

with PHP 5.4 the default encoding was changed away from latin1 (ISO-8859-1) to UTF-8
with PHP 5.6 that terrible BC break was mitigated with default_charset (way too late)

the workaround for existing applications in 5.4 was to spit 'ISO8859-1' as $encoding param for htmlentities(), html_entity_decode() and friends all around the codebase, and the function in question is lacking that param
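
A rough sketch of that workaround, using the canonical spelling 'ISO-8859-1' and a made-up sample string; the variable names are illustrative, not taken from any real codebase:

```php
<?php
// ISO-8859-1 bytes for "café <b>" (illustrative legacy input).
$latin1 = "caf\xE9 <b>";

// The encoding is spelled out on every call instead of relying on the default.
$encoded = htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1');
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'ISO-8859-1');

// htmlspecialchars_decode() has no slot for that argument. It only
// reverses the special characters and leaves &eacute; alone.
$partial = htmlspecialchars_decode($encoded, ENT_QUOTES);
```

Here $decoded round-trips to the original bytes, while $partial still contains the named entity for é.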
 