Request #69450 Default to ENT_SUBSTITUTE for htmlspecialchars()
Submitted: 2015-04-14 17:50 UTC Modified: 2015-04-15 20:29 UTC
From: olafvdspek at gmail dot com Assigned:
Status: Wont fix Package: Output Control
PHP Version: Irrelevant OS:
Private report: No CVE-ID: None


 [2015-04-14 17:50 UTC] olafvdspek at gmail dot com
Could htmlspecialchars() default to ENT_SUBSTITUTE?

Returning an empty string when the input contains an invalid code unit sequence seems suboptimal.
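To make the request concrete, here is a minimal Python sketch of the two behaviors under discussion. It is not PHP's implementation. The function names are hypothetical, strict UTF-8 decoding stands in for PHP's encoding check, and `html.escape()` stands in for the escaping step.

```python
import html


def escape_fail_closed(raw: bytes) -> str:
    """Hypothetical analog of htmlspecialchars() without ENT_IGNORE/ENT_SUBSTITUTE."""
    try:
        text = raw.decode("utf-8")  # strict: raises on any invalid sequence
    except UnicodeDecodeError:
        return ""  # drop the whole string rather than guess
    return html.escape(text, quote=True)


def escape_substitute(raw: bytes) -> str:
    """Hypothetical analog of ENT_SUBSTITUTE: invalid sequences become U+FFFD."""
    # Python's replacement granularity may differ from PHP's own implementation.
    text = raw.decode("utf-8", errors="replace")
    return html.escape(text, quote=True)
```

For example, with the input `b"caf\xe9 <b>"` (a Latin-1 "é", which is not valid UTF-8), `escape_fail_closed` returns `""`, while `escape_substitute` returns the escaped text with a U+FFFD where the stray byte was.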


 [2015-04-14 19:52 UTC]
-Status: Open +Status: Wont fix
 [2015-04-14 19:52 UTC]
Replacing invalid byte sequences in a string risks producing broken output, e.g. broken HTML structure.

If you do not want empty output from htmlspecialchars(), validate all of your inputs. If you choose to, you can replace invalid multibyte sequences explicitly, e.g. with mb_convert_encoding().
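The "validate at input" advice can be sketched in Python as well, assuming a hypothetical request whose fields arrive as raw bytes. Decoding happens once, strictly, at the boundary, and the whole request is rejected on failure. Repair is a separate, explicit opt-in, loosely analogous to the mb_convert_encoding() route. `InvalidInput`, `validate_fields`, `repair_field`, and `render_greeting` are illustrative names only.

```python
import html


class InvalidInput(ValueError):
    """Hypothetical error raised when a request field is not valid UTF-8."""


def validate_fields(fields: dict[str, bytes]) -> dict[str, str]:
    """Decode every field strictly at the input boundary, or reject the request."""
    decoded: dict[str, str] = {}
    for name, raw in fields.items():
        try:
            decoded[name] = raw.decode("utf-8")  # strict: no guessing
        except UnicodeDecodeError as exc:
            raise InvalidInput(
                f"field {name!r} is not valid UTF-8 at byte {exc.start}"
            ) from exc
    return decoded


def repair_field(raw: bytes) -> str:
    """Explicit opt-in repair: invalid sequences become U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def render_greeting(fields: dict[str, str]) -> str:
    # Only validated text reaches the escaper, so it never sees invalid bytes.
    return "<p>Hello, " + html.escape(fields["name"]) + "</p>"
```

With this split, the escaping step never has to decide what to do with broken bytes. That decision is made once, where the caller can return an error to the client.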

This is the optimal way to handle text.
 [2015-04-14 20:13 UTC] olafvdspek at gmail dot com
If that's the case, wouldn't it be even better to halt the script?

Returning an empty string ALSO risks broken output. Worse, broken output is basically guaranteed.
 [2015-04-14 20:28 UTC]
I also think that ENT_SUBSTITUTE is a more reasonable default behavior. Could you elaborate on the circumstances in which it would cause broken output?
 [2015-04-15 05:45 UTC]
Generally speaking, an escape function cannot determine whether an invalid byte in a multibyte stream is legitimate.

Besides, invalid multibyte streams should be detected while handling input; this is the best practice. Otherwise, an app may be exposed to character-encoding-based attacks anywhere in the software, including vulnerabilities in low-level libraries.

BTW, current Chrome (since about two years ago) won't output anything if the input is badly broken, because security cannot be guaranteed.

Returning a blank string makes perfect sense from a security standpoint.
 [2015-04-15 05:49 UTC]
A more detailed example:

<multibyte start byte> + "<"

What should happen? Replace <multibyte start byte>, or replace <multibyte start byte> + "<"?

Neither is correct, and either choice will break structured text.
 [2015-04-15 05:52 UTC]
I should have posted an indisputable example.
Anyway, don't forget that there are encodings whose multibyte sequences include "\" (byte 0x5C).
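Shift_JIS is one such encoding: the trail byte of some double-byte characters is 0x5C, the ASCII code for "\". "表", for instance, encodes as 0x95 0x5C. The Python sketch below uses the standard `shift_jis` codec and a hypothetical, encoding-unaware scanner for double-quoted literals. The scanner mistakes half of "表" for an escape and never finds the closing quote.

```python
def naive_string_end(data: bytes, start: int) -> int:
    """Hypothetical encoding-unaware scanner for a double-quoted literal.

    Returns the index of the closing '"' for the literal that opens at
    data[start], or -1 if none is found. Every 0x5C byte is treated as a
    backslash escaping the next byte, with no regard for Shift_JIS.
    """
    i = start + 1
    while i < len(data):
        if data[i] == 0x5C:  # "\" escapes the following byte
            i += 2
        elif data[i] == 0x22:  # '"' ends the literal
            return i
        else:
            i += 1
    return -1


# 表 is 0x95 0x5C in Shift_JIS: its trail byte equals ASCII "\".
encoded = "表".encode("shift_jis")
literal = b'"' + encoded + b'" + x'
end = naive_string_end(literal, 0)  # misses the real closing quote at index 3
```

A decoder-aware parser sees `"表" + x`. The byte-level scanner instead treats the closing quote as escaped, so everything after it is read as part of the string.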
 [2015-04-15 06:28 UTC]
Yes, from a security perspective, trying to fix a broken byte stream with any sort of guesses and replacement characters is a non-starter. You simply don't do it. Certainly not by default. If someone really wants to try to fix broken input, they need to do so explicitly, and hopefully they understand the risk they are taking.

The most common example of character-replacement problems is Internet Explorer 5 and 6 doing substitution on broken UTF-8. For example, something like %E0%22%3E, which is byte 0xE0 followed by " and >, would get substituted with a single �. So what, you say? Well, by swallowing the "> you have yourself a glaring XSS, since all subsequent characters are no longer inside a quoted attribute value in a tag. IE replaced those 3 bytes because 0xE0 is the lead byte of a 3-byte UTF-8 sequence, so a user could XSS IE simply by adding a %E0 to GET/POST data.
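The IE failure mode can be modeled in a few lines of Python. `greedy_decode` and `render` are hypothetical. `greedy_decode` mimics the behavior described above: an invalid sequence is replaced by one U+FFFD, together with every byte its lead byte announced. It is not IE's actual code. For contrast, Python's built-in `errors="replace"` replaces only the offending 0xE0 here and keeps the `">`.

```python
def utf8_sequence_length(lead: int) -> int:
    """Bytes announced by a UTF-8 lead byte: 1 for ASCII, 0 if it cannot start one."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def greedy_decode(raw: bytes) -> str:
    """Hypothetical greedy repair: a bad sequence swallows all bytes it claimed."""
    out, i = [], 0
    while i < len(raw):
        n = utf8_sequence_length(raw[i]) or 1
        chunk = raw[i:i + n]
        try:
            out.append(chunk.decode("utf-8"))
        except UnicodeDecodeError:
            out.append("\ufffd")  # swallows the claimed bytes, even '"' and '>'
        i += n
    return "".join(out)


def render(first: bytes, second: bytes) -> bytes:
    # Hypothetical server-side template with two quoted attribute values.
    return b'<input value="' + first + b'"><input value="' + second + b'">'


page = render(b"abc\xe0", b"x onmouseover=alert(1) y")
greedy = greedy_decode(page)  # the '">' after "abc" is gone
builtin = page.decode("utf-8", errors="replace")  # keeps the '">'
```

In the greedy output, the first attribute value runs on until the quote that was meant to open the second value, so `x onmouseover=alert(1) y` is no longer inside quotes.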
 [2015-04-15 08:44 UTC] olafvdspek at gmail dot com
Why doesn't it log an error, and why doesn't it halt execution?
 [2015-04-15 14:26 UTC]
Because it is typically called on user-supplied data, and letting end users trigger a fatal error in your app and/or cause a log storm is a really bad idea.
 [2015-04-15 14:33 UTC] olafvdspek at gmail dot com
If the program contains a security bug, then halting the program seems like one of the best possible solutions.
 [2015-04-15 20:29 UTC]
The program does not contain a security bug precisely because htmlspecialchars() dropped the insecure data. If you want to work with insecure data, you are free to do so.
PHP Copyright © 2001-2022 The PHP Group
All rights reserved.
Last updated: Sat Dec 10 00:03:51 2022 UTC