Request #6625 htmlspecialchars should escape "'" character
Submitted: 2000-09-08 06:19 UTC Modified: 2000-09-12 07:24 UTC
From: jon+php-dev at unequivocal dot co dot uk Assigned: cmv (profile)
Status: Closed Package: Feature/Change Request
PHP Version: 4.0 Latest CVS (08/09/2000) OS: N/A
Private report: No CVE-ID: None
 [2000-09-08 06:19 UTC] jon+php-dev at unequivocal dot co dot uk
Please first see bug report #5254.

Either this function should not escape '"', or it *should* escape "'". These characters are equivalent in HTML. For proof, see http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.2 .

If you do not escape "'", then the following will not work:

<input type='hidden' name='foo' value='<? echo htmlspecialchars($foo) ?>'>

Please do not tell me that the above HTML is not valid without reading the URL I have given first.

I do not understand the arguments put in #5254 about databases. What has this function got to do with databases?
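
To make the report concrete, here is a minimal sketch of the single-quoted attribute case. It assumes a PHP version in which htmlspecialchars() accepts the ENT_COMPAT and ENT_QUOTES flags as a second argument; those flags are not mentioned in this report, and the sample value of $foo is invented for illustration.

```php
<?php
// Hypothetical value containing both quote characters.
$foo = "O'Reilly says \"hi\"";

// ENT_COMPAT escapes only the double quote (the behaviour this report objects to).
$compat = htmlspecialchars($foo, ENT_COMPAT);

// ENT_QUOTES escapes both the double quote and the apostrophe.
$quotes = htmlspecialchars($foo, ENT_QUOTES);

// With ENT_COMPAT the raw apostrophe ends the attribute value early.
echo "<input type='hidden' name='foo' value='$compat'>\n";

// With ENT_QUOTES the value stays intact inside a single-quoted attribute.
echo "<input type='hidden' name='foo' value='$quotes'>\n";
```

In the first line of output the attribute value is cut off after "O", which is the failure described above.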

 [2000-09-08 08:41 UTC] waldschrott@php.net
We cannot simply break backwards compatibility. Maybe we should add another function, or an optional parameter to this one, so that *both* are converted.

There have never been complaints about this shortcoming, whereas scripts did break when this function was changed (a month or so ago).
 [2000-09-08 08:49 UTC] jon+php-dev at unequivocal dot co dot uk
I will add a note to the manual.

I am mystified as to what code could be broken by escaping additional characters, however. Could one of the people who had some code which broke give us an excerpt so we can understand the problem?

 [2000-09-11 23:53 UTC] cmv@php.net
Change the quotes around your PHP code to double quotes, and it works just fine.

As for escaping neither, I hope you agree that would just be silly.

Again, the reason this function was (erroneously) changed in the first place was because of this comment:

> Because a ' is used for db queries and I think it's pretty
> standard behaviour to escape it as well.
> For example if you use PHP together with Javascript, 
> it's much easier if it's escaped.

In the first case (DB), use addslashes() or turn magic quotes on.  In the second case (JS), you can use addslashes() also, or urlencode() or strtr().

Besides, if you want an apostrophe in your database field, you should be using "\'", not "&#039;".  Otherwise, you're going to get the 6-character string "&#039;" out of the database, not the one-character single quote.  Then what function do you use to convert it back to single-quotes?
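
The distinction between backslash escaping (for a database query or a JavaScript string) and entity escaping (for HTML) can be sketched as follows. This assumes a PHP version that provides htmlspecialchars_decode() and the ENT_QUOTES flag, neither of which is mentioned in this thread; the sample string is invented.

```php
<?php
$name = "O'Reilly";

// Backslash escaping, as suggested above for DB queries and JS strings.
$slashed = addslashes($name);

// Entity escaping, which is what an HTML attribute value needs.
$entity = htmlspecialchars($name, ENT_QUOTES);

// Converting the entity form back to the original one-character apostrophe.
$restored = htmlspecialchars_decode($entity, ENT_QUOTES);

echo $slashed, "\n";   // O\'Reilly
echo $entity, "\n";    // O&#039;Reilly
echo $restored, "\n";  // O'Reilly
```

The two escapings answer different questions, so neither is a substitute for the other.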



 [2000-09-12 00:07 UTC] jon+php-dev at unequivocal dot co dot uk
You're making it very difficult to remain polite here.

I shall summarise in nice easy-to-understand bite-size pieces:

It is nothing to do with databases.

Databases are nothing to do with it.

It is to do with HTML.

Not databases.

The apostrophe is a special character in HTML.

htmlspecialchars is supposed to encode all characters that are special to HTML.

It does not, because it does not encode apostrophes.

Backslashes are not the correct way to encode characters in HTML.

Entities are the correct way to encode characters.

As you can tell by what this function does with other special characters.

The function is broken until it encodes apostrophes.

The only reason not to make it encode apostrophes would be for reasons of backwards compatibility.

Making it encode extra characters is highly unlikely to break old code.

You said in a previous bug report that it did, in fact, break code that you had.

So it would be helpful if you could explain how, so that the problem could be understood.

 [2000-09-12 01:57 UTC] cmv@php.net
1) I understand your comments.  Treating me like a child doesn't earn you points or help your cause.


2) The reason I keep bringing up databases is because that was the original reason this issue came up.  I suggest you re-read bug 5254 and see why the individual wanted single quotes to be escaped.


3) Just because you *can* write HTML with single quotes, doesn't mean we are about to change the language to be (potentially) backwardly-incompatible with the million-plus existing users of PHP4.0.1, PHP4.0.0 and all PHP3.x versions.


4) Any code that uses get_html_translation_table() to check for HTML references is broken, since it does not report single quotes.

My code (which I will not post here) read META tags from a given URL, or allowed the user to enter some META tags, then ran sanity checking on them to make sure they were valid.

Basically, anything that relied on this function working the way it did (i.e. not changing single quotes) no longer worked.
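
Point 4 concerns code that cross-checks htmlspecialchars() against get_html_translation_table(). The code in question is not shown in this thread, so the following is only a rough sketch of that kind of check. It assumes a PHP version where get_html_translation_table() accepts the ENT_COMPAT and ENT_QUOTES flags.

```php
<?php
// Translation tables for the two quote styles.
$compat = get_html_translation_table(HTML_SPECIALCHARS, ENT_COMPAT);
$quotes = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES);

// Does each table report the apostrophe as a translated character?
$compatHasApostrophe = array_key_exists("'", $compat);
$quotesHasApostrophe = array_key_exists("'", $quotes);

// Characters that appear only in the ENT_QUOTES table.
$extra = array_keys(array_diff_key($quotes, $compat));

var_dump($compatHasApostrophe, $quotesHasApostrophe, $extra);
```

Code that validates input against the table stays in step with the function only if both are asked for the same quote style.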


5) You say "htmlspecialchars is supposed to encode all characters that are special to HTML."  Says who?  No, HTMLSpecialChars() is supposed to encode the double-quote, ampersand, less-than and greater-than signs.  That's what the manual says.  If anything, HTMLSpecialChars() and HTMLEntities() are supposed to convert those characters into their equivalent character entity.  The single quote does not have a character entity, only a numeric one.  Refer to http://www.w3.org/TR/html4/sgml/entities.html


6) I can't think of very many cases (except for the one I mentioned) where this change would break PHP code.  However, just because you or I can't think of examples, doesn't mean that there aren't any.  Committing a change to the language that is definitely backwardly-incompatible (i.e. the function behaves differently than it used to) is not a good thing to do, no matter how "safe" you think it is.

Unless you can convince me and/or the core developers, and are positive that this won't break existing code, it's not going to happen.

It shouldn't have happened in the first place.  Especially given the possibility of breakage ... and because (as you have shown in your notes in the manual) it is trivial to implement in a user-defined function.
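
A user-defined wrapper of the kind referred to above might look like the following. This is a minimal sketch, not the code from the manual note (which is not reproduced in this thread); the function name is hypothetical, and ENT_COMPAT is passed explicitly on the assumption that the PHP version in use supports it.

```php
<?php
// Hypothetical helper: escapes the four documented characters, then the apostrophe.
function htmlspecialchars_with_apostrophe($text)
{
    // ENT_COMPAT leaves the apostrophe untouched, matching the behaviour discussed here.
    $escaped = htmlspecialchars($text, ENT_COMPAT);

    // Replace the apostrophe afterwards so the "&" in "&#039;" is not escaped again.
    return str_replace("'", "&#039;", $escaped);
}

$value = htmlspecialchars_with_apostrophe("Tom & Jerry's <show>");
echo $value, "\n"; // Tom &amp; Jerry&#039;s &lt;show&gt;
```

The order matters: replacing the apostrophe before calling htmlspecialchars() would turn "&#039;" into "&amp;#039;".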

 [2000-09-12 03:26 UTC] jon+php-dev at unequivocal dot co dot uk
1) If you had understood, then your reply would have made sense. As it was, it did not.

Also, in what way is this "my cause"? It is a bug I brought to the attention of the PHP developers in the way that they helpfully provided to do so. It is their cause, not mine. They [should have] more interest in seeing PHP work correctly than I do.

2) I don't care why the individual in bug #5254 wanted anything. It has nothing to do with the issue itself. The way they arrived at their conclusion that the function does not do the right thing is irrelevant; their conclusion itself was correct.

...

5) Says the manual. Since you are obviously having trouble finding it, I shall quote it for you here:

"Certain characters have special significance in HTML, and should be represented by HTML entities if they are to preserve their meanings. This function returns a string with these conversions made."

This is exactly what I said. The manual also says "At *present*, the translations done are", (my emphasis) with the clear implication that this might change. There is nothing mentioned anywhere about there being some mysterious distinction between numeric and named entities.

If you have written code that relies on other characters *not* being encoded, then your code is broken. Don't believe me - believe the manual.


You clearly have no idea what backwards compatibility actually is. PHP changes things that could break old programs every day. Recently, EscapeShellCmd found a new character to escape - potentially changing the behaviour of shell commands in old scripts. New functions were added in 4.0.3RC1 - if I had a script that had a function called 'is_uploaded_file', then it will not work on the next release of PHP.

These are much greater incompatibilities than the one proposed here, yet they pass by without comment, whereas this function, which is clearly documented as possibly changing in a later release (which makes the change *not* backwards incompatible), is set in stone forever, despite being wrong!

Please do not respond any more on this bug ID, your input is not useful.
 [2000-09-12 05:44 UTC] cmv@php.net
Despite your final request that I not comment on this, I will.

No one -- neither you, nor I, nor any of the other PHP developers or contributors to the language -- appreciates being treated like an infantile moron.  If you cannot make your point without resorting to this kind of behaviour and language, then ... well, I don't know what to suggest.  Just don't expect help from any of us.

"Your cause" was trying to convince me and the other developers that HTMLSpecialChars() should encode single quotes.  It is "your cause" because you are arguing for a change to the language.  Yes, you are arguing that it is an improvement to the language.  I simply disagree.

The original bug report is *entirely* relevant since that is what started the discussion in the first place.  Maybe it's not relevant to you, but that's what brought you here, no?

To recap, someone wanted single quotes escaped to help them with DB and JS stuff.  It was added to PHP 4.0.2.  It should *not* have been added because the functionality they wanted already existed in other functions.  I took it out of CVS because of this.  It came to *my* attention because code I had written that relied on the documented and historical operation of HTMLSpecialChars() no longer worked.  I was not convinced at the time that this was a valid change to this function, and I only wish I had caught it before it made it into PHP 4.0.2.

I would suggest that, had it not, none of this discussion would ever have taken place.  You would have continued using whatever solution you made to solve your unique problem, and I would continue coding my sites to my liking.

As you kindly pointed out to me, the manual currently says "At present, the translations done are".  In my mind, there is no clear implication in that statement that this might change.  What you infer from it, well ... that's what you infer.

[For what it's worth, the wording in the manual is now being changed to reflect the operation of the function more accurately.  See http://snaps.php.net/manual/ if you are interested. ]

You then say: "If you have written code that relies on other characters *not* being encoded, then your code is broken. Don't believe me - believe the manual."  This is absolute garbage.  Why should I need to change my code, which has functioned perfectly since PHP2 and relies on a *well* documented feature, because Mr. Bug 5254 doesn't know about AddSlashes() and because you prefer to use single quotes in your HTML?  I'm sorry, but no thanks.

What did you do before PHP 4.0.2, I wonder.

I know what backwards-compatibility means.  It means that, within reason, all efforts should be made to keep the language functioning as it has.  "Within reason" is the operative phrase here.  Escaping another character in shell scripts is a good change -- it might prevent someone from running malicious code.  Adding is_uploaded_file() is good --- it too is part of a fix to close a potential security hole.

Yes, new functions are added, and if you already wrote a function with that name, you are out of luck.  It's happened to me, and I'm sure it's happened to you.  Those kinds of "backwardly-incompatible" changes are unfortunate, but serve to improve the language.

Changing the function of HTMLSpecialChars() is not, in my opinion, an improvement to the language.  It a) breaks existing code, and b) adds little functionality to the language that couldn't be added in a user's code.

Again, the operative phrase above is "in my opinion".  Since I was affected, and I am a developer, I changed the source code.

Feel free to do the same yourself.

Or try your luck with the other developers.  As far as I am concerned, this issue is closed.
 [2000-09-12 07:24 UTC] jon+php-dev at unequivocal dot co dot uk
I have replied by private email. I'm sure the rest of you are bored with this inanity.
 