Bug #688 HTML special characters switched to meta-ASCII in form input
Submitted: 1998-08-24 12:28 UTC Modified: 1998-09-19 00:56 UTC
From: whit at transpect dot com Assigned: jaakko (profile)
Status: Closed Package: MySQL related
PHP Version: 3.0.1 OS: Redhat 5.0
Private report: No CVE-ID: None
 [1998-08-24 12:28 UTC] whit at transpect dot com
I am using Apache 1.2.6 with data retrieved from MySQL that includes HTML
special characters. When that data is sent back to PHP through a form to
run further searches against the database, the special characters arrive, on this second reception by PHP, in their meta-ASCII form rather than as the original string for the special character, and so they no longer match the database contents. (Magic Quotes is not compiled in.)

Here is a script which produces this:

===

<?php
if (isset($choice)) {
        echo "Choice: $choice<br>";
        echo "<xmp>Code: $choice</xmp><br>";
}
?>
<form action=<?php echo $PHP_SELF; ?> method=post>
<input type=radio name=choice value="Nez Perc&#233;">Nez Perc&#233; - <xmp>Nez Perc&#233;</xmp><br>
<input type=radio name=choice value="Northwest-Coast">Northwest-Coast - <xmp>Northwest-Coast</xmp>
<input type=radio name=choice value="Penobscot">Penobscot - <xmp>Penobscot</xmp><br>
<input type=submit value=Submit></form>

===

My earlier report was based on the mistaken impression that the error occurred on PHP's initial MySQL retrieval, which is why I'm following up under the
MySQL category (that, and there not being an Apache category).

When I run this with Nez Perce selected, the value returned on submission contains the ASCII character for e-acute rather than &#233;. The same problem occurs if &eacute; is used in the data string. Using PHP's function to turn it back into HTML special characters would produce &eacute; here, which wouldn't match &#233;, so I'm looking for a solution that keeps the special characters in their original HTML form throughout.

I don't know how to test whether the problem occurs in Apache's reception of the form data or in PHP's reception of it from Apache.



 [1998-09-19 00:56 UTC] jaakko
The conversion from the HTML entity &#233; happens in the client browser.
Browsers interpret the HTML code and its entities, display the result to
the user, and then send the actual data as the form data. That means
the actual ISO-8859-1 character é (e-acute) goes to the server. It is of
course encoded as application/x-www-form-urlencoded
during transport, but this has nothing to do with the problem, as
this encoding is decoded again in PHP. So your script gets the
actual data, not just some encoding.
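
The round trip described here can be sketched in PHP itself. This is only
an illustration: html_entity_decode() stands in for what the browser does,
and it, like the variable names, is an assumption of the sketch (the
function does not exist in PHP 3.0.1; it was added in later releases).

```php
<?php
// What the page source contains: the value attribute from the test script.
$attribute = 'Nez Perc&#233;';

// Step 1: the browser parses the HTML and decodes the entity to a character.
// html_entity_decode() is used here as a stand-in for the browser (assumption).
$inBrowser = html_entity_decode($attribute, ENT_QUOTES, 'ISO-8859-1');

// Step 2: the browser url-encodes the value for the POST body.
$onTheWire = urlencode($inBrowser);

// Step 3: PHP decodes the body again before the script sees $choice.
$choice = urldecode($onTheWire);

echo $onTheWire, "\n";                    // Nez+Perc%E9
echo bin2hex(substr($choice, -1)), "\n";  // e9 (the single ISO-8859-1 byte)
var_dump($choice === $attribute);         // bool(false): the entity is gone
```

The script receives the byte 0xE9, not the six characters of &#233;, which
is why the comparison against the stored entity string fails.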

There are several things you can now do to fix your system.

1) Store characters as characters in your database. MySQL by default
is binary safe, so you can store ISO-8859-1 as pure 8-bit encoding
(i.e. no encoding) there. Why not do so and forget about the rest?
Actually this would help you also with the searches, as MySQL
knows by default how to do case-insensitive searches with ISO-8859-1
characters, so this way you can search with:

.... where field like '%PERCÉ%' ...

and find what you are looking for. I recommend this approach.

2) When printing the fields in the form, you can encode the & characters
in the fields, so that the browser does not decode them:

value="<? echo htmlspecialchars ($string) ?>"

but of course the browser also displays them as &#233; to the user then.

3) You can encode them back in your code with some function of your
own in PHP. I still recommend option 1).

4) We could add a function to PHP that encodes non-ASCII characters
with &# codes rather than ISO-8859-1 entity names. I do not see the immediate
need, but it could be a good idea. I still recommend option 1) for you.
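
As a rough sketch of options 2) to 4): the helper below is hypothetical
(encode_numeric_entities() is not a PHP built-in, and its name is made up
for this example). It assumes the input is ISO-8859-1, where each byte
value equals its character code, and it uses syntax from much later PHP
versions than 3.0.1.

```php
<?php
// Hypothetical helper for options 3) and 4): rewrite every byte above 0x7F
// as a decimal &#NNN; code. Assumes ISO-8859-1 input (one byte per character).
function encode_numeric_entities(string $latin1): string
{
    return preg_replace_callback(
        '/[\x80-\xFF]/',                 // any non-ASCII byte
        function (array $m): string {
            return '&#' . ord($m[0]) . ';';
        },
        $latin1
    );
}

// The value as the script receives it from the form (e-acute is byte 0xE9).
$choice = "Nez Perc\xE9";
$forDatabase = encode_numeric_entities($choice);
echo $forDatabase, "\n";   // Nez Perc&#233;

// Option 2): escape the & when printing the field, so the browser sends the
// entity text back unchanged (and also shows it literally to the user).
$forAttribute = htmlspecialchars('Nez Perc&#233;', ENT_QUOTES, 'ISO-8859-1');
echo $forAttribute, "\n";  // Nez Perc&amp;#233;
```

Either route keeps the &#233; form that the database rows already use, at
the cost of extra conversion code that option 1) avoids entirely.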