Bug #68031 htmlspecialchars returns empty string, sometimes
Submitted: 2014-09-16 19:24 UTC
Modified: 2016-10-12 21:53 UTC
Votes: 3
Avg. Score: 4.7 ± 0.5
Reproduced: 3 of 3 (100.0%)
Same Version: 1 (33.3%)
Same OS: 3 (100.0%)
From: pfenderd at bellsouth dot net
Assigned: cmb
Status: Not a bug
Package: Filter related
PHP Version: 5.5.16
OS: Linux
Private report: No
CVE-ID: None
 [2014-09-16 19:24 UTC] pfenderd at bellsouth dot net
Description:
------------
Occasionally, htmlspecialchars() returns an empty string value.
It is consistent for certain string values and not a random problem.
The string is usually long (several hundred characters).
The problem did not exist in version 5.3.14 but does in later versions of 5.4 and 5.5.16.
In a list of assignments to variables using htmlspecialchars(), it fails on the 5th of 10.  The others always work properly, but the other string values are always much shorter than the one that fails.

I created a simple test script, but the script always works.
The problem is probably affected by the other code surrounding the problem statement.

This is a serious problem because the code is used to maintain values in a database for a website and is not able to do so when database values cannot be seen in a management form.

NOTE:
I found a work-around solution to the problem by moving the line of code with the problem farther down the list of assignments. So it may not be a problem directly with htmlspecialchars(), but with the PHP script processor.


 [2014-09-16 20:17 UTC] requinix@php.net
-Status: Open +Status: Feedback
 [2014-09-16 20:17 UTC] requinix@php.net
Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves. 

A proper reproducing script starts with <?php and ends with ?>,
is max. 10-20 lines long and does not require any external 
resources such as databases, etc. If the script requires a 
database to demonstrate the issue, please make sure it creates 
all necessary tables, stored procedures etc.

Please avoid embedding huge scripts into the report.


 [2014-09-17 05:18 UTC] rasmus@php.net
Likely because you are getting invalid UTF-8. Either specify the correct charset of your input in your htmlspecialchars() call, or filter out invalid UTF-8 before the call. Outputting invalid UTF-8 is a security risk, so previously your PHP 5.3-based application was vulnerable and now it isn't.
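
A minimal sketch of both suggestions, assuming the problem data is ISO-8859-1 text reaching a call that treats it as UTF-8. The sample string is hypothetical, the flags are passed explicitly because the defaults differ between PHP versions, and the conversion step assumes the mbstring extension is loaded:

```php
<?php
// Hypothetical input: "café <b>" encoded as ISO-8859-1.
// The lone byte 0xE9 is not a valid UTF-8 sequence.
$latin1 = "caf\xE9 <b>";

// Treated as UTF-8 (and without ENT_SUBSTITUTE) the whole value is rejected.
$asUtf8 = htmlspecialchars($latin1, ENT_QUOTES, 'UTF-8');

// Option 1: tell htmlspecialchars() the real charset of the input.
$asLatin1 = htmlspecialchars($latin1, ENT_QUOTES, 'ISO-8859-1');

// Option 2: convert the input to UTF-8 first, then escape it as UTF-8.
$utf8      = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$converted = htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');

var_dump($asUtf8 === '');          // the failure described in this report
var_dump(bin2hex($asLatin1));      // bytes unchanged apart from escaped < and >
var_dump($converted);              // UTF-8 text with escaped < and >
```

Option 1 leaves the bytes untouched, so the page must then also be served as ISO-8859-1; option 2 is usually the better fit for a site that outputs UTF-8.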
 [2014-09-17 15:33 UTC] pfenderd at bellsouth dot net
-Status: Feedback +Status: Open
 [2014-09-17 15:33 UTC] pfenderd at bellsouth dot net
The text that caused the problem was pure ASCII text and had no UTF-8 characters in it.
Test case at http://dayspeak.net/test_php_bug_htmlspecialchars.php
This simple test case failed to show the problem (worked properly).
The fact that I could work around the problem by moving the problem-causing assignment down several lines of code indicates to me that the real problem is not in the htmlspecialchars() function itself but in the PHP script processor.
 [2014-09-17 17:41 UTC] requinix@php.net
-Status: Open +Status: Feedback
 [2014-09-17 17:41 UTC] requinix@php.net
Need actual source code, not just the output.

I refreshed a few times and each time it worked correctly: string was intact and unchanged (except for a couple "s that were converted to &quot;s).
 [2014-09-17 17:45 UTC] pfenderd at bellsouth dot net
-Status: Feedback +Status: Open
 [2014-09-17 17:45 UTC] pfenderd at bellsouth dot net
<?php
	$xcs_welcome_msg = <<<XXX
Thank you for taking the time to visit us here.  We would like to invite you to come and visit our church during any of our services. We are known in the community as a friendly church where the Bible is faithfully taught and preached.

Nothing quite compares to the joy of Christian friendship, and at God's First Church of Sample, we make it a priority to build lasting bonds between the members of our church family - bonds of genuine concern and commitment to one another. Best of all, this circle of care is ever widening. We would love to include you as well.

We believe that studying the Bible is vital because it not only instructs us intellectually, but it also guides us spiritually. We believe and accept it as God's Word to man, a book that is alive and relevant to life today, and learning its truths can be a thrilling adventure.

Opportunities  for fellowship and learning are offered to every person at every age level by our staff of qualified teachers and leaders. We have Sunday school classes for children, youth, and adults in which principles from the Bible are taught in an open and personal forum. In addition to our Sunday school classes, we have a discipleship course that helps the new and even the most mature Christian to develop a lifelong, personal, and obedient relationship with Jesus Christ through Biblical teachings. 

Just as Jesus Christ came, "not to be ministered unto, but to minister..." we accept our responsibility to reach out in service to others. This applies both within the church family and outside our fellowship.

Our primary reason for meeting together is to focus our attention on God, giving Him our worship, and receiving His blessing and inspiration. Each time we meet it is a special time of spiritual refreshment.
XXX;
	$cs_welcome_msg = htmlspecialchars($xcs_welcome_msg);
?>
<!doctype html>
<html lang="en">
<head>
<title>PHP Bug Test</title>
</head>
<body>
<br>
Original Variable value (<?php echo $xcs_welcome_msg;?>)<br><br>
Value from htmlspecialchars() (<?php echo $cs_welcome_msg;?>)<br><br>
</body>
</html>
 [2014-09-17 17:58 UTC] requinix@php.net
-Status: Open +Status: Feedback
 [2014-09-17 17:58 UTC] requinix@php.net
And you have problems with *that exact script*? We need code that actually fails, not something similar to it; start with the original code you're using and pare it down, removing database requirements and such, until you've reached a fairly minimal version that still breaks.
For kicks, I tried running that one a few thousand times (PHP 5.5 on Ubuntu) and every single one worked.
 [2014-09-17 18:59 UTC] pfenderd at bellsouth dot net
-Status: Feedback +Status: Open
 [2014-09-17 18:59 UTC] pfenderd at bellsouth dot net
As I mentioned in my original post, I got the code working by moving the variable assignment farther down in the code, so I don't think the problem is actually with htmlspecialchars(). I have tried to provide a section of code like the original, but there is no problem with this result.
<?php
// variable provides mock database search result of actual data that showed the problem.
$test_str = <<<XXX
Thank you for taking the time to visit us here.  We would like to invite you to come and visit our church during any of our services. We are known in the community as a friendly church where the Bible is faithfully taught and preached.

Nothing quite compares to the joy of Christian friendship, and at God's First Church of Sample, we make it a priority to build lasting bonds between the members of our church family - bonds of genuine concern and commitment to one another. Best of all, this circle of care is ever widening. We would love to include you as well.

We believe that studying the Bible is vital because it not only instructs us intellectually, but it also guides us spiritually. We believe and accept it as God's Word to man, a book that is alive and relevant to life today, and learning its truths can be a thrilling adventure.

Opportunities  for fellowship and learning are offered to every person at every age level by our staff of qualified teachers and leaders. We have Sunday school classes for children, youth, and adults in which principles from the Bible are taught in an open and personal forum. In addition to our Sunday school classes, we have a discipleship course that helps the new and even the most mature Christian to develop a lifelong, personal, and obedient relationship with Jesus Christ through Biblical teachings. 

Just as Jesus Christ came, "not to be ministered unto, but to minister..." we accept our responsibility to reach out in service to others. This applies both within the church family and outside our fellowship.

Our primary reason for meeting together is to focus our attention on God, giving Him our worship, and receiving His blessing and inspiration. Each time we meet it is a special time of spiritual refreshment.
XXX;
$cs_info = array(
	'church_subdomain_id'=>4,
	'cs_subdomain'=>"sample1",
	'cs_church'=>"God's First Church of Sample",
	'cs_pastor'=>"Sample Simon, Pastor",
	'cs_template_no'=>1,
	'cs_welcome_title'=>"Welcome to God's First Church of Sample",
	'cs_welcome_title2'=> '',
	'cs_welcome_msg'=> $test_str,
	'cs_address'=>"435 Bridge Ave",
	'cs_address2'=>"",
	'cs_city'=>"Sampleville",
	'cs_state'=>"SC",
	'cs_zip'=>"29639"
	);

	$rx = &$cs_info;
	$church_subdomain_id = $rx['church_subdomain_id'];
	$cs_subdomain = $rx['cs_subdomain'];
	$cs_church = htmlspecialchars($rx['cs_church']);
	$cs_pastor = htmlspecialchars($rx['cs_pastor']);
	$cs_template_no = $rx['cs_template_no'];
	$cs_welcome_title = htmlspecialchars($rx['cs_welcome_title']);
	$cs_welcome_title2 = htmlspecialchars($rx['cs_welcome_title2']);
	$cs_welcome_msg = htmlspecialchars($rx['cs_welcome_msg']);
	$xcs_welcome_msg = $rx['cs_welcome_msg'];
	$cs_address = htmlspecialchars($rx['cs_address']);
	$cs_address2 = htmlspecialchars($rx['cs_address2']);
	$cs_city = htmlspecialchars($rx['cs_city']);
	$cs_state = $rx['cs_state'];
	$cs_zip = $rx['cs_zip'];
// other assignments followed but no others used htmlspecialchars()
?>
<!doctype html>
<html lang="en">
<head>
<title>PHP Bug Test $2</title>
</head>
<body>
<br>
Original Variable value (<?php echo $rx['cs_welcome_msg'];?>)<br><br>
Value from htmlspecialchars() (<?php echo $cs_welcome_msg;?>)<br><br>
</body>
</html>
 [2014-09-17 19:17 UTC] rasmus@php.net
If you can't reproduce it, how do you know it wasn't a non-UTF-8 char in the input? What you describe is exactly what htmlspecialchars() does if it encounters an illegal character in the charset it is in. And from 5.3 to 5.4 the default charset changed from ISO-8859-1 to UTF-8. Those two things, combined with the fact that you said it works fine in 5.3 and broke in 5.4 and 5.5, are a lot of evidence pointing to the illegal UTF-8 char hypothesis.

Your hypothesis that it is somehow in the general processor has 0 data points pointing to it other than a really vague one about you moving your code around a bit.
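
One way to gather that data point is to log the raw bytes of the failing value before it is escaped. The helper below is a hypothetical sketch (the name findNonAsciiBytes and the sample string are not from the report) that lists every run of non-ASCII bytes with its offset; it assumes PHP 7.1 or later and, for the final check, the mbstring extension:

```php
<?php
/**
 * Return [byte offset => hex dump] for each run of bytes >= 0x80.
 * An empty result means the string really is pure ASCII.
 */
function findNonAsciiBytes(string $s): array
{
    $hits = [];
    // No /u modifier, so PCRE matches raw bytes rather than code points.
    if (preg_match_all('/[\x80-\xFF]+/', $s, $m, PREG_OFFSET_CAPTURE)) {
        foreach ($m[0] as [$bytes, $offset]) {
            $hits[$offset] = bin2hex($bytes);
        }
    }
    return $hits;
}

// Hypothetical value: "God's" typed with a Windows-1252 curly apostrophe (0x92).
$value = "God\x92s First Church of Sample";

$hits        = findNonAsciiBytes($value);           // one hit at offset 3
$isValidUtf8 = mb_check_encoding($value, 'UTF-8');  // false for this value

print_r($hits);
var_dump($isValidUtf8);
```

Non-ASCII bytes are not automatically invalid, since well-formed UTF-8 also uses them, so the mb_check_encoding() result is what decides whether htmlspecialchars() would reject the value.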
 [2014-09-17 19:35 UTC] pfenderd at bellsouth dot net
I have been using PHP for 14 years and I have encountered similar problems with the PHP script interpreter that get solved by rearranging the code statements without actually changing the statements themselves.
This seems to be one of those cases that will never be resolved.
Since I found a coding solution that works for me and I cannot recreate the problem in a simple test case, I guess that we should not waste any more time on this and should close the bug report.
 [2014-10-15 18:51 UTC] phpbugs at hypertwins dot org
In my case, this does seem to be a character set problem: adding ENT_SUBSTITUTE to the "flags" parameter eliminates the blank results.

This appears to be consistent with the documentation for that flag, which says it will "Replace invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or &#FFFD; (otherwise) instead of returning an empty string."

Apparently, then, returning an empty string is correct behavior in the absence of that flag.
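
The difference can be sketched as follows. The input string is a made-up example containing one byte (0xE9) that is not valid UTF-8 at that position; flags and charset are passed explicitly so the result does not depend on version-specific defaults:

```php
<?php
// Hypothetical input with a single invalid byte followed by normal text.
$input = "caf\xE9 & bar";

// Without ENT_SUBSTITUTE: the entire value is discarded.
$withoutFlag = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// With ENT_SUBSTITUTE: the invalid byte becomes U+FFFD and the
// rest of the string is escaped as usual.
$withFlag = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($withoutFlag);
var_dump($withFlag);
```

On PHP 8.1 and later, ENT_SUBSTITUTE is part of the default flags, so the empty result only appears there when flags are passed explicitly without it.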
 [2015-08-31 13:15 UTC] antropik at gmail dot com
Same problem with html_entity_decode().
The error occurs when handling accented characters.
don't generate an error on log (that's the biggest problem)
PHP 5.5.9-1ubuntu4.11
 [2015-08-31 13:18 UTC] antropik at gmail dot com
(same problem on htmlentities)
 [2016-10-12 14:50 UTC] cmb@php.net
-Status: Open +Status: Not a bug -Assigned To: +Assigned To: cmb
 [2016-10-12 14:50 UTC] cmb@php.net
> don't generate an error on log (that's the biggest problem)

This can't be fixed; however, see bug #54109 and the tickets
linked from there.

> Apparently, then, returning an empty string is correct behavior
> in the absence of that flag.

ACK. Closing.
 [2016-10-12 21:53 UTC] yohgaki@php.net
Users should validate that _all_ input strings have valid character encoding for the charset specified by default_charset. Use mb_check_encoding() for this purpose.

By spec, htmlspecialchars()/htmlentities() do not raise errors and return an empty string for invalid character encoding.

Although you should validate character encoding in your input handling code, you may file a request for an additional option that raises an exception. It would be useful in some cases, e.g. when someone has stored an invalid UTF-8 string in your trusted database.
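
Until such an option exists, the same effect can be approximated in user code. This is only a sketch: escapeHtmlStrict is a hypothetical helper name, it assumes PHP 7 or later with the mbstring extension, and it throws instead of silently returning an empty string:

```php
<?php
/**
 * Escape a value for HTML output, refusing input that is not
 * valid in the given charset rather than returning ''.
 */
function escapeHtmlStrict(string $value, string $charset = 'UTF-8'): string
{
    if (!mb_check_encoding($value, $charset)) {
        // Include a short hex dump so the bad record can be tracked down.
        throw new InvalidArgumentException(
            "Value is not valid $charset; first bytes: "
            . bin2hex(substr($value, 0, 16))
        );
    }
    return htmlspecialchars($value, ENT_QUOTES, $charset);
}

// Valid input is escaped as usual.
$ok = escapeHtmlStrict('Tom & "Jerry"');

// Invalid input (hypothetical Latin-1 byte) raises an exception.
try {
    escapeHtmlStrict("caf\xE9");
    $rejected = false;
} catch (InvalidArgumentException $e) {
    $rejected = true;
}

var_dump($ok, $rejected);
```

Throwing makes the failure visible in logs, which addresses the complaint earlier in this thread that the empty result produces no error at all.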
 