Bug #63212 <' breaks strip_tags()
Submitted: 2012-10-03 22:08 UTC Modified: 2021-08-30 16:29 UTC
Votes:3
Avg. Score:4.3 ± 0.9
Reproduced:2 of 2 (100.0%)
Same Version:0 (0.0%)
Same OS:1 (50.0%)
From: dac dot chartrand at gmail dot com
Assigned: cmb
Status: Not a bug
Package: Strings related
PHP Version: 5.4.7
OS:
Private report: No
CVE-ID: None
 [2012-10-03 22:08 UTC] dac dot chartrand at gmail dot com
Description:
------------
The following character combo <' breaks strip_tags(): 

<'



Test script:
---------------
<?php

$content = "<strong>Hello World</strong><fake>Should be removed</fake><h1>Goodbye World</h1>";
$content = strip_tags($content, '<del><ins><p><div><span><hr><br><cite><strong><em><pre><img><a><h1><h2><h3><h4><h5><h6><dl><dt><dd><ul><li><ol><sub><sup><tt><blockquote><aside><table><thead><tbody><tfoot><tr><td><th>');
echo $content;

// Good
// <strong>Hello World</strong>Should be removed<h1>Goodbye World</h1>



$content = "<strong>Hello World</strong><fake>Should <' be removed</fake><h1>Goodbye World</h1>";
$content = strip_tags($content, '<del><ins><p><div><span><hr><br><cite><strong><em><pre><img><a><h1><h2><h3><h4><h5><h6><dl><dt><dd><ul><li><ol><sub><sup><tt><blockquote><aside><table><thead><tbody><tfoot><tr><td><th>');
echo $content;

// Bad
// <strong>Hello World</strong>Should

Expected result:
----------------
<strong>Hello World</strong>Should be removed<h1>Goodbye World</h1>

Actual result:
--------------
<strong>Hello World</strong>Should

 [2012-10-03 23:34 UTC] pierrick@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

Hi Daniel,

I don't think this is a bug. You're opening a tag which is never terminated, so 
strip_tags() strips it along with everything that follows.

If you replace the ' in your <' with any other character except a space, for 
example <a, you'll see the same behavior.
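
A minimal sketch of that claim, assuming current PHP versions still behave like the one in this report. The shortened input, the reduced allow-list and the variable names are illustrative and not taken from the original test script:

```php
<?php
// Illustrative input: "<a" opens a tag that is never closed with ">".
$input = "<strong>Hello</strong> Should <a be removed <h1>Goodbye</h1>";

// Reduced allow-list (assumption): only the two tags used above.
$output = strip_tags($input, '<strong><h1>');

// Everything after the unterminated "<a" is read as part of that tag,
// so the text following it is dropped from the result.
var_dump($output);
```

The allowed <strong> element in front of the broken tag survives, while the <h1> element after it is lost even though it is on the allow-list.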
 [2012-10-03 23:34 UTC] pierrick@php.net
-Status: Open +Status: Not a bug
 [2012-10-04 01:56 UTC] dac dot chartrand at gmail dot com
Hi Pierrick

I disagree. Maybe my report needs more info. Here are two other examples:

-=-=-

$content = "<strong>Hello World</strong><fake>Should <# > be removed</fake><h1>Goodbye World</h1>";
$content = strip_tags($content, '<del><ins><p><div><span><hr><br><cite><strong><em><pre><img><a><h1><h2><h3><h4><h5><h6><dl><dt><dd><ul><li><ol><sub><sup><tt><blockquote><aside><table><thead><tbody><tfoot><tr><td><th>');
echo $content; 

// <strong>Hello World</strong>Should  be removed<h1>Goodbye World</h1>

$content = "<strong>Hello World</strong><fake>Should <' > be removed</fake><h1>Goodbye World</h1>";
$content = strip_tags($content, '<del><ins><p><div><span><hr><br><cite><strong><em><pre><img><a><h1><h2><h3><h4><h5><h6><dl><dt><dd><ul><li><ol><sub><sup><tt><blockquote><aside><table><thead><tbody><tfoot><tr><td><th>');
echo $content; 

// <strong>Hello World</strong>Should 

-=-=-

Thanks for looking into this.
 [2012-10-04 02:16 UTC] pierrick@php.net
-Status: Not a bug +Status: Open
 [2012-10-04 02:16 UTC] pierrick@php.net
Hi Daniel,

You're right, the ' is actually opening a quote which is never closed. But in 
valid HTML/XML, something like <'foo'> is not allowed. We could maybe verify 
that the node has a name before accepting an opening quote.
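
The difference between the two inputs can be sketched as follows. This is a minimal illustration, assuming strip_tags() still tracks quotes inside tags the way this thread describes; the reduced allow-list and the variable names are illustrative:

```php
<?php
// Reduced allow-list (assumption): only the tags that appear in the inputs.
$allowed = '<strong><h1>';

// "<# >" is an unknown tag that is closed by ">", so only the tag is removed.
$closedTag = strip_tags(
    "<strong>Hello World</strong><fake>Should <# > be removed</fake><h1>Goodbye World</h1>",
    $allowed
);

// "<' >" opens a quote inside the tag; the ">" is then read as quoted text,
// so the tag never ends and the rest of the input is swallowed.
$openQuote = strip_tags(
    "<strong>Hello World</strong><fake>Should <' > be removed</fake><h1>Goodbye World</h1>",
    $allowed
);

var_dump($closedTag);
var_dump($openQuote);
```

Per the outputs reported above, the first call keeps the trailing text and the <h1> element, while the second stops after "Should".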
 [2012-10-04 17:44 UTC] riptide dot tempora at opinehub dot com
"Expected result:
----------------
<strong>Hello World</strong>Should be removed<h1>Goodbye World</h1>

Actual result:
--------------
<strong>Hello World</strong>Should"

Shouldn't that <strong>(.*)</strong> be eliminated too? :\
 [2012-10-10 21:36 UTC] willfitch@php.net
@riptide - no. <strong> is in the list of allowable tags.

I'll look into this one this evening.
 [2018-03-10 13:47 UTC] cmb@php.net
-Package: *General Issues +Package: Strings related
 [2021-08-30 16:29 UTC] cmb@php.net
-Status: Open +Status: Not a bug -Assigned To: +Assigned To: cmb
 [2021-08-30 16:29 UTC] cmb@php.net
strip_tags() is a crude tool, and it rather errs on the side of
caution, i.e. it may remove more than necessary.  That is even
documented[1]:

| Because strip_tags() does not actually validate the HTML,
| partial or broken tags can result in the removal of more text/data
| than expected.

So the reported behavior is not a bug.

Personally, I suggest using strip_tags() only on valid HTML or,
better, not using the function at all.

[1] <https://www.php.net/strip_tags>
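
The comment above does not name a replacement. As one possible approach (an assumption, not something recommended in this thread), untrusted text can be escaped with htmlspecialchars() instead of stripped, which keeps all of the input visible. A minimal sketch with a hypothetical input string:

```php
<?php
// Hypothetical user input containing the sequence from this report.
$input = "Should <' be removed";

// Escape every HTML-special character rather than trying to strip tags.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $escaped, PHP_EOL; // Should &lt;&#039; be removed
```

This only fits cases where no markup needs to survive; keeping an allow-list of tags would require a real HTML parser or sanitizer instead.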
 