Bug #63212 <' breaks strip_tags()
Submitted: 2012-10-03 22:08 UTC
Modified: 2021-08-30 16:29 UTC
Votes:3
Avg. Score:4.3 ± 0.9
Reproduced:2 of 2 (100.0%)
Same Version:0 (0.0%)
Same OS:1 (50.0%)
From: dac dot chartrand at gmail dot com
Assigned: cmb
Status: Not a bug
Package: Strings related
PHP Version: 5.4.7
OS:
Private report: No
CVE-ID: None
 [2012-10-03 22:08 UTC] dac dot chartrand at gmail dot com
Description:
------------
The following character combo <' breaks strip_tags(): 

<'



Test script:
---------------
<?php

$content = "<strong>Hello World</strong><fake>Should be removed</fake><h1>Goodbye World</h1>";
$content = strip_tags($content, '<del><ins><p><div><span><hr><br><cite><strong><em><pre><img><a><h1><h2><h3><h4><h5><h6><dl><dt><dd><ul><li><ol><sub><sup><tt><blockquote><aside><table><thead><tbody><tfoot><tr><td><th>');
echo $content;

// Good
// <strong>Hello World</strong>Should be removed<h1>Goodbye World</h1>



$content = "<strong>Hello World</strong><fake>Should <' be removed</fake><h1>Goodbye World</h1>";
$content = strip_tags($content, '<del><ins><p><div><span><hr><br><cite><strong><em><pre><img><a><h1><h2><h3><h4><h5><h6><dl><dt><dd><ul><li><ol><sub><sup><tt><blockquote><aside><table><thead><tbody><tfoot><tr><td><th>');
echo $content;

// Bad
// <strong>Hello World</strong>Should

Expected result:
----------------
<strong>Hello World</strong>Should be removed<h1>Goodbye World</h1>

Actual result:
--------------
<strong>Hello World</strong>Should

History
 [2012-10-03 23:34 UTC] pierrick@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

Hi Daniel,

I don't think this is a bug. You're opening a tag that is never terminated, so
strip_tags() strips it.

If you replace your <' with < followed by any other character except a space,
such as <a, you'll get the same behavior.
 [2012-10-03 23:34 UTC] pierrick@php.net
-Status: Open +Status: Not a bug
 [2012-10-04 01:56 UTC] dac dot chartrand at gmail dot com
Hi Pierrick

I disagree. Maybe my report needs more info. Here are two other examples:

-=-=-

$content = "<strong>Hello World</strong><fake>Should <# > be removed</fake><h1>Goodbye World</h1>";
$content = strip_tags($content, '<del><ins><p><div><span><hr><br><cite><strong><em><pre><img><a><h1><h2><h3><h4><h5><h6><dl><dt><dd><ul><li><ol><sub><sup><tt><blockquote><aside><table><thead><tbody><tfoot><tr><td><th>');
echo $content; 

// <strong>Hello World</strong>Should  be removed<h1>Goodbye World</h1>

$content = "<strong>Hello World</strong><fake>Should <' > be removed</fake><h1>Goodbye World</h1>";
$content = strip_tags($content, '<del><ins><p><div><span><hr><br><cite><strong><em><pre><img><a><h1><h2><h3><h4><h5><h6><dl><dt><dd><ul><li><ol><sub><sup><tt><blockquote><aside><table><thead><tbody><tfoot><tr><td><th>');
echo $content; 

// <strong>Hello World</strong>Should 

-=-=-

Thanks for looking into this.
 [2012-10-04 02:16 UTC] pierrick@php.net
-Status: Not a bug +Status: Open
 [2012-10-04 02:16 UTC] pierrick@php.net
Hi Daniel,

You're right, the ' is actually opening a quote which is never closed. But in
valid HTML/XML, something like <'foo'> is not allowed. We could maybe verify
that the node has a name before accepting an opening quote.
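
To illustrate the quote handling described above, here is a minimal toy model. It is not PHP's actual strip_tags() implementation; toy_strip() is a hypothetical helper that only assumes two rules: everything between < and > is dropped, and a quote opened inside a tag hides any > until the same quote character appears again.

```php
<?php
// Toy model only (an assumption for illustration, NOT the real strip_tags()):
// drop everything inside <...>, and let a quote opened inside a tag
// protect any '>' until the matching quote character is seen.
function toy_strip(string $s): string
{
    $out = '';
    $inTag = false;   // are we currently between '<' and '>'?
    $quote = null;    // quote character currently open inside a tag, if any
    $len = strlen($s);

    for ($i = 0; $i < $len; $i++) {
        $c = $s[$i];

        if (!$inTag) {
            if ($c === '<') {
                $inTag = true;      // start skipping
            } else {
                $out .= $c;         // ordinary text is copied through
            }
            continue;
        }

        if ($quote !== null) {
            if ($c === $quote) {
                $quote = null;      // closing quote found
            }
            continue;               // everything inside quotes is skipped, even '>'
        }

        if ($c === '"' || $c === "'") {
            $quote = $c;            // opening quote inside a tag
        } elseif ($c === '>') {
            $inTag = false;         // tag ends, resume copying
        }
    }

    return $out;
}

$withHash  = toy_strip("Should <# > be removed");
$withQuote = toy_strip("Should <' > be removed");

var_dump($withHash);   // string(18) "Should  be removed"
var_dump($withQuote);  // string(7) "Should "
```

In this model the <# > case ends at the first >, while the <' > case never leaves the quoted state, so the rest of the input is swallowed. That matches the two outputs reported above.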
 [2012-10-04 17:44 UTC] riptide dot tempora at opinehub dot com
"Expected result:
----------------
<strong>Hello World</strong>Should be removed<h1>Goodbye World</h1>

Actual result:
--------------
<strong>Hello World</strong>Should"

Shouldn't that <strong>(.*)</strong> be eliminated too? :\
 [2012-10-10 21:36 UTC] willfitch@php.net
@riptide - no. <strong> is in the list of allowable tags.

I'll look into this one this evening.
 [2018-03-10 13:47 UTC] cmb@php.net
-Package: *General Issues +Package: Strings related
 [2021-08-30 16:29 UTC] cmb@php.net
-Status: Open +Status: Not a bug -Assigned To: +Assigned To: cmb
 [2021-08-30 16:29 UTC] cmb@php.net
strip_tags() is a crude tool, and it rather errs on the side of
caution, i.e. it may remove more than necessary.  That is even
documented[1]:

| Because strip_tags() does not actually validate the HTML,
| partial or broken tags can result in the removal of more text/data
| than expected.

So the reported behavior is not a bug.

Personally, I suggest using strip_tags() only on valid HTML, or
better, not using the function at all.

[1] <https://www.php.net/strip_tags>
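
If strip_tags() has to stay, one possible workaround is to escape any < that cannot start a tag before stripping, along the lines of the "verify that the node has a name" idea raised earlier in this thread. The sketch below is only that, a sketch: escape_stray_lt() is a hypothetical helper, the regular expression is an assumption about what counts as a tag start (a letter, /, ! or ?), and it does not validate the HTML.

```php
<?php
// Hypothetical helper (not part of PHP): turn every '<' that is not followed
// by a letter, '/', '!' or '?' into '&lt;', so it is treated as text.
function escape_stray_lt(string $html): string
{
    return preg_replace('~<(?![A-Za-z/!?])~', '&lt;', $html);
}

$content = "<strong>Hello World</strong><fake>Should <' be removed</fake><h1>Goodbye World</h1>";

// A shortened allow-list is used here to keep the example readable.
$result = strip_tags(escape_stray_lt($content), '<strong><h1>');

echo $result, "\n";
// <strong>Hello World</strong>Should &lt;' be removed<h1>Goodbye World</h1>
```

The stray <' survives as the text &lt;' instead of opening a tag, so the quote never takes effect. Input such as <a with no closing > still looks like a tag start to this filter and is not helped by it.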
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Mar 19 10:01:30 2024 UTC