Bug #64430 strip_tags() clobbers non-tag entities
Submitted: 2013-03-15 15:15 UTC
Modified: 2020-10-07 15:21 UTC
Votes: 4
Avg. Score: 4.5 ± 0.9
Reproduced: 4 of 4 (100.0%)
Same Version: 0 (0.0%)
Same OS: 0 (0.0%)
From: gdataonline at gmail dot com
Assigned: cmb
Status: Not a bug
Package: Strings related
PHP Version: 5.4.13
OS: Windows 7 x64
Private report: No
CVE-ID: None
 [2013-03-15 15:15 UTC] gdataonline at gmail dot com
Description:
------------
strip_tags() clobbers all remaining input once it encounters something that looks like a tag but is not valid HTML, even when that text sits inside a tag listed in "allowable_tags".

When strip_tags() encounters a "<" followed by *any non-whitespace*, it assumes it's an HTML tag, even if it couldn't legally be one.  In the example below, even something as simple as a "<=" appearing in JavaScript is enough to trigger the behavior.


For a full list of characters that trigger this behavior when they follow a "<", see this example: http://ideone.com/BEPINI
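
A list like this can also be generated locally. The following is only a minimal sketch: the function name charsThatTriggerStripping() and the probe string "a <X b" are illustrative assumptions, not taken from the linked script, and the exact set of characters may differ between PHP versions.

```php
<?php
// Hypothetical probe (not part of PHP): for each printable ASCII character,
// build "a <X b" and record X if strip_tags() changes the string.
function charsThatTriggerStripping(): array
{
    $triggers = [];
    for ($i = 32; $i <= 126; $i++) {
        $ch = chr($i);
        $input = "a <" . $ch . " b";
        // Any difference means strip_tags() treated "<X" as the start of a tag.
        if (strip_tags($input) !== $input) {
            $triggers[] = $ch;
        }
    }
    return $triggers;
}

$triggers = charsThatTriggerStripping();
echo count($triggers), " of 95 printable ASCII characters trigger stripping\n";
echo implode(' ', $triggers), "\n";
```

The probe calls strip_tags() without an allowed-tags argument; passing one may change which characters are affected.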

Test script:
---------------
<?php

// Example 1: Without <=
$code = "<script>if (foo >= bar){ alert(0); }</script>";
var_dump(strip_tags($code));
var_dump(strip_tags($code, "<script>"));

// Example 2: With <=
$code = "<script>if (foo <= bar){ alert(0); }</script>";
var_dump(strip_tags($code));
var_dump(strip_tags($code, "<script>"));

?>

Expected result:
----------------
string(28) "if (foo >= bar){ alert(0); }"
string(45) "<script>if (foo >= bar){ alert(0); }</script>"

string(28) "if (foo <= bar){ alert(0); }"
string(45) "<script>if (foo <= bar){ alert(0); }</script>"

Actual result:
--------------
string(28) "if (foo >= bar){ alert(0); }"
string(45) "<script>if (foo >= bar){ alert(0); }</script>"

string(8) "if (foo "
string(16) "<script>if (foo "

 [2013-03-15 19:55 UTC] gdataonline at gmail dot com
I'm starting to suspect that strip_tags() treats anything that matches <[^>\s]*> as an HTML tag, regardless of where it appears or whether it's valid HTML.
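
One way to check this suspicion is to compare strip_tags() against a preg_replace() call that uses the same pattern. This is a rough sketch, assuming the results reported in this thread hold on the PHP version in use; the helper name stripByPattern() is made up for illustration.

```php
<?php
// Hypothetical helper: remove only what the suspected pattern matches.
function stripByPattern(string $input): string
{
    return preg_replace('/<[^>\s]*>/', '', $input);
}

$inputs = [
    'This is not HTML, <email@email.com> is not a valid tag',
    '<script>if (foo <= bar){ alert(0); }</script>',
];

foreach ($inputs as $input) {
    $byPattern = stripByPattern($input);
    $byStripTags = strip_tags($input);
    // Report whether the pattern alone reproduces what strip_tags() does.
    echo ($byPattern === $byStripTags ? "same:   " : "differ: "), $input, "\n";
    var_dump($byPattern, $byStripTags);
}
```

Going by the results posted in this report, the two agree on the email address input but not on the "<=" input: the pattern leaves "if (foo <= bar){ alert(0); }" intact, while strip_tags() returns "if (foo ". So the pattern seems to describe only part of the behavior; an unclosed "<" also consumes everything after it.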
 [2015-03-31 10:42 UTC] marcosgdf at gmail dot com
I'm having the same problem.

Test script:
---------------
<?php

var_dump(strip_tags('This is not HTML, <email@email.com> is not a valid tag'));

Expected result:
----------------
string(54) "This is not HTML, <email@email.com> is not a valid tag"

Actual result:
--------------
string(37) "This is not HTML,  is not a valid tag"
 [2018-03-17 18:26 UTC] cmb@php.net
-Package: Unknown/Other Function
+Package: Strings related
 [2020-10-07 15:21 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2020-10-07 15:21 UTC] cmb@php.net
strip_tags() does not validate the HTML, and works along the lines
of "better safe than sorry", so this is expected behavior, which
is also documented[1]:

| Because strip_tags() does not actually validate the HTML,
| partial or broken tags can result in the removal of more text/data
| than expected.

[1] <https://www.php.net/manual/en/function.strip-tags.php>
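
For readers who need the text preserved rather than removed, one possible workaround is to escape the input instead of stripping it. This is only a sketch under that assumption, not something proposed in this report; escapeInsteadOfStrip() is a hypothetical helper name.

```php
<?php
// Hypothetical helper (not part of PHP): escape markup instead of stripping it,
// so a stray "<" cannot swallow the rest of the string.
function escapeInsteadOfStrip(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$code = "<script>if (foo <= bar){ alert(0); }</script>";
$escaped = escapeInsteadOfStrip($code);
echo $escaped, "\n";
// prints: &lt;script&gt;if (foo &lt;= bar){ alert(0); }&lt;/script&gt;
```

This keeps every character of the input and leaves the browser to display it as text. It does not help when the markup itself has to be removed; that would need an actual HTML parser.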
 