Bug #64430 strip_tags() clobbers non-tag entities
Submitted: 2013-03-15 15:15 UTC Modified: 2020-10-07 15:21 UTC
Avg. Score:4.5 ± 0.9
Reproduced:4 of 4 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: gdataonline at gmail dot com Assigned: cmb (profile)
Status: Not a bug Package: Strings related
PHP Version: 5.4.13 OS: Windows 7 x64
Private report: No CVE-ID: None

 [2013-03-15 15:15 UTC] gdataonline at gmail dot com
strip_tags() will clobber all remaining input after encountering something that looks like a tag but is not valid HTML, regardless of whether it is embedded inside one of the "allowable-tags".

When strip_tags() encounters a "<" followed by *any non-whitespace*, it assumes it's an HTML tag, even if it couldn't legally be one.  In the example below, even something as simple as a "<=" appearing in JavaScript is enough to trigger the behavior.
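
A minimal sketch of the whitespace rule described above, assuming no allowed-tags argument is passed. The input strings are hypothetical and chosen only to isolate the character that follows "<".

```php
<?php
// "<" followed by whitespace: kept as literal text
// (assumption: no allowed-tags argument is passed).
$spaced = strip_tags("if (foo < bar) { alert(0); }");

// "<" followed by a non-whitespace character: treated as the start of a tag.
// No closing ">" follows, so the rest of the input is discarded.
$unspaced = strip_tags("if (foo <bar) { alert(0); }");

var_dump($spaced);
var_dump($unspaced);
```

Consistent with the report, the first call should return its input unchanged, while the second should lose everything from "<bar" onward.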

For a full list of the characters that cause strip_tags() to clobber its input, see this example:

Test script:

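// Example 1: Without <=
$code = "<script>if (foo >= bar){ alert(0); }</script>";
var_dump(strip_tags($code));             // no tags allowed
var_dump(strip_tags($code, "<script>")); // <script> allowed

// Example 2: With <=
$code = "<script>if (foo <= bar){ alert(0); }</script>";
var_dump(strip_tags($code));             // no tags allowed
var_dump(strip_tags($code, "<script>")); // <script> allowed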


Expected result:
string(28) "if (foo >= bar){ alert(0); }"
string(45) "<script>if (foo >= bar){ alert(0); }</script>"

string(28) "if (foo <= bar){ alert(0); }"
string(45) "<script>if (foo <= bar){ alert(0); }</script>"

Actual result:
string(28) "if (foo >= bar){ alert(0); }"
string(45) "<script>if (foo >= bar){ alert(0); }</script>"

string(8) "if (foo "
string(16) "<script>if (foo "


 [2013-03-15 19:55 UTC] gdataonline at gmail dot com
I'm starting to suspect that strip_tags treats anything that matches <[^>\s]*> as an HTML tag, regardless of where it appears or if it's valid HTML.
 [2015-03-31 10:42 UTC] marcosgdf at gmail dot com
I'm having the same problem.

Test script:

var_dump(strip_tags('This is not HTML, <> is not a valid tag'));

Expected result:
string(54) "This is not HTML, <> is not a valid tag"

Actual result:
string(37) "This is not HTML,  is not a valid tag"
 [2018-03-17 18:26 UTC]
-Package: Unknown/Other Function +Package: Strings related
 [2020-10-07 15:21 UTC]
-Status: Open +Status: Not a bug -Assigned To: +Assigned To: cmb
 [2020-10-07 15:21 UTC]
strip_tags() does not validate the HTML, and works along the lines
of "better safe than sorry", so this is expected behavior, which
is also documented[1]:

| Because strip_tags() does not actually validate the HTML,
| partial or broken tags can result in the removal of more text/data
| than expected.

[1] <>
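
For readers whose input is plain text rather than HTML, here is a hedged sketch contrasting strip_tags() with escaping. Note that htmlspecialchars() is a different technique that nobody in this thread proposed. It is shown only to illustrate that escaping keeps the text that stripping discards. The input string is hypothetical, adapted from the test script in the report.

```php
<?php
// Hypothetical input: plain text containing a comparison, not HTML.
$text = "if (foo <= bar){ alert(0); }";

// strip_tags() treats "<=" as the start of a tag and drops the rest.
$stripped = strip_tags($text);

// htmlspecialchars() escapes "<" and ">" instead of removing anything,
// so the original text can still be recovered.
$escaped = htmlspecialchars($text);

var_dump($stripped);
var_dump($escaped);
var_dump(htmlspecialchars_decode($escaped) === $text);
```

Whether escaping is appropriate depends on the use case: it is only a fit when the goal is to display the text safely, not when the markup actually has to be removed.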
Last updated: Tue Mar 28 12:03:45 2023 UTC