PHP :: Bug #64430 :: strip_tags() clobbers non-tag entities


Submitted:	2013-03-15 15:15 UTC
Modified:	2020-10-07 15:21 UTC
Votes:	4
Avg. Score:	4.5 ± 0.9
Reproduced:	4 of 4 (100.0%)
Same Version:	0 (0.0%)
Same OS:	0 (0.0%)
From:	gdataonline at gmail dot com
Assigned:	cmb
Status:	Not a bug
Package:	Strings related
PHP Version:	5.4.13
OS:	Windows 7 x64
CVE-ID:	None



[2013-03-15 15:15 UTC] gdataonline at gmail dot com

Description:
------------
strip_tags() clobbers all remaining input after encountering a non-HTML "tag", regardless of whether the enclosing element is listed in allowable_tags.

When strip_tags() encounters a "<" followed by *any non-whitespace* character, it assumes it has found an HTML tag, even if it could not legally be one. In the example below, something as simple as a "<=" in JavaScript is enough to trigger the behavior.


For a full list of characters that trigger this behavior, see this example: http://ideone.com/BEPINI
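
A rough model of the behavior described above, written as a hypothetical PHP function. strip_tags_model() is not part of PHP and is not PHP's actual C implementation: it only assumes that a "<" followed by whitespace is kept as text, that any other "<" opens a tag (or nests one level deeper inside one), and that each ">" closes one level. Quotes, comments, PHP blocks and allowable_tags are deliberately ignored, so treat it as a sketch of why a single "<=" can swallow the rest of the input.

```php
<?php
// Hypothetical, simplified model of the reported strip_tags() behavior.
// NOT PHP's real implementation: quotes, comments, <?php blocks and the
// allowable_tags parameter are all ignored.
function strip_tags_model(string $input): string
{
    // Characters treated as whitespace after "<" (same set as C's isspace()).
    $whitespace = [" ", "\t", "\n", "\r", "\v", "\f"];
    $out = '';
    $depth = 0; // 0 = plain text; > 0 = inside a (possibly nested) "tag"
    $len = strlen($input);

    for ($i = 0; $i < $len; $i++) {
        $c = $input[$i];
        $next = $i + 1 < $len ? $input[$i + 1] : '';

        if ($c === '<') {
            if (in_array($next, $whitespace, true)) {
                // "<" followed by whitespace is not a tag start; keep it
                // only if we are currently outside a tag.
                if ($depth === 0) {
                    $out .= $c;
                }
                continue;
            }
            // Any other "<" opens a tag, or nests one level deeper.
            $depth++;
            continue;
        }

        if ($c === '>' && $depth > 0) {
            // ">" closes only the innermost open level.
            $depth--;
            continue;
        }

        // Everything outside a tag is copied; everything inside is dropped.
        if ($depth === 0) {
            $out .= $c;
        }
    }
    return $out;
}
```

In this model, the "<=" in Example 2 opens a tag, the following "</script" nests one level deeper, and the final ">" only closes the inner level, so nothing after "if (foo " is ever emitted. In Example 1, the ">=" appears outside any tag and is copied through unchanged.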

Test script:
---------------
<?php

// Example 1: Without <=
$code = "<script>if (foo >= bar){ alert(0); }</script>";
var_dump(strip_tags($code));
var_dump(strip_tags($code, "<script>"));

// Example 2: With <=
$code = "<script>if (foo <= bar){ alert(0); }</script>";
var_dump(strip_tags($code));
var_dump(strip_tags($code, "<script>"));

?>

Expected result:
----------------
string(28) "if (foo >= bar){ alert(0); }"
string(45) "<script>if (foo >= bar){ alert(0); }</script>"

string(28) "if (foo <= bar){ alert(0); }"
string(45) "<script>if (foo <= bar){ alert(0); }</script>"

Actual result:
--------------
string(28) "if (foo >= bar){ alert(0); }"
string(45) "<script>if (foo >= bar){ alert(0); }</script>"

string(8) "if (foo "
string(16) "<script>if (foo "


[2013-03-15 19:55 UTC] gdataonline at gmail dot com

I'm starting to suspect that strip_tags treats anything that matches <[^>\s]*> as an HTML tag, regardless of where it appears or if it's valid HTML.

[2015-03-31 10:42 UTC] marcosgdf at gmail dot com

I'm having the same problem.

Test script:
---------------
<?php

var_dump(strip_tags('This is not HTML, <email@email.com> is not a valid tag'));

Expected result:
----------------
string(54) "This is not HTML, <email@email.com> is not a valid tag"

Actual result:
--------------
string(37) "This is not HTML,  is not a valid tag"

[2018-03-17 18:26 UTC] cmb@php.net

-Package: Unknown/Other Function
+Package: Strings related

[2020-10-07 15:21 UTC] cmb@php.net

-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb

[2020-10-07 15:21 UTC] cmb@php.net

strip_tags() does not validate the HTML, and works along the lines
of "better safe than sorry", so this is expected behavior, which
is also documented[1]:

| Because strip_tags() does not actually validate the HTML,
| partial or broken tags can result in the removal of more text/data
| than expected.

[1] <https://www.php.net/manual/en/function.strip-tags.php>
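
If the input is known to contain comparison operators such as "<=" rather than arbitrary broken markup, one possible workaround (a sketch, not an official fix) is to escape every "<" that cannot start a tag before calling strip_tags(). The helper name strip_tags_keep_operators() and the choice of what counts as a possible tag start (a letter, "/", "!" or "?") are assumptions made for this example.

```php
<?php
// Hypothetical helper (not part of PHP): escape each "<" that is NOT
// followed by a letter, "/", "!" or "?", so strip_tags() cannot mistake
// it for the start of a tag. Real tags, closing tags, comments and
// <?php blocks still begin with one of those characters and are stripped.
function strip_tags_keep_operators(string $input): string
{
    // Negative lookahead: match "<" only when the next char is not a tag start.
    $escaped = preg_replace('/<(?![A-Za-z\/!?])/', '&lt;', $input);
    return strip_tags($escaped);
}
```

The escaped character comes back as "&lt;", which is appropriate when the result is emitted as HTML. This does not help with the <email@email.com> case, because "<e" still looks like the start of a tag; for plain text that should never contain markup, escaping with htmlspecialchars() instead of calling strip_tags() is likely a better fit.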



Copyright © 2001-2025 The PHP Group All rights reserved.	Last updated: Thu Jul 03 03:01:33 2025 UTC