Bug #64430 strip_tags() clobbers non-tag entities
Submitted: 2013-03-15 15:15 UTC Modified: 2020-10-07 15:21 UTC
Votes:4
Avg. Score:4.5 ± 0.9
Reproduced:4 of 4 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: gdataonline at gmail dot com Assigned: cmb (profile)
Status: Not a bug Package: Strings related
PHP Version: 5.4.13 OS: Windows 7 x64
Private report: No CVE-ID: None
 [2013-03-15 15:15 UTC] gdataonline at gmail dot com
Description:
------------
strip_tags() discards the input that follows a "<" that is not part of an HTML tag, even when that text sits inside an element listed in the allowable tags argument.

When strip_tags() encounters a "<" followed by *any non-whitespace*, it assumes it's an HTML tag, even if it couldn't legally be one.  In the example below, even something as simple as a "<=" appearing in JavaScript is enough to trigger the behavior.


For a full list of characters that trigger this behavior when they follow "<", see this example: http://ideone.com/BEPINI

Test script:
---------------
<?php

// Example 1: Without <=
$code = "<script>if (foo >= bar){ alert(0); }</script>";
var_dump(strip_tags($code));
var_dump(strip_tags($code, "<script>"));

// Example 2: With <=
$code = "<script>if (foo <= bar){ alert(0); }</script>";
var_dump(strip_tags($code));
var_dump(strip_tags($code, "<script>"));

?>

Expected result:
----------------
string(28) "if (foo >= bar){ alert(0); }"
string(45) "<script>if (foo >= bar){ alert(0); }</script>"

string(28) "if (foo <= bar){ alert(0); }"
string(45) "<script>if (foo <= bar){ alert(0); }</script>"

Actual result:
--------------
string(28) "if (foo >= bar){ alert(0); }"
string(45) "<script>if (foo >= bar){ alert(0); }</script>"

string(8) "if (foo "
string(16) "<script>if (foo "

History

 [2013-03-15 19:55 UTC] gdataonline at gmail dot com
I'm starting to suspect that strip_tags() treats anything that matches <[^>\s]*> as an HTML tag, regardless of where it appears or whether it's valid HTML.
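
To make that suspicion concrete, here is a rough regex model of the reported behavior. It is only a sketch: approx_strip_tags() is a hypothetical helper, not how strip_tags() is implemented, and it ignores allowable tags, quoting and comments. The pattern is slightly looser than the one suggested above (whitespace is allowed after the first character), because the span removed in Example 2 contains spaces. It assumes PHP 7 or later for the type declarations.

```php
<?php

// Hypothetical helper: treat "<" followed by a non-whitespace character as
// the start of a tag, and remove everything up to and including the next ">".
function approx_strip_tags(string $input): string
{
    return (string) preg_replace('/<[^\s>][^>]*>/', '', $input);
}

// Example 1 from the report: ">=" is left alone.
var_dump(approx_strip_tags("<script>if (foo >= bar){ alert(0); }</script>"));
// string(28) "if (foo >= bar){ alert(0); }"

// Example 2 from the report: "<=" opens a pseudo-tag that runs to "</script>".
var_dump(approx_strip_tags("<script>if (foo <= bar){ alert(0); }</script>"));
// string(8) "if (foo "
```

With this model, both examples come out the same as the "Actual result" section of the report, which suggests the rule is "< plus non-whitespace, up to the next >" rather than anything HTML-aware.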
 [2015-03-31 10:42 UTC] marcosgdf at gmail dot com
I'm having the same problem.

Test script:
---------------
<?php

var_dump(strip_tags('This is not HTML, <email@email.com> is not a valid tag'));

Expected result:
----------------
string(54) "This is not HTML, <email@email.com> is not a valid tag"

Actual result:
--------------
string(37) "This is not HTML,  is not a valid tag"
 [2018-03-17 18:26 UTC] cmb@php.net
-Package: Unknown/Other Function +Package: Strings related
 [2020-10-07 15:21 UTC] cmb@php.net
-Status: Open +Status: Not a bug -Assigned To: +Assigned To: cmb
 [2020-10-07 15:21 UTC] cmb@php.net
strip_tags() does not validate the HTML, and works along the lines
of "better safe than sorry", so this is expected behavior, which
is also documented[1]:

| Because strip_tags() does not actually validate the HTML,
| partial or broken tags can result in the removal of more text/data
| than expected.

[1] <https://www.php.net/manual/en/function.strip-tags.php>
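
Since the behavior is documented and will not change, one possible workaround when the input is plain text rather than HTML is to escape it instead of stripping it. The sketch below is an assumption about what the reporters may want, not a recommendation from the maintainers; escape_for_html() is a hypothetical helper name, and the type declarations assume PHP 7 or later.

```php
<?php

// Hypothetical helper: keep every character of the input, but encode the
// HTML-significant ones so a browser shows them literally.
function escape_for_html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo escape_for_html('This is not HTML, <email@email.com> is not a valid tag'), "\n";
// This is not HTML, &lt;email@email.com&gt; is not a valid tag

echo escape_for_html('if (foo <= bar){ alert(0); }'), "\n";
// if (foo &lt;= bar){ alert(0); }
```

Escaping loses nothing from the input, which is the opposite trade-off from the "better safe than sorry" removal that strip_tags() performs.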
 