Bug #34863 strip_tags() does not work when string includes <img tags
Submitted: 2005-10-14 03:24 UTC Modified: 2005-10-19 21:02 UTC
Votes:8
Avg. Score:4.5 ± 0.7
Reproduced:6 of 7 (85.7%)
Same Version:3 (50.0%)
Same OS:3 (50.0%)
From: backdream at gmail dot com
Assigned:
Status: Wont fix
Package: Strings related
PHP Version: 4.3.11
OS: Windows
Private report: No
CVE-ID: None
 [2005-10-14 03:24 UTC] backdream at gmail dot com
Description:
------------
strip_tags() does not work when the string includes <img> tags.

Reproduce code:
---------------
<?php

$txt = "Next: <a href=\"http://www.joelonsoftware.com/uibook/chapters/fog0000000063.html\" target=\"_blank\">Designing for People Who Have Better Things To Do With Their Lives, Part Two</a> <br />????<br />????<img src=\"http://86.0.190.20/test/ipb21/skin_acp/IPB2_Standard/images/users.png\" border=\"0\" align=\"absmiddle\" alt=\"????ͼƬ\" onload=\"if(screen.width*0.7<this.width) {this.resized=true; this.width=screen.width*0.7;}\" /><br />????<b>Adverment&#33;</b> Do you need to control a computer remotely, even when firewalls get in the way? My company&#39;s latest product, <a href=\"https://www.copilot.com/\" target=\"_blank\">Fog Creek Copilot</a>, is a remote control system that requires no setup, no configuration, and works even if both users are behind firewalls. It&#39;s designed to make remote tech support easy. <br />????<br />????Enter your email address to receive a (very occasional) email whenever I write a major new article. You can unsubscribe at any time, of course.<br />????</div>";
$txt = preg_replace( "#<img[^>]*>#i", "", $txt );
$txt = strip_tags( $txt );
print $txt;

?>

Expected result:
----------------
All HTML tags are stripped.

Actual result:
--------------
Stripping breaks when it reaches the <img> tag.

 [2005-10-14 03:31 UTC] backdream at gmail dot com
Sorry, the reproduce code should be:
---------------
<?php

$txt = "Next: <a href=\"http://www.joelonsoftware.com/uibook/chapters/fog0000000063.html\" target=\"_blank\">Designing for People Who Have Better Things To Do With Their Lives, Part Two</a> <br />????<br />????<img src=\"http://86.0.190.20/test/ipb21/skin_acp/IPB2_Standard/images/users.png\" border=\"0\" align=\"absmiddle\" alt=\"????ͼƬ\" onload=\"if(screen.width*0.7<this.width) {this.resized=true; this.width=screen.width*0.7;}\" /><br />????<b>Adverment&#33;</b> Do you need to control a computer remotely, even when firewalls get in the way? My company&#39;s latest product, <a href=\"https://www.copilot.com/\" target=\"_blank\">Fog Creek Copilot</a>, is a remote control system that requires no setup, no configuration, and works even if both users are behind firewalls. It&#39;s designed to make remote tech support easy. <br />????<br />????Enter your email address to receive a (very occasional) email whenever I write a major new article. You can unsubscribe at any time, of course.<br />????</div>";

$txt = strip_tags( $txt );
print $txt;

?>
 [2005-10-14 13:18 UTC] tony2001@php.net
Can't reproduce.
Both your scripts work fine with any PHP version I can find (4.3.11, 4.4, 5.0.x, 5.1).

 [2005-10-14 19:36 UTC] backdream at gmail dot com
I tested with 4.3.11, and it also has the bug.

------
<?php

$txt = "Next: <ahref=\"http://www.joelonsoftware.com/uibook/chapters/fog0000000063.html\" target=\"_blank\">Designing for People Who Have Better Things To DoWith Their Lives, Part Two</a> <br />????<br />????<imgsrc=\"http://86.0.190.20/test/ipb21/skin_acp/IPB2_Standard/images/users.png\" border=\"0\" align=\"absmiddle\" alt=\"????ͼƬ\"onload=\"if(screen.width*0.7<this.width) {this.resized=true;this.width=screen.width*0.7;}\" /><br />????<b>Adverment&#33;</b> Doyouneed to control a computer remotely, even when firewalls get in theway?My company&#39;s latest product, <a href=\"https://www.copilot.com/\"target=\"_blank\">Fog Creek Copilot</a>, is a remote control systemthatrequires no setup, no configuration, and works even if both users arebehind firewalls. It&#39;s designed to make remote tech support easy.<br />????<br />????Enter your email address to receive a (veryoccasional) email whenever I write a major new article. You canunsubscribe at any time, of course.<br />????</div>";

$txt = strip_tags( $txt );
print $txt;

?>
-----------

It prints "Next: Designing for People Who Have Better Things To DoWith Their Lives, Part Two ????????" instead of the whole string with the HTML tags stripped.
 [2005-10-19 20:57 UTC] lm at bible dot ch
OS:  Windows XP
PHP: 5.0.3

Further investigation:
----------------------
strip_tags() does not handle tags correctly when '<' occurs inside an attribute value, as it does in the onload="if(screen.width*0.7<this(...)" attribute of the example.

strip_tags() does handle nested tags, as example 2 below shows, but it does not ignore quoted < and > inside tags, as example 1 shows:

<?php
//example 1
$txt = 'text1<tag attr="<"> text2<tag attr=">"> text3';
$txt = strip_tags( $txt );
print $txt;
// prints 'text1 text3'
// should print 'text1 text2 text3' 

//example 2
$txt = 'text1<tag <nested tag>> text2';
$txt = strip_tags( $txt );
print $txt;
// prints 'text1 text2' as it should if strip_tags searches for nested tags
?>

Three questions arise:
1. Is the detection of nested tags reasonable?
2. Is an HTML tag malformed if it contains a quoted '<' or '>' sign?
3. If such an HTML tag is malformed, shouldn't strip_tags() still work correctly, since there is no ambiguity thanks to the quotes?

Answers:
1. I cannot think of any real-world example of nested tags other than HTML comments like <!--<br />-->
2. According to the w3c.org HTML validator, it is valid.
3. strip_tags() should still work.

No parsing of quotes:
---------------------
The following example shows that strip_tags() does not parse quotes between '<' and '>' at all:
<?php
//example 3
$txt = 'text1<tag attr="> text2 <tag attr="> text3';
$txt = strip_tags( $txt );
print $txt;
// prints: 'text1 text2 text3'
// should print: 'text1 text3', since there is only one tag, with the attribute 'attr="> text2 <tag attr="'
?>
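
For illustration, here is a minimal sketch of the quote-aware scanning that the examples above ask for. The function name strip_tags_quote_aware() is hypothetical and is not part of PHP, and this is not how strip_tags() itself is implemented. It assumes every '<' opens a tag, and it does not handle comments, entities, or an allowed-tags list.

```php
<?php
// Hypothetical helper, not part of PHP: strips tags while treating
// '<' and '>' inside quoted attribute values as ordinary characters.
function strip_tags_quote_aware(string $html): string
{
    $out = '';
    $inTag = false;   // true while between '<' and its closing '>'
    $quote = null;    // the active quote character while inside a tag
    $len = strlen($html);

    for ($i = 0; $i < $len; $i++) {
        $c = $html[$i];
        if (!$inTag) {
            if ($c === '<') {
                $inTag = true;      // assumption: every '<' starts a tag
            } else {
                $out .= $c;         // plain text is copied through
            }
        } elseif ($quote !== null) {
            if ($c === $quote) {
                $quote = null;      // the quoted attribute value ends
            }
        } elseif ($c === '"' || $c === "'") {
            $quote = $c;            // a quoted attribute value starts
        } elseif ($c === '>') {
            $inTag = false;         // an unquoted '>' closes the tag
        }
    }
    return $out;
}
```

With this sketch, the string from example 1 yields 'text1 text2 text3' and the string from example 3 yields 'text1 text3'. The trade-off is that a literal '<' in ordinary text (such as "a < b") swallows everything up to the next unquoted '>'.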

Conclusion:
-----------
This IS a real bug, but it is not related to the <img> tag. It is caused by strip_tags() not parsing attributes and quotes inside tags.

It would be important to correct this bug.

With best regards Lorenz Meyer
 [2005-10-19 21:02 UTC] tony2001@php.net
Warning:
Because strip_tags() does not actually validate the HTML, partial or broken tags can result in the removal of more text/data than expected.
(c) http://php.net/strip_tags
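
Since strip_tags() does not validate HTML, one possible workaround is to let a real parser read the markup and return only its text. The sketch below assumes the DOM extension is available. The function name html_fragment_to_text() is hypothetical, and this approach is not suggested anywhere in this report. Note that DOMDocument::loadHTML() also decodes entities and may need a charset declaration for non-ASCII input.

```php
<?php
// Hypothetical helper, not part of PHP: returns the text content of an
// HTML fragment by parsing it with DOMDocument (requires the DOM extension).
function html_fragment_to_text(string $html): string
{
    if ($html === '') {
        return '';
    }
    // Collect parser warnings (unknown or malformed tags) instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Wrap the fragment so the parser always has a <body> element to read back.
    $doc->loadHTML('<html><body>' . $html . '</body></html>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    return $body === null ? '' : $body->textContent;
}

// Usage with the string from example 1 of this report:
print html_fragment_to_text('text1<tag attr="<"> text2<tag attr=">"> text3');
```

The parser reads quoted attribute values as a unit, so a '<' or '>' inside quotes does not end the tag. This costs more than a single pass over the string, and the result may differ from strip_tags() in whitespace and entity handling.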
 