Bug #34863 strip_tags() does not work when string includes <img tags
Submitted: 2005-10-14 03:24 UTC Modified: 2005-10-19 21:02 UTC
Votes:8
Avg. Score:4.5 ± 0.7
Reproduced:6 of 7 (85.7%)
Same Version:3 (50.0%)
Same OS:3 (50.0%)
From: backdream at gmail dot com Assigned:
Status: Wont fix Package: Strings related
PHP Version: 4.3.11 OS: Windows
Private report: No CVE-ID: None
 [2005-10-14 03:24 UTC] backdream at gmail dot com
Description:
------------
strip_tags() does not work when the string includes <img> tags.

Reproduce code:
---------------
<?php

$txt = "Next: <a href=\"http://www.joelonsoftware.com/uibook/chapters/fog0000000063.html\" target=\"_blank\">Designing for People Who Have Better Things To Do With Their Lives, Part Two</a> <br />????<br />????<img src=\"http://86.0.190.20/test/ipb21/skin_acp/IPB2_Standard/images/users.png\" border=\"0\" align=\"absmiddle\" alt=\"????ͼƬ\" onload=\"if(screen.width*0.7<this.width) {this.resized=true; this.width=screen.width*0.7;}\" /><br />????<b>Adverment&#33;</b> Do you need to control a computer remotely, even when firewalls get in the way? My company&#39;s latest product, <a href=\"https://www.copilot.com/\" target=\"_blank\">Fog Creek Copilot</a>, is a remote control system that requires no setup, no configuration, and works even if both users are behind firewalls. It&#39;s designed to make remote tech support easy. <br />????<br />????Enter your email address to receive a (very occasional) email whenever I write a major new article. You can unsubscribe at any time, of course.<br />????</div>";
$txt = preg_replace( "#<img[^>]*>#i", "", $txt );
$txt = strip_tags( $txt );
print $txt;

?>

Expected result:
----------------
All HTML tags are stripped.

Actual result:
--------------
The output breaks off when strip_tags() reaches the <img> tag.

History

 [2005-10-14 03:31 UTC] backdream at gmail dot com
Sorry, the reproduce code should be:
---------------
<?php

$txt = "Next: <a href=\"http://www.joelonsoftware.com/uibook/chapters/fog0000000063.html\" target=\"_blank\">Designing for People Who Have Better Things To Do With Their Lives, Part Two</a> <br />????<br />????<img src=\"http://86.0.190.20/test/ipb21/skin_acp/IPB2_Standard/images/users.png\" border=\"0\" align=\"absmiddle\" alt=\"????ͼƬ\" onload=\"if(screen.width*0.7<this.width) {this.resized=true; this.width=screen.width*0.7;}\" /><br />????<b>Adverment&#33;</b> Do you need to control a computer remotely, even when firewalls get in the way? My company&#39;s latest product, <a href=\"https://www.copilot.com/\" target=\"_blank\">Fog Creek Copilot</a>, is a remote control system that requires no setup, no configuration, and works even if both users are behind firewalls. It&#39;s designed to make remote tech support easy. <br />????<br />????Enter your email address to receive a (very occasional) email whenever I write a major new article. You can unsubscribe at any time, of course.<br />????</div>";

$txt = strip_tags( $txt );
print $txt;

?>
 [2005-10-14 13:18 UTC] tony2001@php.net
Can't reproduce.
Both your scripts work fine with any PHP version I can find (4.3.11, 4.4, 5.0.x, 5.1).

 [2005-10-14 19:36 UTC] backdream at gmail dot com
I tested with 4.3.11; it also has the bug.

------
<?php

$txt = "Next: <ahref=\"http://www.joelonsoftware.com/uibook/chapters/fog0000000063.html\" target=\"_blank\">Designing for People Who Have Better Things To DoWith Their Lives, Part Two</a> <br />????<br />????<imgsrc=\"http://86.0.190.20/test/ipb21/skin_acp/IPB2_Standard/images/users.png\" border=\"0\" align=\"absmiddle\" alt=\"????ͼƬ\"onload=\"if(screen.width*0.7<this.width) {this.resized=true;this.width=screen.width*0.7;}\" /><br />????<b>Adverment&#33;</b> Doyouneed to control a computer remotely, even when firewalls get in theway?My company&#39;s latest product, <a href=\"https://www.copilot.com/\"target=\"_blank\">Fog Creek Copilot</a>, is a remote control systemthatrequires no setup, no configuration, and works even if both users arebehind firewalls. It&#39;s designed to make remote tech support easy.<br />????<br />????Enter your email address to receive a (veryoccasional) email whenever I write a major new article. You canunsubscribe at any time, of course.<br />????</div>";

$txt = strip_tags( $txt );
print $txt;

?>
-----------

It prints "Next: Designing for People Who Have Better Things To DoWith Their Lives, Part Two ????????", not the whole string with the HTML tags stripped.
 [2005-10-19 20:57 UTC] lm at bible dot ch
OS:  Windows XP
PHP: 5.0.3

Further investigation:
----------------------
strip_tags does not handle a tag correctly if '<' occurs in an attribute value, as it does in the original example's onload="if(screen.width*0.7<this(...)" attribute.

strip_tags does allow for nested tags (example 2 below), but it does not ignore quoted < and > inside tags (example 1 below):

<?php
//example 1
$txt = 'text1<tag attr="<"> text2<tag attr=">"> text3';
$txt = strip_tags( $txt );
print $txt;
// prints 'text1 text3'
// should print 'text1 text2 text3' 

//example 2
$txt = 'text1<tag <nested tag>> text2';
$txt = strip_tags( $txt );
print $txt;
// prints 'text1 text2' as it should if strip_tags searches for nested tags
?>

Three questions arise:
1. Is the detection of nested tags reasonable?
2. Is an HTML tag malformed if it contains a quoted '<' or '>' sign?
3. If such an HTML tag is malformed, shouldn't strip_tags() still work correctly, since the quotes remove any ambiguity?

Answers:
1. I cannot think of any real-world example of nested tags other than HTML comments like <!--<br />-->
2. According to the w3c.org HTML validator, such a tag is valid.
3. strip_tags() should still work.

No parsing of quotes:
---------------------
The following example shows that strip_tags does not parse quotes between '<' and '>' at all:
<?php
//example 3
$txt = 'text1<tag attr="> text2 <tag attr="> text3';
$txt = strip_tags( $txt );
print $txt;
// prints: 'text1 text2 text3'
// should print: 'text1 text3' since there is only one tag, with the attribute 'attr="> text2 <tag attr="'
?>

Conclusion:
-----------
This IS a real bug. It is not related to the <img> tag, but to strip_tags not parsing attributes and quotes inside tags.

It is important to fix this bug.
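
To make the requested behaviour concrete, here is a minimal sketch of a quote-aware stripper. The function name strip_tags_quote_aware() is hypothetical and not part of PHP, and the sketch assumes PHP 7 or later for the type declarations. It only illustrates treating quoted attribute values as opaque. It does not handle nested tags, comments, or an allowed-tags list, and a stray '<' in plain text still swallows everything up to the next unquoted '>'.

```php
<?php
// Hypothetical helper, not part of PHP: removes tags while treating quoted
// attribute values as opaque, so '<' and '>' inside quotes do not end a tag.
function strip_tags_quote_aware(string $html): string
{
    $out   = '';
    $inTag = false; // true while between '<' and the closing unquoted '>'
    $quote = '';    // active quote character inside a tag, or '' if none
    $len   = strlen($html);

    for ($i = 0; $i < $len; $i++) {
        $c = $html[$i];
        if (!$inTag) {
            if ($c === '<') {
                $inTag = true;      // start of a tag: stop copying
            } else {
                $out .= $c;         // ordinary text: copy through
            }
        } elseif ($quote !== '') {
            if ($c === $quote) {
                $quote = '';        // closing quote of the attribute value
            }
        } elseif ($c === '"' || $c === "'") {
            $quote = $c;            // opening quote of an attribute value
        } elseif ($c === '>') {
            $inTag = false;         // unquoted '>' ends the tag
        }
    }
    return $out;
}

// Inputs from examples 1 and 3 above.
$example1 = strip_tags_quote_aware('text1<tag attr="<"> text2<tag attr=">"> text3');
$example3 = strip_tags_quote_aware('text1<tag attr="> text2 <tag attr="> text3');
print $example1 . "\n";
print $example3 . "\n";
?>
```

With this sketch, example 1 yields 'text1 text2 text3' and example 3 yields 'text1 text3', which are the results argued for above.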

With best regards Lorenz Meyer
 [2005-10-19 21:02 UTC] tony2001@php.net
Warning	
Because strip_tags() does not actually validate the HTML, partial, or broken tags can result in the removal of more text/data than expected.
(c) http://php.net/strip_tags
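
A small sketch of what that warning means in practice, using two made-up sample strings (not taken from this report). As far as I know, strip_tags() treats a '<' followed directly by a non-whitespace character as the start of a tag and discards everything up to the next '>' or the end of the string, while a '<' followed by whitespace is kept as text.

```php
<?php
// Made-up samples: the only difference is the whitespace after '<'.
$samples = [
    'price<limit and the rest of the sentence',   // '<' directly followed by a letter
    'price < limit and the rest of the sentence', // '<' followed by whitespace
];

$results = [];
foreach ($samples as $s) {
    $results[] = strip_tags($s);
    // Print input and output side by side for comparison.
    print var_export($s, true) . ' => ' . var_export(strip_tags($s), true) . "\n";
}
?>
```

Comparing the two outputs shows how much text an unterminated pseudo-tag can take with it.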
 