PHP :: Request #2028 :: strip_tags state engine inappropriate for single line of html.

Request #2028	strip_tags state engine inappropriate for single line of html.
Submitted:	1999-08-10 23:57 UTC	Modified:	2000-05-30 19:18 UTC
From:	cdi at thewebmasters dot net	Assigned:
Status:	Closed	Package:	Feature/Change Request
PHP Version:	4.0 Beta 2	OS:	RHLinux 5.1 2.0.35
Private report:	No	CVE-ID:	None


[1999-08-10 23:57 UTC] cdi at thewebmasters dot net

Demo script:

<?php
Header("Content-type: text/plain");
$data = 'HREF="blah.blah">test</A> inside <A HREF="brackets.com">brackets</A>. What\'s it gonna do?';
$data = strip_tags($data);
echo "$data\n";
?>

Output:

HREF="blah.blah"test inside brackets. What's it gonna do?

Config: ./configure --prefix=/www --with-apache=../apache_1.3.3 --with-mysql --with-imap --with-zlib --with-config-file-path --enable-debug=yes --enable-track-vars=yes --enable-magic-quotes=yes --enable-memory-limit=yes

php.ini not relevant.


When doing "one line at a time" stripping, the state engine simply removes any extraneous > signs. When I wrote a similar function to handle individual lines of HTML (no multi-line processing), it set a boolean if and when it saw a < sign. If it saw a > before it had ever seen a <, the function assumed that everything leading up to the > was HTML and removed it. Worked like a champ.

Something else, although this is purely aesthetic: when a > closes a tag and the state engine goes back to state 0, it should insert a space into the spot vacated by the removed HTML if the next character is neither whitespace nor a less-than sign (<). Otherwise, this little test program:

<?php
Header("Content-type: text/plain");
$data = '<TABLE BORDER=0><TR><TD>Hi there</TD></TR><TD>Ooops</TD></TR></TABLE>';
$data = strip_tags($data);
echo "$data\n";
?>

Results in this:

Hi thereOoops

Something like this should fix that (I think):

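case '>':
	if (state == 1) {
		/* Closing an HTML tag: insert a separating space unless the next
		   character starts another tag, is a space or tab, or ends the string. */
		if (*(p+1) != '<' && *(p+1) != ' ' && *(p+1) != '\t' && *(p+1) != '\0') {
			*(rp++) = ' ';
		}
		lc = '>';
		state = 0;
	} else if (state == 2) {
		/* Inside a PHP block: "?>" ends it, subject to the br and lc checks */
		if (!br && lc != '"' && *(p-1) == '?') {
			state = 0;
		}
	}
	break;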

History

[1999-11-14 03:47 UTC] joey at cvs dot php dot net

Moving to change request

[2000-05-30 19:18 UTC] rasmus at cvs dot php dot net

You want to strip incomplete tags because you are doing it on a line-by-line basis and the tag might have been started on a previous line?  Wouldn't it be easier to just concatenate your lines and do the strip_tags() once for the whole thing?  Stripping incomplete tags seems like a bad idea to me and there is no way to ever get it right anyway since a tag that starts on line 1, continues on line 2 and ends on line 3 will be impossible to handle correctly.

Copyright © 2001-2026 The PHP Group All rights reserved.	Last updated: Mon Jun 15 20:00:02 2026 UTC