PHP :: Request #2028 :: strip_tags state engine inappropriate for single line of html.

Request #2028	strip_tags state engine inappropriate for single line of html.
Submitted:	1999-08-10 23:57 UTC	Modified:	2000-05-30 19:18 UTC
From:	cdi at thewebmasters dot net	Assigned:
Status:	Closed	Package:	Feature/Change Request
PHP Version:	4.0 Beta 2	OS:	RHLinux 5.1 2.0.35
Private report:	No	CVE-ID:	None


[1999-08-10 23:57 UTC] cdi at thewebmasters dot net

Demo script:

<?php
Header("Content-type: text/plain");
$data = 'HREF="blah.blah">test</A> inside <A HREF="brackets.com">brackets</A>. What\'s it gonna do?';
$data = strip_tags($data);
echo "$data\n";
?>

Output:

HREF="blah.blah"test inside brackets. What's it gonna do?

Config: ./configure --prefix=/www --with-apache=../apache_1.3.3 --with-mysql --with-imap --with-zlib --with-config-file-path --enable-debug=yes --enable-track-vars=yes --enable-magic-quotes=yes --enable-memory-limit=yes

php.ini not relevant.


When stripping "one line at a time", the state engine simply discards any stray > sign and keeps the text in front of it, so the tail of a tag that began on a previous line (HREF="blah.blah" above) leaks into the output. When I wrote a similar function to handle individual lines of HTML (no multi-line processing), it set a boolean when it saw a < sign. If it saw a > before it ever saw a <, it assumed that everything leading up to the > was HTML and removed it. Worked like a champ.
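A minimal C sketch of the line-at-a-time approach described above. The name strip_tags_line is hypothetical, and this is not PHP's strip_tags implementation: it ignores quoted attributes, comments and PHP blocks, and assumes a NUL-terminated line plus an output buffer at least as large as the input.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not PHP's strip_tags): strips tags from ONE line of
 * HTML into out, which must hold at least strlen(in) + 1 bytes.
 * If the first tag delimiter on the line is '>', everything before it is
 * assumed to be the tail of a tag opened on an earlier line and is dropped. */
void strip_tags_line(const char *in, char *out)
{
    bool seen_delim = false; /* has a '<' or '>' appeared on this line yet? */
    bool in_tag = false;     /* currently between '<' and '>' */
    char *w = out;

    for (const char *p = in; *p != '\0'; p++) {
        if (*p == '<') {
            seen_delim = true;
            in_tag = true;
        } else if (*p == '>') {
            if (!seen_delim) {
                /* Orphan '>' before any '<': discard the text emitted so far. */
                w = out;
            }
            seen_delim = true;
            in_tag = false; /* any later stray '>' is simply dropped */
        } else if (!in_tag) {
            *w++ = *p;
        }
    }
    *w = '\0';
}
```

With the first demo string as input this should yield "test inside brackets. What's it gonna do?", since the orphaned HREF="blah.blah"> is discarded. The trade-off is that a literal > in ordinary text near the start of a line would also swallow whatever preceded it.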

Something else, although this is purely aesthetic: when a > closes a tag and the state engine goes back to state 0, it should plunk a space into the spot vacated by the removed HTML, unless the next character is whitespace or a less-than sign (<). Otherwise, this little test program:

<?php
Header("Content-type: text/plain");
$data = '<TABLE BORDER=0><TR><TD>Hi there</TD></TR><TD>Ooops</TD></TR></TABLE>';
$data = strip_tags($data);
echo "$data\n";
?>

Results in this:

Hi thereOoops

Something like this should fix that (I think):

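case '>':
	if (state == 1) {
		/* Leaving an HTML tag: insert a separating space unless the next
		 * character is another tag, whitespace, or the end of the string. */
		char next = *(p + 1);
		if (next != '<' && next != ' ' && next != '\t'
		    && next != '\n' && next != '\r' && next != '\0') {
			*(rp++) = ' ';
		}
		lc = '>';
		state = 0;
	} else if (state == 2) {
		/* Leaving a PHP block: only on "?>" outside quotes and parentheses. */
		if (!br && lc != '\"' && *(p - 1) == '?') {
			state = 0;
		}
	}
	break;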
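The padding rule from the patch can also be tried outside PHP's source with a small standalone C sketch. strip_tags_spaced is a hypothetical name. Besides the rule in the report (pad only when the next character is not <, whitespace, or the end of the string), it assumes one extra guard that the report does not mention: no space is added at the very start of the output or after text that already ends in whitespace, which avoids a leading space before "Hi there".

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: strips <...> tags from in into out (at least
 * strlen(in) + 1 bytes). Each inserted space replaces a removed tag of at
 * least two bytes, so the output never grows past the input length. */
void strip_tags_spaced(const char *in, char *out)
{
    bool in_tag = false;
    char *w = out;

    for (const char *p = in; *p != '\0'; p++) {
        if (*p == '<') {
            in_tag = true;
        } else if (*p == '>' && in_tag) {
            char next = p[1]; /* safe: at worst the terminating NUL */
            /* Rule from the report: pad only before ordinary text.
             * Extra assumed guard: not at the start, not after whitespace. */
            if (next != '<' && next != '\0' && !isspace((unsigned char)next)
                && w > out && !isspace((unsigned char)w[-1])) {
                *w++ = ' ';
            }
            in_tag = false;
        } else if (!in_tag) {
            *w++ = *p;
        }
    }
    *w = '\0';
}
```

On the table example this should give "Hi there Ooops". Note that inline tags are padded as well, so foo<B>bar</B> becomes "foo bar", which is the cost of making the rule purely positional.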


[1999-11-14 03:47 UTC] joey at cvs dot php dot net

Moving to change request

[2000-05-30 19:18 UTC] rasmus at cvs dot php dot net

You want to strip incomplete tags because you are doing it on a line-by-line basis and the tag might have been started on a previous line?  Wouldn't it be easier to just concatenate your lines and do the strip_tags() once for the whole thing?  Stripping incomplete tags seems like a bad idea to me and there is no way to ever get it right anyway since a tag that starts on line 1, continues on line 2 and ends on line 3 will be impossible to handle correctly.
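The concatenate-then-strip suggestion can be sketched in C as well: join the lines first, then strip once, so a tag spread over several lines is still seen as a single tag. Both function names are hypothetical, the stripper is deliberately minimal (it ignores quotes, comments and PHP blocks), and the lines are assumed to be joined with '\n'.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal in-place tag stripper: drops everything from '<' to the next '>'.
 * A stray '>' outside a tag is dropped too. */
static void strip_tags_buf(char *s)
{
    int in_tag = 0;
    char *w = s;
    for (char *p = s; *p != '\0'; p++) {
        if (*p == '<') in_tag = 1;
        else if (*p == '>') in_tag = 0;
        else if (!in_tag) *w++ = *p;
    }
    *w = '\0';
}

/* Hypothetical helper: joins n lines with '\n', then strips tags once.
 * Returns a malloc'd string the caller must free, or NULL on failure. */
char *strip_tags_joined(const char *const lines[], size_t n)
{
    size_t total = 1; /* terminating NUL */
    for (size_t i = 0; i < n; i++)
        total += strlen(lines[i]) + 1; /* +1 for a '\n' separator */

    char *buf = malloc(total);
    if (buf == NULL) return NULL;

    char *w = buf;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(lines[i]);
        memcpy(w, lines[i], len);
        w += len;
        if (i + 1 < n) *w++ = '\n';
    }
    *w = '\0';

    strip_tags_buf(buf);
    return buf;
}
```

For example, the three lines <A, HREF="x.com" and >link</A> text come out as "link text", which is exactly the line 1 to line 3 case that no per-line stripper can handle.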



Copyright © 2001-2026 The PHP Group All rights reserved.	Last updated: Mon Jun 15 22:00:02 2026 UTC