Bug #45599 [PATCH] strip_tags() truncates rest of string with invalid attribute
Submitted: 2008-07-22 23:37 UTC
Modified: 2009-12-22 02:04 UTC
Votes:5
Avg. Score:4.8 ± 0.4
Reproduced:5 of 5 (100.0%)
Same Version:3 (60.0%)
Same OS:4 (80.0%)
From: david at grudl dot com
Assigned:
Status: Closed
Package: Strings related
PHP Version: 5.*, 6
OS: *
Private report: No
CVE-ID: None
 [2008-07-22 23:37 UTC] david at grudl dot com
Description:
------------
A backslash immediately before a quote character inside an HTML tag makes strip_tags() drop the rest of the string (the bug has existed since PHP 5.2.2).

Reproduce code:
---------------
1) 
echo strip_tags('Hello <a href="any\\"> World');

2) this case is not valid HTML, but who cares...
echo strip_tags('Hello <a href=\"any"> World');

Expected result:
----------------
Hello  World

(in both cases)

Actual result:
--------------
Hello

(in both cases)

 [2008-07-30 04:42 UTC] jet at synth-tec dot com
I am having the same problem. If an attribute has an extra quote in it, strip_tags() cuts off all the text after it.

Example Input:
----------------
strip_tags('
text before link
<a href="http://google.com"">google.com</a>
text after link
test 1
test 2
')


Expected Output:
-----------------
text before link
text after link
test 1
test 2


Actual Output:
--------------
text before link



Note: I do not have this problem in PHP 5.0.4 or earlier versions.
 [2008-08-06 16:30 UTC] lbarnaud@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

The parser continues until it finds the end of the tag, which cannot be inside an attribute value (XML allows all characters except [%&'] in attribute values).

In the given examples the attribute value never terminates and the end of the tag is never found, which causes the rest of the string to be truncated.

This change was made to fix the following bug: http://bugs.php.net/bug.php?id=40432

 [2008-08-06 16:52 UTC] david at grudl dot com
The character \ is allowed in a tag attribute, so strip_tags('Hello <a href="any\"> World') returning "Hello" (without "World") is a bug.
 [2009-08-24 15:53 UTC] hradtke@php.net
PHP 5.x patch:
Index: ext/standard/string.c
===================================================================
--- ext/standard/string.c	(revision 284189)
+++ ext/standard/string.c	(working copy)
@@ -4367,7 +4367,7 @@
 					tp = ((tp-tbuf) >= PHP_TAG_BUF_SIZE ? tbuf: tp);
 					*(tp++) = c;
 				}
-				if (state && p != buf && *(p-1) != '\\' && (!in_q || *p == in_q)) {
+				if (state && p != buf && (state == 1 || *(p-1) != '\\') && (!in_q || *p == in_q)) {
 					if (in_q) {
 						in_q = 0;
 					} else {

Trunk patch:
Index: ext/standard/string.c
===================================================================
--- ext/standard/string.c	(revision 284189)
+++ ext/standard/string.c	(working copy)
@@ -6519,7 +6519,7 @@
 				tp = ((tp-tbuf) >= UBYTES(PHP_TAG_BUF_SIZE) ? tbuf: tp);
 				*(tp++) = ch;
 			}
-			if (state && prev1 != 0x5C /*'\\'*/ && (!in_q || ch == in_q)) {
+			if (state && (state ==1 || prev1 != 0x5C /*'\\'*/) && (!in_q || ch == in_q)) {
 				if (in_q) {
 					in_q = 0;
 				} else {
@@ -6763,7 +6763,7 @@
 					tp = ((tp-tbuf) >= PHP_TAG_BUF_SIZE ? tbuf: tp);
 					*(tp++) = c;
 				}
-				if (state && p != buf && *(p-1) != '\\' && (!in_q || *p == in_q)) {
+				if (state && p != buf && (state ==1 || *(p-1) != '\\') && (!in_q || *p == in_q)) {
 					if (in_q) {
 						in_q = 0;
 					} else {


Test case:
--TEST--
Bug #45599 (strip_tags() ignore backslash (\) character inside html tags)
--FILE--
<?php
echo strip_tags('Hello <a href="any\"> World') . "\n";
echo strip_tags('Hello <a href="any\\"> World') . "\n";
echo strip_tags('Hello <a href=\"any"> World');
?>
--EXPECT--
Hello  World
Hello  World
Hello  World
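
The effect of the changed condition can be illustrated with a small standalone model. This is only a sketch, not the code from ext/standard/string.c: the function name strip_sketch and the patched flag are hypothetical, and the model assumes just two states (0 = outside a tag, 1 = inside an HTML tag), leaving out PHP tags, comments and the allowed-tags buffer that the real function handles.

```c
/* Simplified model of the quote tracking touched by the patch.
 * state:   0 = outside a tag, 1 = inside an HTML tag (other states omitted).
 * in_q:    quote character currently open inside the tag, or 0.
 * patched: 0 = old condition, 1 = condition with the added (state == 1 || ...).
 * out must have room for at least strlen(in) + 1 bytes. */
void strip_sketch(const char *in, char *out, int patched)
{
    int state = 0;
    char in_q = 0;
    const char *p;

    for (p = in; *p != '\0'; p++) {
        char c = *p;

        switch (c) {
        case '<':
            if (state == 0) state = 1;   /* a tag starts; its text is dropped */
            break;
        case '>':
            if (in_q) break;             /* '>' inside a quoted value does not end the tag */
            if (state == 1) state = 0;   /* end of tag */
            else *out++ = c;             /* stray '>' outside a tag is kept */
            break;
        case '"':
        case '\'': {
            int escaped;
            if (state == 0) { *out++ = c; break; }
            /* Old behaviour: a quote preceded by a backslash never toggles in_q. */
            escaped = (p != in && *(p - 1) == '\\');
            /* Patched behaviour: inside an HTML tag the backslash is ignored. */
            if (patched && state == 1) escaped = 0;
            if (!escaped && (!in_q || c == in_q)) in_q = in_q ? 0 : c;
            break;
        }
        default:
            if (state == 0) *out++ = c;  /* copy text that is outside any tag */
            break;
        }
    }
    *out = '\0';
}
```

In this model, calling strip_sketch with patched set to 0 on Hello <a href="any\"> World produces "Hello " because the closing quote is treated as escaped and in_q never resets; with patched set to 1 the same input produces "Hello  World".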

 [2009-12-22 02:04 UTC] svn@php.net
Automatic comment from SVN on behalf of iliaa
Revision: http://svn.php.net/viewvc/?view=revision&revision=292465
Log: Fixed bug #45599 (strip_tags() truncates rest of string with invalid attribute).
 [2009-12-22 02:04 UTC] iliaa@php.net
This bug has been fixed in SVN.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.

