Bug #62365 Incorrect injection of html markup
Submitted: 2012-06-19 20:02 UTC Modified: 2013-07-31 05:36 UTC
From: d dot voice at gmx dot com Assigned:
Status: Wont fix Package: Tidy (PECL)
PHP Version: 5.3.14 OS: RH Linux
Private report: No CVE-ID: None

 [2012-06-19 20:02 UTC] d dot voice at gmx dot com
Description:
------------
Tidy seems to be injecting markup in unexpected places, as the code below shows. The input contains an orphan closing </i> tag inside an unclosed paragraph tag:

  <p>paragraph</i><p/>

It's injecting the following markup within the paragraph tags:
<span style="padding:10px">

Tidy package info:
Tidy support => enabled
libTidy Release => 14 June 2007
Extension Version => 2.0 ($Id: tidy.c,v 1.66.2.8.2.26 2008/12/31 11:17:46 sebastian Exp $)

Short code snippet to reproduce the bug - please see below.

List of modules:
--with-apxs2=/usr/local/apache2/bin/apxs' '--with-zlib=/usr' '--with-pspell=/usr' '--prefix=/usr' '--with-config-file-path=/etc' '--libexecdir=/usr/libexec' '--with-curl' '--enable-memory-limit' '--with-exec-dir=/usr/bin' '--with-freetype-dir=/usr' '--with-iconv' '--with-expat-dir=/usr' '--enable-magic-quotes' '--enable-track-vars' '--enable-dio' '--without-sqlite' '--with-xml2' '--with-xmlrpc' '--enable-pcntl' '--disable-debug' '--enable-inline-optimization' '--enable-mbstring' '--enable-mm=shared' '--enable-safe-mode' '--enable-trans-sid' '--enable-wddx=shared' '--enable-xml' '--with-regex=system' '--with-xsl' '--with-tidy=/usr

Test script:
---------------
<?php
$html = '
<html>
  <head>
    <title>test</title>
  </head>
  <body>
&amp;
<div
Hello world again
<div>
<span style="padding:10px"> There are 12 columns in the grid
Hello world<br>
</div>
    <p>paragraph</i><p/>
  </body>
</html>';

// Specify configuration
$config = array(
                'indent'         => true,
                'output-xhtml'   => true,
                'wrap'           => 200,
                'clean'          => true,
                'bare'          => true,
                'preserve-entities'          => true
                );

$tidy = new tidy();

$clean = $tidy->repairString($html, $config, 'utf8');

// Output
echo $clean;

?>


Expected result:
----------------
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>
      test
    </title>
  </head>
  <body>
    &amp;
    <div hello="" world="" again=""></div>
    <div>
      <span style="padding:10px">There are 12 columns in the grid Hello world<br /></span>
    </div>
    <p>
      paragraph
    </p>
  </body>
</html>

However, in a dream world where computers know what I want,
this statement, <div hello="" world="" again=""></div>,
would also be changed to this:
<div> 
   hello world again
</div>



Actual result:
--------------
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>
      test
    </title>
  </head>
  <body>
    &amp;
    <div hello="" world="" again=""></div>
    <div>
      <span style="padding:10px">There are 12 columns in the grid Hello world<br /></span>
    </div>
    <p>
      <span style="padding:10px">paragraph</span>
    </p>
  </body>
</html>


History

 [2013-07-30 19:25 UTC] mike@php.net
-Package: Output Control +Package: Tidy
 [2013-07-31 05:36 UTC] yohgaki@php.net
-Status: Open +Status: Wont fix
 [2013-07-31 05:36 UTC] yohgaki@php.net
This is a bug in tidy itself.
There is not much we can do on the PHP side.
Please report it to the tidy developers.

[yohgaki@dev PHP-5.4]$ rpm -q libtidy
libtidy-0.99.0-28.20091203.fc19.x86_64

[yohgaki@dev PHP-5.4]$ tidy tt.html
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 7 column 1 - Warning: <div> missing '>' for end of tag
line 10 column 1 - Warning: missing </span> before </div>
line 13 column 8 - Warning: inserting implicit <span>
line 13 column 8 - Warning: replacing unexpected i by </i>
line 7 column 1 - Warning: missing </div>
line 7 column 1 - Warning: <div> proprietary attribute "hello"
line 7 column 1 - Warning: <div> proprietary attribute "world"
line 7 column 1 - Warning: <div> proprietary attribute "again"
line 13 column 21 - Warning: trimming empty <p>
Info: Document content looks like HTML 4.01 Transitional
Info: No system identifier in emitted doctype
10 warnings, 0 errors were found!

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta name="generator" content=
"HTML Tidy for Linux (vers 25 March 2009), see www.w3.org">
<title>test</title>
</head>
<body>
&amp;
<div hello="" world="" again="">
<div><span style="padding:10px">There are 12 columns in the grid
Hello world<br></span></div>
<p><span style="padding:10px">paragraph</span></p>
</div>
</body>
</html>

To learn more about HTML Tidy see http://tidy.sourceforge.net
Please fill bug reports and queries using the "tracker" on the Tidy web site.
Additionally, questions can be sent to html-tidy@w3.org
HTML and CSS specifications are available from http://www.w3.org/
Lobby your company to join W3C, see http://www.w3.org/Consortium
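
The warnings in the command-line run above (notably "inserting implicit <span>") can also be read from PHP through the tidy object's errorBuffer property. The following is a minimal sketch, not a fix: tidyDiagnostics() is a hypothetical helper name, the input string is a shortened version of the test script, and the exact messages depend on the libtidy version in use.

```php
<?php
// Hypothetical helper (not part of the tidy extension): returns the repaired
// markup together with libtidy's diagnostics, one message per array entry.
function tidyDiagnostics(string $html, array $config, string $encoding = 'utf8'): array
{
    $tidy = new tidy();
    $tidy->parseString($html, $config, $encoding);
    $tidy->cleanRepair();

    // errorBuffer holds the same warnings the tidy CLI prints to stderr.
    $buffer = trim((string) $tidy->errorBuffer);
    $messages = $buffer === '' ? [] : preg_split('/\R/', $buffer);

    return ['output' => tidy_get_output($tidy), 'messages' => $messages];
}

// Only run the demo when the tidy extension is available.
if (extension_loaded('tidy')) {
    // Shortened input: an unclosed <span> followed by the orphan </i>.
    $html = '<div><span style="padding:10px">text</div><p>paragraph</i><p/>';
    $result = tidyDiagnostics($html, ['output-xhtml' => true]);

    echo 'libTidy release: ', tidy_get_release(), PHP_EOL;
    foreach ($result['messages'] as $message) {
        echo $message, PHP_EOL;
    }
}
```

Printing tidy_get_release() alongside the messages makes it easier to tell whether a given behaviour comes from the bundled libtidy rather than from the PHP extension.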