Bug #62365 Incorrect injection of html markup
Submitted: 2012-06-19 20:02 UTC
Modified: 2013-07-31 05:36 UTC
From: d dot voice at gmx dot com
Assigned:
Status: Wont fix
Package: Tidy (PECL)
PHP Version: 5.3.14
OS: RH Linux
Private report: No
CVE-ID: None

 [2012-06-19 20:02 UTC] d dot voice at gmx dot com
Description:
------------
Tidy seems to inject markup in unexpected places, as can be seen from the code below. In one place I have an orphan closing </i> tag inside an unclosed paragraph tag.

  <p>paragraph</i><p/>

Tidy injects the following markup inside the paragraph tags:
<span style="padding:10px">

Tidy package info:
Tidy support => enabled
libTidy Release => 14 June 2007
Extension Version => 2.0 ($Id: tidy.c,v 1.66.2.8.2.26 2008/12/31 11:17:46 sebastian Exp $)

Short code snippet to reproduce the bug - please see below.

Configure options:
--with-apxs2=/usr/local/apache2/bin/apxs' '--with-zlib=/usr' '--with-pspell=/usr' '--prefix=/usr' '--with-config-file-path=/etc' '--libexecdir=/usr/libexec' '--with-curl' '--enable-memory-limit' '--with-exec-dir=/usr/bin' '--with-freetype-dir=/usr' '--with-iconv' '--with-expat-dir=/usr' '--enable-magic-quotes' '--enable-track-vars' '--enable-dio' '--without-sqlite' '--with-xml2' '--with-xmlrpc' '--enable-pcntl' '--disable-debug' '--enable-inline-optimization' '--enable-mbstring' '--enable-mm=shared' '--enable-safe-mode' '--enable-trans-sid' '--enable-wddx=shared' '--enable-xml' '--with-regex=system' '--with-xsl' '--with-tidy=/usr

Test script:
---------------
<?php
$html = '
<html>
  <head>
    <title>test</title>
  </head>
  <body>
&amp;
<div
Hello world again
<div>
<span style="padding:10px"> There are 12 columns in the grid
Hello world<br>
</div>
    <p>paragraph</i><p/>
  </body>
</html>';

// Specify configuration
$config = array(
    'indent'            => true,
    'output-xhtml'      => true,
    'wrap'              => 200,
    'clean'             => true,
    'bare'              => true,
    'preserve-entities' => true
);

$tidy = new tidy();

$clean = $tidy->repairString($html, $config, 'utf8');

// Output
echo $clean;

?>


Expected result:
----------------
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>
      test
    </title>
  </head>
  <body>
    &amp;
    <div hello="" world="" again=""></div>
    <div>
      <span style="padding:10px">There are 12 columns in the grid Hello world<br /></span>
    </div>
    <p>
      paragraph
    </p>
  </body>
</html>

However, in a dream world where computers know what I want,
this statement, <div hello="" world="" again=""></div>, would also be
changed to this:
<div> 
   hello world again
</div>



Actual result:
--------------
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>
      test
    </title>
  </head>
  <body>
    &amp;
    <div hello="" world="" again=""></div>
    <div>
      <span style="padding:10px">There are 12 columns in the grid Hello world<br /></span>
    </div>
    <p>
      <span style="padding:10px">paragraph</span>
    </p>
  </body>
</html>


History

 [2013-07-30 19:25 UTC] mike@php.net
-Package: Output Control +Package: Tidy
 [2013-07-31 05:36 UTC] yohgaki@php.net
-Status: Open +Status: Wont fix
 [2013-07-31 05:36 UTC] yohgaki@php.net
This is a bug in tidy itself.
There is not much we can do.
Please report it to the tidy developers.

[yohgaki@dev PHP-5.4]$ rpm -q libtidy
libtidy-0.99.0-28.20091203.fc19.x86_64

[yohgaki@dev PHP-5.4]$ tidy tt.html
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 7 column 1 - Warning: <div> missing '>' for end of tag
line 10 column 1 - Warning: missing </span> before </div>
line 13 column 8 - Warning: inserting implicit <span>
line 13 column 8 - Warning: replacing unexpected i by </i>
line 7 column 1 - Warning: missing </div>
line 7 column 1 - Warning: <div> proprietary attribute "hello"
line 7 column 1 - Warning: <div> proprietary attribute "world"
line 7 column 1 - Warning: <div> proprietary attribute "again"
line 13 column 21 - Warning: trimming empty <p>
Info: Document content looks like HTML 4.01 Transitional
Info: No system identifier in emitted doctype
10 warnings, 0 errors were found!

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta name="generator" content=
"HTML Tidy for Linux (vers 25 March 2009), see www.w3.org">
<title>test</title>
</head>
<body>
&amp;
<div hello="" world="" again="">
<div><span style="padding:10px">There are 12 columns in the grid
Hello world<br></span></div>
<p><span style="padding:10px">paragraph</span></p>
</div>
</body>
</html>

To learn more about HTML Tidy see http://tidy.sourceforge.net
Please fill bug reports and queries using the "tracker" on the Tidy web site.
Additionally, questions can be sent to html-tidy@w3.org
HTML and CSS specifications are available from http://www.w3.org/
Lobby your company to join W3C, see http://www.w3.org/Consortium
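
The warning "inserting implicit <span>" at line 13 column 8 in the run above is where the unexpected markup comes from. As a rough sketch only, the snippet below filters tidy's diagnostic text for such messages. The helper names parseTidyDiagnostics() and implicitInsertions() are hypothetical and not part of the tidy extension; the sketch assumes diagnostics follow the "line N column M - Warning: ..." layout shown above. The last part, guarded by extension_loaded(), reads the same kind of text from tidy's errorBuffer property using a shortened fragment rather than the full test script.

```php
<?php
// Hypothetical helper (not part of the tidy extension): split tidy's
// diagnostic text into structured entries.
function parseTidyDiagnostics(string $buffer): array
{
    $entries = array();
    foreach (preg_split('/\R/', $buffer) as $line) {
        // Matches lines such as "line 13 column 8 - Warning: inserting implicit <span>".
        if (preg_match('/^line (\d+) column (\d+) - (Warning|Error): (.*)$/', trim($line), $m)) {
            $entries[] = array(
                'line'    => (int) $m[1],
                'column'  => (int) $m[2],
                'level'   => $m[3],
                'message' => $m[4],
            );
        }
    }
    return $entries;
}

// Hypothetical helper: keep only entries where tidy reports inserting
// an element on its own.
function implicitInsertions(array $entries): array
{
    return array_values(array_filter($entries, function (array $entry) {
        return strpos($entry['message'], 'inserting implicit') === 0;
    }));
}

// Two warnings copied from the tidy run in this report.
$sample = "line 10 column 1 - Warning: missing </span> before </div>\n"
        . "line 13 column 8 - Warning: inserting implicit <span>\n";

foreach (implicitInsertions(parseTidyDiagnostics($sample)) as $entry) {
    // prints "13:8 inserting implicit <span>"
    printf("%d:%d %s\n", $entry['line'], $entry['column'], $entry['message']);
}

// With the tidy extension loaded, the diagnostics are available from PHP too.
// The fragment here is an illustrative assumption, not the full test script.
if (extension_loaded('tidy')) {
    $tidy = new tidy();
    $tidy->parseString('<div><span style="padding:10px">text</div><p>paragraph</i><p/>', array(), 'utf8');
    $tidy->cleanRepair();
    $fromPhp = parseTidyDiagnostics((string) $tidy->errorBuffer);
}
```

Checking the diagnostics this way does not change what tidy emits; it only makes it easier to spot which input position triggered an implicit insertion before reporting it upstream.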
 
Last updated: Thu Dec 03 01:01:23 2020 UTC