Bug #63417 html tags between <script> lost while parsed by dom
Submitted: 2012-11-02 07:27 UTC Modified: 2012-11-02 08:00 UTC
From: icewavestone at gmail dot com Assigned:
Status: Not a bug Package: DOM XML related
PHP Version: 5.3.18 OS: all
Private report: No CVE-ID: None
 [2012-11-02 07:27 UTC] icewavestone at gmail dot com
Description:
------------
When DOMDocument is used to close unclosed tags in a string, and the string contains a script tag, other tags inside <script></script> are lost during the tag-closing step. For example, take the string:
<b>aaaaa<div           ID="te<te"st>st">&nbsp;startstartstartenhao好开始测<b>试编辑器\n提交的数据<br>呵\n呵<br>哈哈<br>嗯endendendend\n\n<br></div><script><span>111</span>alert("<span>你好!</span>")</script>
The portion <script><span>111</span>alert("<span>你好!</span>")</script> of this string, after being processed by:
$dom = new DOMDocument();

$dom->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>' . $html . '</body></html>');

$html = $dom->saveHTML();
becomes:
<script><span>111alert("<span>你好!")</script>
I do not understand why this is the result.


Test script:
---------------
The complete test code is:

$html = '<b>aaaaa<div           ID="te<te"st>st">&nbsp;startstartstartenhao好开始测<b>试编辑器\n提交的数据<br>呵\n呵<br>哈哈<br>嗯endendendend\n\n<br></div><script><span>111</span>alert("<span>你好!</span>")</script>';


libxml_clear_errors();

$libxml_use_errors = libxml_use_internal_errors(true);

$dom = new DOMDocument();

$dom->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>' . $html . '</body></html>');

$html = $dom->saveHTML();

$body_start = strpos($html, '<body>');
$body_end = strrpos($html, '</body>');
$new_string = str_replace("\n", '', substr($html, $body_start + 6, $body_end - $body_start - 6));

libxml_use_internal_errors($libxml_use_errors);

echo $html . "\n" . $new_string . "\n";


Expected result:
----------------
Test string:
<script><span>111</span>alert("<span>你好!</span>")</script>

After processing with:
$dom = new DOMDocument();

$dom->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>' . $html . '</body></html>');

$html = $dom->saveHTML();

the result should still be:
<script><span>111</span>alert("<span>你好!</span>")</script>


Actual result:
--------------
For the string:
<script><span>111</span>alert("<span>你好!</span>")</script>

the result currently returned is:
<script><span>111alert("<span>你好!")</script>

 [2012-11-02 08:00 UTC] laruence@php.net
-Summary: DOM loses part of the data inside script tags when closing tags
+Summary: html tags between <script> lost while parsed by dom
-Status: Open
+Status: Not a bug
 [2012-11-02 08:00 UTC] laruence@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

I am marking this as not a bug. You used libxml_use_internal_errors to suppress the error messages, but PHP does in fact warn you about the malformed HTML:
PHP Warning:  DOMDocument::loadHTML(): Unexpected end tag : span in Entity

thanks
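
To make the point above concrete, here is a minimal sketch of reading the collected parser errors instead of discarding them. It assumes a shortened, hypothetical input string (only the script portion of the reported markup) and uses the standard libxml_get_errors() call. The exact messages depend on the libxml2 version in use, so the script prints whatever was collected rather than assuming a particular warning.

```php
<?php
// Hypothetical, shortened input: markup placed inside a <script> element,
// similar to the string in the report.
$html = '<script><span>111</span>alert("<span>hello</span>")</script>';

// Collect libxml errors internally instead of emitting PHP warnings,
// remembering the previous setting so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);
libxml_clear_errors();

$dom = new DOMDocument();
$dom->loadHTML(
    '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
    . $html
    . '</body></html>'
);

// Read back what the parser complained about before clearing the buffer.
$messages = array();
foreach (libxml_get_errors() as $error) {
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}

libxml_clear_errors();
libxml_use_internal_errors($previous);

// Print every collected message; the list may be empty on parsers that
// accept this input without complaint.
foreach ($messages as $message) {
    echo $message, "\n";
}
```

With the parser behaviour described in this report, the list would be expected to include the "Unexpected end tag : span" message quoted above. Other libxml2 versions may report something different, or nothing at all.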
 [2012-11-05 09:47 UTC] icewavestone at gmail dot com
OK, so this is not a PHP bug; it is a libxml bug.
 