Bug #63417 html tags between <script> lost while parsed by dom
Submitted: 2012-11-02 07:27 UTC Modified: 2012-11-02 08:00 UTC
From: icewavestone at gmail dot com Assigned:
Status: Not a bug Package: DOM XML related
PHP Version: 5.3.18 OS: all
Private report: No CVE-ID: None

 [2012-11-02 07:27 UTC] icewavestone at gmail dot com
Description:
------------
When DOMDocument is used to close the unclosed tags in a string, and that string contains a script tag, the other tags inside <script></script> are lost during processing. For example, take the string: <b>aaaaa<div           ID="te<te"st>st">&nbsp;startstartstartenhao好开始测<b>试编辑器\n提交的数据<br>呵\n呵<br>哈哈<br>嗯endendendend\n\n<br></div><script><span>111</span>alert("<span>你好!</span>")</script>.
In it, the fragment <script><span>111</span>alert("<span>你好!</span>")</script>, after being passed through:
$dom = new DOMDocument();

$dom->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>' . $html . '</body></html>');

$html = $dom->saveHTML();
comes out of these functions as:
<script><span>111alert("<span>你好!")</script>
I do not understand why this is the result.


Test script:
---------------
The complete test code is:

$html = '<b>aaaaa<div           ID="te<te"st>st">&nbsp;startstartstartenhao好开始测<b>试编辑器\n提交的数据<br>呵\n呵<br>哈哈<br>嗯endendendend\n\n<br></div><script><span>111</span>alert("<span>你好!</span>")</script>';


libxml_clear_errors();

$libxml_use_errors = libxml_use_internal_errors(true);

$dom = new DOMDocument();

$dom->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>' . $html . '</body></html>');

$html = $dom->saveHTML();

$body_start = strpos($html, '<body>');
$body_end = strrpos($html, '</body>');
$new_string = str_replace("\n", '', substr($html, $body_start + 6, $body_end - $body_start - 6));

libxml_use_internal_errors($libxml_use_errors);

echo $html . "\n" . $new_string . "\n";
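
The test script above cuts the body out of the serialized document with strpos/substr offsets. As an alternative, here is a minimal sketch that serializes the children of <body> through the DOM API instead. The helper name body_inner_html and the sample markup are hypothetical and not part of the original report; DOMDocument::saveHTML() accepts a node argument as of PHP 5.3.6.

```php
<?php
// Hypothetical helper (not from the report): returns the markup inside <body>
// by serializing each child node, instead of slicing the output string.
function body_inner_html(DOMDocument $dom): string
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $out = '';
    foreach ($body->childNodes as $child) {
        // saveHTML($node) serializes only the given node and its subtree.
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Sample input, assumed for illustration only.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body><p>one<br>two</p></body></html>');
libxml_clear_errors();
libxml_use_internal_errors($previous);

$inner = body_inner_html($dom);
echo $inner, "\n";
```

This avoids the hard-coded offset of 6 for the length of '<body>' and keeps working if the serializer emits attributes on the body tag.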


Expected result:
----------------
Test string:
<script><span>111</span>alert("<span>你好!</span>")</script>

After it is passed through:

$dom = new DOMDocument();

$dom->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>' . $html . '</body></html>');

$html = $dom->saveHTML();

I expect the result to still be:
<script><span>111</span>alert("<span>你好!</span>")</script>


Actual result:
--------------
For the string:
<script><span>111</span>alert("<span>你好!</span>")</script>

The result currently returned is:
<script><span>111alert("<span>你好!")</script>

History

 [2012-11-02 08:00 UTC] laruence@php.net
-Summary: DOM loses part of the data inside script tags when closing tags
+Summary: html tags between <script> lost while parsed by dom
-Status: Open
+Status: Not a bug
 [2012-11-02 08:00 UTC] laruence@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

You used libxml_use_internal_errors to silence the error messages. In fact, PHP had already told you about the problem:
PHP Warning:  DOMDocument::loadHTML(): Unexpected end tag : span in Entity

I am marking this as not a bug (NAB), since PHP does warn you about the abnormal HTML, but you used libxml_use_internal_errors to suppress that warning.

thanks
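
To see the diagnostics that libxml_use_internal_errors(true) keeps out of the PHP warning stream, the collected errors can be read back with libxml_get_errors(). The following is a minimal sketch, not the reporter's or the maintainer's code; the helper name load_html_collecting_errors is hypothetical, and the exact messages (or whether any are produced at all) depend on the libxml2 version PHP is built against.

```php
<?php
// Hypothetical helper (not from the report): parses HTML while collecting
// libxml diagnostics instead of silently discarding them.
function load_html_collecting_errors(string $html): array
{
    // Route libxml errors to the internal buffer and remember the old setting.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Each entry is a LibXMLError object with level, code, line and message.
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Empty the buffer and restore the caller's error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$dom, $messages];
}

$snippet = '<script><span>111</span>alert("<span>你好!</span>")</script>';
[$dom, $messages] = load_html_collecting_errors(
    '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
    . $snippet . '</body></html>'
);

// Print whatever the parser reported for this input.
foreach ($messages as $message) {
    echo $message, "\n";
}
```

Logging or inspecting $messages after each loadHTML() call keeps the output free of warnings while still surfacing malformed input such as the end tags inside the script element here.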
 [2012-11-05 09:47 UTC] icewavestone at gmail dot com
OK, so this is not a PHP bug; it is a libxml one.
 