php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #76738 dom extension horrible broken in 7.3 master
Submitted: 2018-08-13 14:41 UTC Modified: 2018-08-17 14:21 UTC
From: spam2 at rhsoft dot net Assigned: cmb (profile)
Status: Closed Package: DOM XML related
PHP Version: 7.3Git-2018-08-13 (Git) OS:
Private report: No CVE-ID: None
 [2018-08-13 14:41 UTC] spam2 at rhsoft dot net
Description:
------------
the script https://access.thelounge.net/harry/php-7.3-dom/dom-bug-script.txt is supposed to clean up WYSIWYG content directly after load it from the database

for years now with php5.4 to 7.2.9 the result is the first version and now you get just a random fragment, also other tests using that inline copied class are broken with random results

https://access.thelounge.net/harry/php-7.3-dom/7.2.9.txt
https://access.thelounge.net/harry/php-7.3-dom/7.3.0.txt


Test script:
---------------
https://access.thelounge.net/harry/php-7.3-dom/dom-bug-script.txt

Expected result:
----------------
https://access.thelounge.net/harry/php-7.3-dom/7.2.9.txt

Actual result:
--------------
https://access.thelounge.net/harry/php-7.3-dom/7.3.0.txt

Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2018-08-13 17:17 UTC] requinix@php.net
-Status: Open +Status: Verified -Assigned To: +Assigned To: ab
 [2018-08-13 17:17 UTC] requinix@php.net
Likely introduced with bug #76285 https://github.com/php/php-src/commit/ef9ed19ec7f141311feea1d42467f5773cfc09bc

Not sure if PHP bug or libxml bug. I kinda suspect the latter since nothing seems obviously wrong on the PHP side.

It's clearly an issue related to buffering; rather than try to explain the behavior, take the repro script and
1. Remove the $test_string= line
2. Add $test_string=<<<FOO (the content from the 7.2.9 output) FOO;
3. Run to get the 7.3.0 output
4. Duplicate one or two of the MUSIK <li>s
5. Run again to get output that's cutoff differently

The cutoff happens at buffered points, like |</li> or li|> or |li> and never l|i>. The first one happens at 4060 bytes of output, which is close to both 4000 (libxml's BASE_BUFFER_SIZE) and 4KB.
 [2018-08-13 17:28 UTC] spam2 at rhsoft dot net
how can it be a libxml problem when it's the identical environment (Fedora 28, gcc-8.2.1-1.fc28.x86_64, libxml2-devel-2.9.8-4.fc28.x86_64) where i take a tarball from https://git.php.net/?p=php-src.git put it into rpmbuild/SOURCES and fire up "rpmbuild -bb php.spec" which as part of the build process runs our test-suite and stops the build if it fails?

it fails not only that one example, there are different dom related tests faling hwich are running for years on several PHP and Fedora versions and it took me enough time to isolate that one which is only an example of a underlying issue
 [2018-08-13 17:37 UTC] requinix@php.net
> how can it be a libxml problem when blah blah blah
Because the change for the bug I cited is the first time PHP has used htmlNodeDumpFormatOutput(). If that's where the bug is then of course it's never been a problem before.

> there are different dom related tests faling
Do any of them have to do with DOMDocument::saveHTML? Because if so then they are relevant here.
 [2018-08-13 18:19 UTC] spam2 at rhsoft dot net
> Do any of them have to do with DOMDocument::saveHTML?

surely and below i try to explain the background

> Because the change for the bug I cited is the first time 
> PHP has used htmlNodeDumpFormatOutput()

so in doubt we need just a option for the dom-extension to keep the old behavior unchanged, $dom->whatever = true/false, in the worst case introduced with PHP 7.3 so that we wrap it in a property_exists() or PHP_VERSION check in the full class linked below

https://access.thelounge.net/harry/php-7.3-dom/global_rh_rte_helper.inc.txt

every HTML content goes through the "magic methods" on_load() at read from database and on_save() before write back the WYSIWYG content which resluts any behavior change here which can't be avoided explicit breaks randomly code running in all previous php versions - in the example fxing a typo by the user would result in complete garbage of the whole cms-page

tinyMCE other than XINHA insists in a block level element and by default wraps a <p></p> around the source which breaks perfect fine content like <strong>test</strong> when the WYSIWYG content is targetet for a template which already contains surrounding HTML

hence we make sure everything is wrapped within <div class="tinymce-generated-root-block" style="margin: 0px; padding: 0px;"></div> at runtime and that this div-layer is removed before write back to the database
 [2018-08-13 21:48 UTC] cmb@php.net
Thanks, Harald and Damian!  Indeed, reverting commit ef9ed19[1]
and its follow-up commit 36f05a8[2] seems to fix the regression.
Unless there'll be a fix within the next 12 hours, I'll revert
these commits from the PHP-7.3 branch before tagging
php-7.3.0beta2.

[1] <http://git.php.net/?p=php-src.git;a=commit;h=ef9ed19>
[2] <http://git.php.net/?p=php-src.git;a=commit;h=36f05a8>
 [2018-08-14 05:56 UTC] requinix@php.net
I've been digging into this for hours and I keep coming back to libxml being at fault. Or perhaps PHP using the wrong API? Specifically, htmlNodeDumpFormatOutput is being given the right node to dump, the outer buffer wrapper thing shows 4300+ bytes written, but the inner actual buffer shows only 300+ bytes written with the content starting at that </li>. I suspect the inner buffer either didn't expand properly, or there are two 4000-byte buffers in series and the one being examined is the latter.
Unfortunately significant parts of the buffer system are private so I don't know what's going wrong. More confusing is that SimpleXML seems to be using the same function, and after fixing the markup to be XHTML, asXML() works fine.

So I don't know.
 [2018-08-14 10:19 UTC] cmb@php.net
I have reverted the respective commits from PHP-7.3, but not
master, for now[1].  Maybe we can find a correct solution for
PHP-7.3.

[1] <http://git.php.net/?p=php-src.git;a=commit;h=819cf52>
 [2018-08-14 10:20 UTC] ab@php.net
Christoph, fine to revert then. It's obviously too short to fix before the next beta.

Thanks.
 [2018-08-14 12:21 UTC] ab@php.net
-Status: Verified +Status: Feedback
 [2018-08-14 12:21 UTC] ab@php.net
I've just pushed a fix into master and could use a hand with checking it. Also, is there a shorter reproducer perhaps?

Thanks.
 [2018-08-14 13:43 UTC] cmb@php.net
Thanks, Anatol.  Commit 083285f[1] fixes the issue for me.

[1] <http://git.php.net/?p=php-src.git;a=commit;h=083285f>
 [2018-08-14 15:21 UTC] spam2 at rhsoft dot net
congratulations!

both currnt 7.3-HEAD as well as MASTER buliding fine again and our complete test-suite and PGP-profiling runs without a single warning, run-test skippedbut two rpm-builds with our own testing fine again

https://git.php.net/?p=php-src.git
2 hours ago 	 Anatol Belski	Cleanup master
3 hours ago 	 Christoph M... [ci skip] Update NEWS wrt. php-7.3.0beta2 tagging PHP-7.3

so from my point of view the fix for MASTER works just fine and as far as i can see now you could go with this one instead revert for PHP7.3
 [2018-08-14 19:29 UTC] ab@php.net
Thanks for the triage and checks, guys!

Christoph, if you decide to pick this into 7.3, please take 8b3174f256147a1708821621a8cbe2b257fca737 as well. Otherwise the NEWS entry for bug #76285 can be just moved to master. 

Thanks.
 [2018-08-15 21:05 UTC] cmb@php.net
-Status: Feedback +Status: Open -Assigned To: ab +Assigned To: cmb
 [2018-08-15 21:05 UTC] cmb@php.net
Yeah, the fix including the follow-ups should go into PHP 7.3.
 [2018-08-17 11:10 UTC] cmb@php.net
-Status: Assigned +Status: Closed
 [2018-08-17 11:10 UTC] cmb@php.net
The four commits have been cherry-picked into PHP-7.3.
 [2018-08-17 14:21 UTC] ab@php.net
Thanks for merging, Chistoph. I'm just coming to it for more check, also working on a test.

Thanks.
 
PHP Copyright © 2001-2019 The PHP Group
All rights reserved.
Last updated: Sun Dec 15 23:01:27 2019 UTC