Bug #63380 Allocation via libxml does not use PHP's per-request allocator
Submitted: 2012-10-29 03:25 UTC
Modified: 2018-03-14 14:28 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:1 (100.0%)
From: tstarling@php.net
Assigned:
Status: Open
Package: *XML functions
PHP Version: master-Git-2012-10-29 (Git)
OS: Linux
Private report: No
CVE-ID: None
 [2012-10-29 03:25 UTC] tstarling@php.net
Description:
------------
Allocation via libxml does not use PHP's per-request allocator. So any memory used by libxml will not be accounted against memory_get_usage() or memory_limit.

At Wikimedia we use libxml DOM trees to store wikitext parse trees, because they are more compact in memory than the equivalent pure-PHP data structures. When these parse trees are cached, the memory requirements can become excessive, and the memory is typically not returned to the system after request termination. Using xmlMemSetup() to set hook functions which use PHP's per-request allocation functions will allow us to more effectively monitor and limit the use of libxml in production.

I've developed a patch and will submit it to GitHub as a pull request.
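
As a rough illustration of the hook approach described above (not the submitted patch), the following C sketch shows four allocator hooks that count bytes and enforce a limit. The names `hook_malloc`, `hook_free`, `hook_realloc`, `hook_strdup`, `request_bytes` and `request_limit` are hypothetical, and the counter only stands in for PHP's memory manager. It assumes a C11 compiler. With libxml2 linked, hooks of this shape could be registered with `xmlMemSetup(hook_free, hook_malloc, hook_realloc, hook_strdup)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical accounting state. It stands in for PHP's per-request
 * memory manager, which this sketch does NOT use. */
size_t request_bytes = 0;            /* bytes currently allocated via hooks */
size_t request_limit = 1024 * 1024;  /* stand-in for memory_limit */

/* Every block is prefixed with its size so the free hook can adjust the count.
 * The union keeps the user pointer suitably aligned. */
typedef union { size_t size; max_align_t align; } block_header;

void *hook_malloc(size_t size) {
    if (size > request_limit - request_bytes) return NULL;  /* over the limit */
    block_header *h = malloc(sizeof *h + size);
    if (!h) return NULL;
    h->size = size;
    request_bytes += size;
    return h + 1;
}

void hook_free(void *mem) {
    if (!mem) return;
    block_header *h = (block_header *)mem - 1;
    assert(request_bytes >= h->size);  /* accounting sanity check */
    request_bytes -= h->size;
    free(h);
}

void *hook_realloc(void *mem, size_t size) {
    if (!mem) return hook_malloc(size);
    block_header *h = (block_header *)mem - 1;
    size_t old = h->size;
    if (size > old && size - old > request_limit - request_bytes) return NULL;
    h = realloc(h, sizeof *h + size);
    if (!h) return NULL;
    request_bytes = request_bytes - old + size;
    h->size = size;
    return h + 1;
}

char *hook_strdup(const char *str) {
    size_t n = strlen(str) + 1;
    char *copy = hook_malloc(n);
    if (copy) memcpy(copy, str, n);
    return copy;
}
```

Because every libxml allocation would then pass through the hooks, the DOM nodes created by the test script below would show up in the counter instead of being invisible to the limit.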

Test script:
---------------
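<?php
// Append one million empty elements; the nodes are allocated by libxml.
$doc = new DOMDocument;
for ($i = 0; $i < 1000000; $i++) {
    $doc->appendChild($doc->createElement('foo'));
}
// Reports only the memory obtained through PHP's own allocator.
print memory_get_usage() . "\n";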


Expected result:
----------------
312331440 (with debug and ZTS)

Actual result:
--------------
694256

History

 [2012-10-29 15:22 UTC] felipe@php.net
-Status: Open
+Status: Assigned
-Assigned To:
+Assigned To: rrichards
 [2012-12-05 11:08 UTC] rrichards@php.net
-Assigned To: rrichards
+Assigned To: tstarling
 [2012-12-05 11:08 UTC] rrichards@php.net
There is a major problem with doing this, and it is why I didn't end up tying into the PHP memory allocator. Depending on the setup, it is extremely likely to cause memory corruption and/or mix memory allocations between modules. For example, when mod_perl and mod_php are used together, PHP will override the libxml memory handling functions (which are global) and bleed into mod_perl (or any other module using libxml2), with any number of results (crashes, security issues, etc.). The only way to do something like this would be to make it a compile-time option that is disabled by default, so that those who know their environment intimately can use it at their own risk. I don't know whether you want to write a patch for that or not. Otherwise I don't see any way this could be added safely.
 [2012-12-05 11:41 UTC] tstarling@php.net
Do you know of a specific case where request-local allocations could bleed into mod_perl and cause memory corruption?

I have reviewed all of libxml's global variables and ensured that they cannot be made to hold pointers to request-local allocations. The hooks are disabled at post-deactivate via a thread-local variable, so a perl request will not get request-local pointers from xmlMalloc() whether it runs in its own thread concurrently, or in the same thread as PHP but at a later time. TSRM_FETCH() should give default global variables, with local request allocation disabled, even if it is called from a thread where PHP has never been used.
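
The thread-local gating described above can be modelled roughly as follows. This is a simplified sketch, not the code from the patch: `request_alloc_active`, `gated_malloc`, `request_startup` and `request_post_deactivate` are hypothetical names, both paths call plain `malloc`, and the counters only make the routing visible. It assumes a C11 compiler for `_Thread_local`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-thread flag: true only between request startup and
 * post-deactivate on the thread that runs the request. A thread that has
 * never started a request sees the default value, false. */
static _Thread_local bool request_alloc_active = false;

/* Per-thread counters standing in for the two allocators. */
_Thread_local size_t request_allocs = 0;  /* would go to the per-request allocator */
_Thread_local size_t global_allocs = 0;   /* would go to the system allocator */

void *gated_malloc(size_t size) {
    if (request_alloc_active) {
        request_allocs++;   /* request-local path */
    } else {
        global_allocs++;    /* default path, e.g. another module's request */
    }
    return malloc(size);    /* both paths use malloc in this model */
}

void request_startup(void) {
    assert(!request_alloc_active);  /* nested requests are not modelled */
    request_alloc_active = true;
}

void request_post_deactivate(void) {
    request_alloc_active = false;
}
```

The sketch covers only the allocation side. In a real implementation, a block handed out while the flag is set must not outlive the request, which is why any libxml global that could retain such a pointer matters.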

Either way, I would be happy to make this configurable, off by default, since the robustness of the solution depends on details of global pointer storage in libxml which may change in the future. So my patch does introduce a maintenance burden, with a risk of dangling pointers if that maintenance is not kept up to date. I'm just not keen on having the documentation say that there are known issues with interaction with other Apache modules unless that is truly the case.
 [2012-12-05 11:49 UTC] pajoye@php.net
That is a very well-known issue when mixed builds free and allocate the same memory areas.

It is (almost) no problem when everything uses the libc allocation system, but it is a (huge) problem when PHP uses the PHP memory manager and other modules use libc or their own memory manager. It leads to crashes, almost always.
 [2013-11-08 06:09 UTC] tstarling@php.net
-Status: Assigned
+Status: Open
-Assigned To: tstarling
+Assigned To:
 [2013-11-08 06:09 UTC] tstarling@php.net
Unassigning from me since I'm not really interested in pursuing this. I have tried to explain the risks involved, and the precautions I have taken, but the comments I have received, here and on the pull request, indicate that I have not been successful. The feature will require maintenance, and obviously if nobody understands what the patch is doing and why, then nobody will be able to maintain it.
 [2018-03-14 14:28 UTC] cmb@php.net
-Package: XML related
+Package: *XML functions
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Mar 28 21:01:27 2024 UTC