[2020-04-24 10:35 UTC] ivo at beerntea dot com
Description:
------------
SimpleXML and DOMDocument (and possibly other libxml functions) do not always free the allocated memory after all references to the XML/DOM object have been released.
This is especially a problem in environments where the same process handles multiple requests, such as the FPM interface, because memory allocated by one script won't be released until the FPM process is terminated. When handling large XML files this can easily push a server out of memory.
In addition, the XML DOM appears to use significantly more memory than the original XML string, and the memory used by the XML DOM is not accounted for by PHP (see #63380), which allows a script to bypass the PHP memory limit.
PHP version: 7.3.17-1+ubuntu18.04.1+deb.sury.org+1
Linux kernel 4.15.0-91-generic x86_64
Test script:
---------------
<?php
function mstat() {
    //Print reserved memory according to PHP and total mapped memory according to OS (assumes 4k page size and Linux)
    $statm = explode(' ', file_get_contents('/proc/self/statm'));
    printf("PHP memory usage=%u MB, process memory usage=%u MB\n", memory_get_usage(TRUE) / 1024 / 1024, $statm[0] * 4096 / 1024 / 1024);
}
mstat();
$xml = '<doc>'.str_repeat('<node attr="attr">test</node>', 1000000).'</doc>';
printf("Preparing big XML: %u MB.\n", strlen($xml) / 1024 / 1024);
mstat();
$flags = 0;
$flags |= LIBXML_COMPACT; //COMPACT flag reduces memory usage a little bit, not much
echo "Parsing DOM document without storing reference...\n";
(new \DOMDocument())->loadXML($xml, $flags);
mstat();
echo "Parse XML DOM without storing reference...\n";
simplexml_load_string($xml, 'SimpleXMLElement', $flags);
mstat();
echo "Parse XML DOM without storing reference...\n";
simplexml_load_string($xml, 'SimpleXMLElement', $flags);
mstat();
echo "Parse XML DOM storing reference...\n";
$dom = simplexml_load_string($xml, 'SimpleXMLElement', $flags);
mstat();
echo "Parse XML DOM overwriting reference...\n";
$dom = simplexml_load_string($xml, 'SimpleXMLElement', $flags);
mstat();
echo "Unset XML string...\n";
$xml = NULL;
mstat();
echo "Unset DOM references...\n";
$dom = NULL; //only $dom was ever assigned; $dom2 and $dom3 never held a reference
mstat();
Expected result:
----------------
The indicated memory usage should return to its initial value after the reference to the XML/DOM object is cleared.
Actual result:
--------------
The memory used by the XML functions remains in use even when there are no more references:
PHP memory usage=2 MB, process memory usage=391 MB
Preparing big XML: 27 MB.
PHP memory usage=29 MB, process memory usage=419 MB
Parsing DOM document without storing reference...
PHP memory usage=29 MB, process memory usage=892 MB
I ran your example through massif, and here's the memory usage over time we see:

[massif heap usage graph: y-axis peaks at 1.016 GB, x-axis runs from 0 to 13.92 Gi; a series of peaks of similar height, each followed by a drop to near zero, then a final peak about twice as high]

As you can see, the memory usage does drop back down to approximately zero in between processing of different documents. The last peak is twice as high, because you keep two documents in memory at the same time (note that you only reassign $dom *after* you already parsed the new document).

Looking at the output of operating-system provided metrics like /proc/self/statm can be misleading if you don't actually know what you're doing. As a rule of thumb, you can assume that memory allocators do not release memory back to the operating system for various reasons, be it performance considerations or memory fragmentation. This memory is not leaked in any conventional sense of the term, in that it will be reused the next time the process allocates more memory.

PHP's own allocator does release memory back to the operating system between requests (retaining only enough memory to satisfy the average memory usage of requests), but as has been noted before, for libxml the relevant allocator is the system allocator, and PHP has no control over how it behaves.

I don't doubt that there may be memory leak bugs in PHP's use of libxml, but the examples provided here do not show such a bug.
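As an illustration of why the statm numbers need care: the test script prints field 0 of /proc/self/statm, which is the total virtual size of the process, while field 1 is the resident set size. The following is a minimal sketch, assuming Linux and a 4 KiB page size as in the test script; `parse_statm` is a hypothetical helper, not a PHP function. Neither figure proves a leak by itself, since memory freed by the program can stay with the allocator.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: split a /proc/<pid>/statm line into virtual size
 * and resident set size, both in MB. Field 0 is the total program size
 * and field 1 is the resident set, each counted in pages.
 * The 4096-byte default page size is an assumption.
 */
function parse_statm(string $statm, int $pageSize = 4096): array
{
    $fields = preg_split('/\s+/', trim($statm));
    if ($fields === false || count($fields) < 2) {
        throw new InvalidArgumentException('Unexpected statm format');
    }
    return [
        'virtual_mb'  => (int) $fields[0] * $pageSize / 1048576,
        'resident_mb' => (int) $fields[1] * $pageSize / 1048576,
    ];
}

// Only meaningful on Linux; skipped elsewhere.
if (is_readable('/proc/self/statm')) {
    $m = parse_statm(file_get_contents('/proc/self/statm'));
    printf(
        "PHP=%.1f MB, virtual=%.1f MB, resident=%.1f MB\n",
        memory_get_usage(true) / 1048576,
        $m['virtual_mb'],
        $m['resident_mb']
    );
}
```

Printing both columns in `mstat()` would at least separate address space that is merely mapped from pages that are actually resident.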
When looking for memory leaks, use of massif (or plain valgrind) is generally advisable, as it insulates you from details of the allocator and its interaction with the operating system.
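The doubled final peak described above comes from the order of operations: `$dom = simplexml_load_string(...)` builds the new tree while `$dom` still holds the old one. A minimal sketch of dropping the old reference first, using a much smaller stand-in document (1,000 nodes, chosen here only for brevity):

```php
<?php
declare(strict_types=1);

// Small stand-in document; the report uses 1,000,000 nodes.
$xml = '<doc>' . str_repeat('<node attr="attr">test</node>', 1000) . '</doc>';

$dom = simplexml_load_string($xml);
printf("first parse: %d nodes\n", count($dom->node));

// Drop the only reference before parsing again, so the old tree can be
// freed before the new one is built. Without this line, both trees
// exist at the same time until the assignment below completes.
unset($dom);

$dom = simplexml_load_string($xml);
printf("second parse: %d nodes\n", count($dom->node));
```

This only lowers the peak within the process; whether the freed memory goes back to the operating system is still up to the system allocator.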