Bug #79518 SimpleXML memory leak
Submitted: 2020-04-24 10:35 UTC Modified: 2020-11-17 14:56 UTC
Votes:2
Avg. Score:4.5 ± 0.5
Reproduced:1 of 2 (50.0%)
Same Version:0 (0.0%)
Same OS:1 (100.0%)
From: ivo at beerntea dot com Assigned:
Status: Open Package: SimpleXML related
PHP Version: 7.3.17 OS: Linux
Private report: No CVE-ID: None
 [2020-04-24 10:35 UTC] ivo at beerntea dot com
Description:
------------
SimpleXML and DOMDocument (and possibly other libxml functions) do not always free the allocated memory after all references to the XML/DOM object have been released.

This is especially a problem in environments where the same process handles multiple requests, such as FPM, because memory allocated by one script is not released until the FPM process terminates. When handling large XML files, this can easily push a server out of memory.

In addition, the XML DOM appears to use significantly more memory than the original XML string, and the memory used by the XML DOM is not accounted for by PHP (see #63380), which allows a script to bypass the PHP memory limit.

PHP version: 7.3.17-1+ubuntu18.04.1+deb.sury.org+1
Linux kernel 4.15.0-91-generic x86_64

Test script:
---------------
<?php
function mstat() {
  //Print reserved memory according to PHP and total mapped memory according to OS (assumes 4k page size and Linux) 
  $statm = explode(' ', file_get_contents('/proc/self/statm'));
  printf("PHP memory usage=%u MB, process memory usage=%u MB\n", memory_get_usage(TRUE) / 1024 / 1024, $statm[0] * 4096 / 1024 / 1024);
}

mstat();

$xml = '<doc>'.str_repeat('<node attr="attr">test</node>', 1000000).'</doc>';
printf("Preparing big XML: %u MB.\n", strlen($xml) / 1024 / 1024);
mstat();

$flags = 0;
$flags |= LIBXML_COMPACT; //COMPACT flag reduces memory usage a little bit, not much

echo "Parsing DOM document without storing reference...\n";
(new \DOMDocument())->loadXML($xml, $flags);
mstat();

echo "Parse XML DOM without storing reference...\n";
simplexml_load_string($xml, 'SimpleXMLElement', $flags);  
mstat();

echo "Parse XML DOM without storing reference...\n";
simplexml_load_string($xml, 'SimpleXMLElement', $flags);  
mstat();

echo "Parse XML DOM storing reference...\n";
$dom = simplexml_load_string($xml, 'SimpleXMLElement', $flags);  
mstat();

echo "Parse XML DOM overwriting reference...\n";
$dom = simplexml_load_string($xml, 'SimpleXMLElement', $flags);  
mstat();

echo "Unset XML string...\n";
$xml = NULL;
mstat();

echo "Unset DOM references...\n";
$dom = $dom2 = $dom3 = NULL;
mstat();

Expected result:
----------------
Indicated memory usage should return back to the initial value after a reference to the XML/DOM object is cleared.

Actual result:
--------------
The memory used by the XML functions remains in use even when there are no more references:

PHP memory usage=2 MB, process memory usage=391 MB
Preparing big XML: 27 MB.
PHP memory usage=29 MB, process memory usage=419 MB
Parsing DOM document without storing reference...
PHP memory usage=29 MB, process memory usage=892 MB


History

 [2020-11-17 12:44 UTC] cmb@php.net
-Status: Open +Status: Feedback -Assigned To: +Assigned To: cmb
 [2020-11-17 12:44 UTC] cmb@php.net
> In addition the XML DOM appeard to use significantly more memory
> than the original XML string, […]

Which is not unexpected, given that the DOM is stored as a tree
structure, with a lot of info for each node.  That would be a
libxml issue, though.

> , and the memory used by the XML DOM is not accounted for by PHP
> (see #63380), allowing to bypass the PHP memory limit.

Besides already being tracked in the other bug report, that is
not likely something we can fix without further support from libxml.

> The memory used by the XML functions remains in use even when
> there are no more references:

Are you sure?  This may be just the ZendMM optimization which does
not immediately give all possible memory back to the OS, to avoid
unnecessary re-allocations for the next request.  To check that,
you can run the script(s) with USE_ZEND_ALLOC=0.
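
When comparing runs with and without ZendMM, it may help to look at resident memory (VmRSS) in addition to the total virtual size that `statm[0]` reports. The following is only a sketch, not part of the original report: `parseProcStatus()` and `mstatDetailed()` are hypothetical names, and the code assumes the Linux `/proc/self/status` layout with `VmSize` and `VmRSS` lines in kB.

```php
<?php
// Hypothetical helper: extracts VmSize (virtual) and VmRSS (resident), in kB,
// from the text of /proc/self/status. Assumes the Linux "Key:   N kB" layout.
function parseProcStatus(string $statusText): array
{
    $result = [];
    foreach (['VmSize', 'VmRSS'] as $key) {
        if (preg_match('/^' . $key . ':\s+(\d+)\s+kB/m', $statusText, $m)) {
            $result[$key] = (int) $m[1];
        }
    }
    return $result;
}

// Prints PHP-tracked memory next to what the kernel reports for the process.
function mstatDetailed(): void
{
    $status = @file_get_contents('/proc/self/status');
    $mem = $status === false ? [] : parseProcStatus($status);
    $env = getenv('USE_ZEND_ALLOC');
    printf(
        "USE_ZEND_ALLOC=%s, PHP=%d kB, VmRSS=%d kB, VmSize=%d kB\n",
        $env === false ? 'unset' : $env,
        intdiv(memory_get_usage(true), 1024),
        $mem['VmRSS'] ?? 0,
        $mem['VmSize'] ?? 0
    );
}

mstatDetailed();
```

Calling `mstatDetailed()` at the same points as `mstat()` in the test script, once normally and once with `USE_ZEND_ALLOC=0` set in the environment, would show whether the retained memory is resident and whether ZendMM has any influence on it.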
 [2020-11-17 14:44 UTC] ivo at beerntea dot com
-Status: Feedback +Status: Assigned
 [2020-11-17 14:44 UTC] ivo at beerntea dot com
I realize and agree that the memory requirements and the external allocation by libxml are a separate problem. I only mentioned them because they contribute to this supposed bug causing a real problem.

I have tested the script with USE_ZEND_ALLOC=0; this did not make a significant difference, possibly because the largest memory allocation is done by libxml and is not known to Zend.

I am not entirely sure what the allocated memory is used for. It appears that most of it is reused on repeated calls, but is never released.

In a production application we noticed that the order of allocations can make a difference. If we loaded a large XML file, processed it and then unset it, all memory was released. If we loaded another, small XML document and allocated some memory in PHP while the large tree was still in memory, the memory used by the large tree was never released, even though there were no references to XML nodes left. I cannot yet reproduce this in the test script, as it just seems to keep the memory at all times.

In the case of the production application, the memory remained in use by the PHP FPM process that ran the script, even after the script completed. Repeated requests for this script resulted in multiple processes each holding on to a lot of memory, and eventually in an out-of-memory situation on the server.
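
The allocation order described above could be sketched roughly as follows. This is a hypothetical reproduction attempt, not the production code: the function names `rssKb()` and `interleavedScenario()`, the node count and the filler size are all arbitrary assumptions, and the script may well not show the effect.

```php
<?php
// Resident set size in kB from /proc/self/status; 0 if unavailable (non-Linux).
function rssKb(): int
{
    $status = @file_get_contents('/proc/self/status');
    if ($status !== false && preg_match('/^VmRSS:\s+(\d+)\s+kB/m', $status, $m)) {
        return (int) $m[1];
    }
    return 0;
}

// Hypothetical scenario: node count and filler size are arbitrary choices.
function interleavedScenario(int $nodes, int $fillerBytes): array
{
    $samples = ['start' => rssKb()];

    // 1. Load the large tree and keep a reference to it.
    $big = simplexml_load_string(
        '<doc>' . str_repeat('<node attr="attr">test</node>', $nodes) . '</doc>'
    );
    $samples['big_loaded'] = rssKb();

    // 2. While the large tree is alive, load a small document and
    //    allocate some ordinary PHP memory.
    $small = simplexml_load_string('<doc><node>small</node></doc>');
    $filler = str_repeat('x', $fillerBytes);
    $samples['small_and_filler'] = rssKb();

    // 3. Drop the large tree only; $small and $filler stay referenced.
    unset($big);
    $samples['big_released'] = rssKb();

    // Read the survivors so they are demonstrably still alive at this point.
    $samples['survivors'] = $small->getName() . ':' . strlen($filler);

    return $samples;
}

print_r(interleavedScenario(200000, 8 * 1024 * 1024));
```

If `big_released` stays close to `small_and_filler`, the memory of the large tree was not returned to the OS while the later allocations were still alive.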
 [2020-11-17 14:56 UTC] cmb@php.net
Thanks for further feedback!  There might indeed be a memory leak,
but it should be possible to detect that with valgrind or
LeakSanitizer or such.

For further insights on libxml memory management, see
<http://xmlsoft.org/xmlmem.html>.
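
As a lighter-weight complement to valgrind or LeakSanitizer, a repeated-parse loop can give a first hint about whether resident memory keeps growing or plateaus. This is only a sketch under assumptions: `rssKb()` and `rssPerRound()` are hypothetical helpers, the document size and round count are arbitrary, and the result does not replace a proper leak check.

```php
<?php
// Resident set size in kB from /proc/self/status; 0 if unavailable (non-Linux).
function rssKb(): int
{
    $status = @file_get_contents('/proc/self/status');
    if ($status !== false && preg_match('/^VmRSS:\s+(\d+)\s+kB/m', $status, $m)) {
        return (int) $m[1];
    }
    return 0;
}

// Parses the same document $rounds times without keeping a reference and
// records resident memory after each round.
function rssPerRound(string $xml, int $rounds): array
{
    $samples = [];
    for ($i = 0; $i < $rounds; $i++) {
        simplexml_load_string($xml);
        $samples[] = rssKb();
    }
    return $samples;
}

$xml = '<doc>' . str_repeat('<node attr="attr">test</node>', 100000) . '</doc>';
$samples = rssPerRound($xml, 10);
$first = $samples[0];
$last = $samples[count($samples) - 1];
printf("first=%d kB, last=%d kB, growth=%d kB\n", $first, $last, $last - $first);
```

Steady growth across rounds would point towards a genuine leak. A plateau after the first round would instead suggest memory that is retained and reused by the allocator, which matches the observation that most of it is reused on repeated calls.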
 [2020-11-17 14:56 UTC] cmb@php.net
-Assigned To: cmb +Assigned To:
 
Last updated: Sat Jan 23 12:01:24 2021 UTC