Bug #80158 Memory leak when xml nodes are stored into local array
Submitted: 2020-09-27 21:34 UTC Modified: 2020-10-18 04:22 UTC
From: efbiaiinzinz at hotmail dot com Assigned:
Status: No Feedback Package: SimpleXML related
PHP Version: 7.4.10 OS: Linux
Private report: No CVE-ID: None
 [2020-09-27 21:34 UTC] efbiaiinzinz at hotmail dot com
Description:
------------
When looping over xml nodes and storing the node objects temporarily into a local array, some memory is never released and doing it many times in a row with different files will result in out of memory error.

The provided sample shows that ~516096 bytes are lost.
When I change the test script line
		$items[(int)(string)$item->id] = $item;
to
		$items[(int)(string)$item->id] = (string)$item->name;
or even to
		$items[(int)(string)$item->id] = $item->asXML();

then the memory usage before and after the testFile() call is exactly the same.

Test script:
---------------
<?php
function generate($counter) {
	$file = tempnam(sys_get_temp_dir(), 'xml');
	$f = fopen($file, 'wb');
	fwrite($f, '<?xml version="1.0" encoding="UTF-8"?><items>' . "\n");
	for ($i = 1; $i <= $counter; $i++)
		fwrite($f, '<item id="' . $i . '"><id>' . $i . '</id><name>item ' . $i . '</name></item>' . "\n");
	fwrite($f, '</items>');
	fclose($f);
	return $file;
}
function testFile($file) {
	$xml = simplexml_load_file($file);
	$items = [];
	foreach ($xml->item as $item)
		$items[(int)(string)$item->id] = $item;
}
$file = generate(50000);
$mem1 = memory_get_usage();
testFile($file);
$mem2 = memory_get_usage();
echo "$mem1\n$mem2\n";
unlink($file);

Expected result:
----------------
The test should print two identical numbers, separated by a line break.

Actual result:
--------------
The second number is ~516,000 larger, which means at least 0.5MB of memory leaked somehow.

History

 [2020-10-01 14:50 UTC] nikic@php.net
-Status: Open +Status: Feedback
 [2020-10-01 14:50 UTC] nikic@php.net
> When looping over xml nodes and storing the node objects temporarily into a local array, some memory is never released and doing it many times in a row with different files will result in out of memory error.

From a cursory look, this looks like an object store "leak", which means that memory usage is proportional to the maximum number of objects live at the same time. This is generally harmless as it does not result in a persistent leak and the memory is reused when creating new objects.

However, it also means that "doing it many times in a row with different files will result in out of memory error" does not hold, assuming you don't keep objects alive between files.
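A minimal PHP sketch of the object store behaviour described above. It rests on an assumption not stated in this thread: that the engine's object store starts with 1024 slots, doubles when full, keeps its peak capacity, and reuses freed slots. `allocateObjects()` is a hypothetical name. If that assumption holds, keeping about 50000 objects alive at once grows the store to 65536 pointer-sized slots, i.e. (65536 - 1024) * 8 = 516096 bytes on a 64-bit build, which would account for the figure in the original report. The first run should leave a residual increase, and the second run should add little or nothing on top of it.

```php
<?php
// Sketch, assuming the object store keeps the capacity it reached at its
// peak and reuses freed slots for new objects.
function allocateObjects(int $count): void
{
    $objects = [];
    for ($i = 0; $i < $count; $i++) {
        $objects[] = new stdClass();
    }
    // All objects are released when $objects goes out of scope here,
    // but the grown object store is not shrunk.
}

$before = memory_get_usage();
allocateObjects(50000);
$afterFirstRun = memory_get_usage();
allocateObjects(50000);
$afterSecondRun = memory_get_usage();

// Expected pattern: a one-off increase after the first run, then no
// further growth because the freed slots are reused.
echo "$before\n$afterFirstRun\n$afterSecondRun\n";
```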
 [2020-10-02 12:05 UTC] efbiaiinzinz at hotmail dot com
I'll try to make a more complex sample, but something strange is still happening.
The full system where this started occurring reads a lot of data from different data points (some XML, some JSON, some CSV).

There is a common Base class extended by per-endpoint implementations that map each specific structure into a common internal format and then pass it on to additional handlers that store changes in the DB.

I have added logging: the handlers that use bigger CSV files (50MB plus), read 100k rows, and do lots of interactions show a memory difference of 0.01-0.05MB (loading class definitions takes memory).
When measuring memory after the XML parsers, they all report a memory "leak" measured in megabytes, 80MB+ in the worst case.

Out-of-memory errors have already happened because of it, so it seems to be a permanent leak affecting new allocations.

I'll try to take detailed measurements when running the cron job with PHP 7.3 and then report the difference in more detail.
 [2020-10-03 07:19 UTC] efbiaiinzinz at hotmail dot com
I'm having a bit of trouble debugging this further, because the "harmless" leak makes it hard to pinpoint the exact place and behaviour where the leak occurs.
I confirmed overnight that this leak issue also exists in 7.3.22.

I modified the sample script and changed:
$mem1 = memory_get_usage();
testFile($file);
$mem2 = memory_get_usage();
echo "$mem1\n$mem2\n";

to:
$mem1 = memory_get_usage();
testFile($file);
$mem2 = memory_get_usage();
$str500KB = str_repeat('1234567890', 50 * 1024);
$mem3 = memory_get_usage();
$str10MB = str_repeat('1234567890', 1024 * 1024);
$mem4 = memory_get_usage();
testFile($file);
$mem5 = memory_get_usage();
echo "$mem1\n$mem2\n$mem3\n$mem4\n$mem5\n";


I see this output (running through Apache):
391936
908032
1424128
11914008
11914008

Creating new strings does not reuse that memory but instead increases memory usage on top of the existing number, whereas rerunning the same SimpleXMLElement parser + loop indeed does not increase the total memory.

Do you have any pointers on how to debug this further, or is there some known edge case that can turn the "harmless" object store leak into an actual leak?

Right now I have one code path with a 50MB XML input where, according to memory_get_usage(), running it twice in a row loses ~72MB on each run, and another code path that uses an 80MB JSON file as input where the first run shows a memory difference of 0.11MB (it loads multiple classes) and the second run shows a memory difference of 0.00MB, as expected.
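One way to separate the two cases discussed in this thread, offered as a sketch under stated assumptions rather than an established debugging tool: record memory_get_usage() around each of several consecutive runs of the same code path. Growth that appears only on the first run is consistent with the object store keeping its peak capacity; growth on every run, as reported for the XML code path above, points to a persistent leak. `memoryDeltas()`, `$leaky` and `$clean` are hypothetical names used only for illustration.

```php
<?php
// Hypothetical helper: run $fn $runs times and return how much
// memory_get_usage() changed across each individual run.
function memoryDeltas(callable $fn, int $runs = 3): array
{
    $deltas = [];
    for ($i = 0; $i < $runs; $i++) {
        $before = memory_get_usage();
        $fn();
        $deltas[] = memory_get_usage() - $before;
    }
    return $deltas;
}

// Illustrative leaking callback: each run keeps a ~100KB string alive in
// $kept, so every delta should be positive.
$kept = [];
$leaky = function () use (&$kept): void {
    $kept[] = str_repeat('x', 100000);
};

// Illustrative non-leaking callback: the string is freed when the call returns.
$clean = function (): void {
    $tmp = str_repeat('x', 100000);
};

print_r(memoryDeltas($leaky));
print_r(memoryDeltas($clean));
```

In a real application, the callback would wrap one full pass of the code path under suspicion, e.g. a single call to the test script's testFile($file).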

I will try to narrow it down some more this weekend.
 [2020-10-03 09:13 UTC] efbiaiinzinz at hotmail dot com
OK, I found the trigger of the leak after reducing the code step by step.

<?php
function generate($counter) {
	$file = tempnam(sys_get_temp_dir(), 'xml');
	$f = fopen($file, 'wb');
	fwrite($f, '<?xml version="1.0" encoding="UTF-8"?><items>' . "\n");
	for ($i = 1; $i <= $counter; $i++)
		fwrite($f, '<item id="' . $i . '"><id>' . $i . '</id><name>item ' . $i . '</name></item>' . "\n");
	fwrite($f, '</items>');
	fclose($f);
	return $file;
}
function testFile($file) {
	$xml = simplexml_load_file($file);
	foreach ($xml->item as $item)
		doLeak($item->name);
}
function doLeak($obj) {
	$obj .= ''; //this causes leak
	return (string)$obj;
}
$file = generate(50000);
$mem1 = memory_get_usage();
testFile($file);
$mem2 = memory_get_usage();
testFile($file);
$mem3 = memory_get_usage();
testFile($file);
$mem4 = memory_get_usage();
echo "$mem1\n$mem2\n$mem3\n$mem4\n";
unlink($file);

Every execution increases memory usage permanently.
 [2020-10-05 13:37 UTC] nikic@php.net
When I try your code from the last comment, I get:

429864
429864
429864
429928

I.e. the memory usage stays about the same. I checked this on PHP 7.3, 7.4 and 8.0, and the behavior was the same for all.
 [2020-10-17 17:30 UTC] efbiaiinzinz at hotmail dot com
Sorry for the late reply; it seems the email notification did not work for some reason.

The CLI version I tried has this value for Configure Command:
'./configure'  '--prefix=/usr' '--build=x86_64-pc-linux-gnu' '--host=x86_64-cros-linux-gnu' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--datadir=/usr/share' '--sysconfdir=/etc' '--localstatedir=/var/lib' '--docdir=/usr/share/doc/php-7.4.10' '--htmldir=/usr/share/doc/php-7.4.10/html' '--prefix=/usr/lib64/php7.4' '--mandir=/usr/lib64/php7.4/man' '--infodir=/usr/lib64/php7.4/info' '--libdir=/usr/lib64/php7.4/lib' '--with-libdir=lib64' '--localstatedir=/var' '--without-pear' '--enable-shared' '--disable-maintainer-zts' '--with-password-argon2=/usr' '--enable-bcmath=shared' '--with-bz2=shared,/usr' '--enable-calendar=shared' '--enable-ctype' '--with-curl=/usr' '--enable-dom' '--without-enchant' '--enable-exif' '--enable-fileinfo' '--enable-filter' '--enable-ftp=shared' '--with-gettext=shared' '--with-gmp=shared,/usr' '--without-mhash' '--with-iconv' '--enable-intl=shared' '--enable-ipv6' '--enable-json' '--with-kerberos=/usr' '--with-libxml' '--enable-mbstring' '--with-openssl=/usr' '--with-openssl-dir=/usr' '--enable-pcntl=shared' '--enable-pdo' '--enable-opcache=shared' '--with-pgsql=shared,/usr' '--enable-posix' '--with-pspell=shared,/usr' '--enable-simplexml' '--disable-shmop' '--without-snmp' '--enable-soap=shared' '--enable-sockets' '--with-sodium=shared,/usr' '--with-sqlite3=/usr' '--disable-sysvmsg' '--disable-sysvsem' '--disable-sysvshm' '--with-fpm-systemd' '--with-tidy=shared,/usr' '--enable-tokenizer' '--enable-xml' '--enable-xmlreader' '--enable-xmlwriter' '--with-xmlrpc=shared' '--with-xsl=shared,/usr' '--with-zip=shared' '--with-zlib=/usr' '--disable-debug' '--enable-phar=shared' '--without-cdb' '--without-db4' '--disable-flatfile' '--without-gdbm' '--disable-inifile' '--without-qdbm' '--with-freetype=/usr' '--disable-gd-jis-conv' '--with-jpeg=/usr' '--without-xpm' '--with-webp=/usr' '--enable-gd=shared' '--with-imap=shared,/usr' '--with-imap-ssl=/usr' '--with-ldap=shared,/usr' '--without-ldap-sasl' '--with-mysqli=mysqlnd' 
'--with-mysql-sock=/run/mysql/mysqld.socket' '--with-unixODBC=shared,/usr' '--without-iodbc' '--without-oci8' '--with-pdo-dblib=shared,/usr' '--with-pdo-mysql=mysqlnd' '--with-pdo-pgsql=shared' '--with-pdo-sqlite=/usr' '--without-pdo-firebird' '--with-pdo-odbc=shared,unixODBC,/usr' '--without-pdo-oci' '--with-readline=/usr' '--without-libedit' '--without-mm' '--with-pic' '--with-config-file-path=/etc/php/cli-php7.4' '--with-config-file-scan-dir=/etc/php/cli-php7.4/ext-active' '--disable-embed' '--enable-cli' '--disable-cgi' '--disable-fpm' '--without-apxs2' '--disable-phpdbg' 'build_alias=x86_64-pc-linux-gnu' 'host_alias=x86_64-cros-linux-gnu' 'CPPFLAGS='


PHP API => 20190902
PHP Extension => 20190902
Zend Extension => 320190902
Zend Extension Build => API320190902,NTS
PHP Extension Build => API20190902,NTS
Debug Build => no
Thread Safety => disabled
Zend Signal Handling => enabled
Zend Memory Manager => enabled
Zend Multibyte Support => provided by mbstring
IPv6 Support => enabled
DTrace Support => disabled

This program makes use of the Zend Scripting Language Engine:
Zend Engine v3.4.0, Copyright (c) Zend Technologies
    with Zend OPcache v7.4.10, Copyright (c), by Zend Technologies


Opcache config:
Zend OPcache

Opcode Caching => Disabled
Optimization => Disabled
SHM Cache => Enabled
File Cache => Disabled
Startup Failed => Opcode Caching is disabled for CLI

Directive => Local Value => Master Value
opcache.blacklist_filename => no value => no value
opcache.consistency_checks => 0 => 0
opcache.dups_fix => Off => Off
opcache.enable => On => On
opcache.enable_cli => Off => Off
opcache.enable_file_override => Off => Off
opcache.error_log => /var/vserver/log/php.d/digizone.ee.php.log => /var/vserver/log/php.d/digizone.ee.php.log
opcache.file_cache => no value => no value
opcache.file_cache_consistency_checks => On => On
opcache.file_cache_only => Off => Off
opcache.file_update_protection => 2 => 2
opcache.force_restart_timeout => 180 => 180
opcache.huge_code_pages => Off => Off
opcache.interned_strings_buffer => 8 => 8
opcache.lockfile_path => /data02/virt28010/tmp => /data02/virt28010/tmp
opcache.log_verbosity_level => 1 => 1
opcache.max_accelerated_files => 10000 => 10000
opcache.max_file_size => 0 => 0
opcache.max_wasted_percentage => 5 => 5
opcache.memory_consumption => 256MB => 256MB
opcache.opt_debug_level => 0 => 0
opcache.optimization_level => 0x7FFEBFFF => 0x7FFEBFFF
opcache.preferred_memory_model => posixp => posixp
opcache.preload => no value => no value
opcache.preload_user => no value => no value
opcache.protect_memory => Off => Off
opcache.restrict_api => no value => no value
opcache.revalidate_freq => 2 => 2
opcache.revalidate_path => Off => Off
opcache.save_comments => On => On
opcache.use_cwd => On => On
opcache.validate_permission => Off => Off
opcache.validate_root => Off => Off
opcache.validate_timestamps => On => On


After replacing the empty-string append with a string cast, the memory leak went away, both when running via the CLI and via Apache.
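A minimal sketch of that workaround, assuming the only purpose of the `$obj .= ''` line in doLeak() was to obtain the node's text. `noLeak()` is a hypothetical name, and the input is loaded with simplexml_load_string() instead of a temporary file to keep the example self-contained. It shows the code change only; it does not by itself demonstrate the leak, which nikic could not reproduce.

```php
<?php
// Hypothetical replacement for doLeak(): cast the SimpleXMLElement to a
// string instead of appending '' to the variable holding it.
function noLeak(SimpleXMLElement $node): string
{
    // (string) returns the node's text content without modifying $node.
    return (string)$node;
}

$xml = simplexml_load_string(
    '<items><item id="1"><id>1</id><name>item 1</name></item></items>'
);
foreach ($xml->item as $item) {
    echo noLeak($item->name), "\n"; // prints "item 1"
}
```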
 [2020-10-17 17:32 UTC] efbiaiinzinz at hotmail dot com
Memory usage numbers that I'm seeing are:
392168
20232576
40081200
60978376
 [2020-10-18 04:22 UTC] php-bugs at lists dot php dot net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Re-Opened". Thank you.
 [2020-10-21 09:48 UTC] efbiaiinzinz at hotmail dot com
This is still happening.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sun Dec 22 05:01:30 2024 UTC