Bug #79642 xml_parse() fails with XML_ERROR_NO_MEMORY with a mere 17M file
Submitted: 2020-05-27 13:05 UTC
Modified: 2020-05-27 16:51 UTC
From: php4fan at gmail dot com
Assigned:
Status: Open
Package: *XML functions
PHP Version: 7.2.31
OS: debian
Private report: No
CVE-ID: None

 [2020-05-27 13:05 UTC] php4fan at gmail dot com
Description:
------------
I'm trying to parse an XML file of about 17 megabytes with xml_parse(), and I get the XML_ERROR_NO_MEMORY error.

The error persists even after increasing the memory limit to 8 GB.

I can understand a little overhead, but there's no way to justify using several GB of memory to decode an XML document that is just 17 MB in its string form.


 [2020-05-27 14:07 UTC] sjon@php.net
Can you share the XML file and the exact code you are using? I've tested the (chunked) example code on https://www.php.net/xml_parse using a 112M XML file [1] and had no problems running it with a 128M memory_limit.

[1] http://www.ins.cwi.nl/projects/xmark/Assets/standard.gz
 [2020-05-27 14:10 UTC] sjon@php.net
-Status: Open +Status: Feedback
 [2020-05-27 14:50 UTC] cmb@php.net
The problem might be that XML_PARSE_HUGE is not set;
unfortunately, it is currently not possible to pass that option
via xml_parser_set_option() or otherwise.

Anyway, it may be better to parse in chunks.
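
A minimal sketch of what parsing in chunks could look like when the XML is already held in a string. The helper name parse_in_chunks() and the 8192-byte default chunk size are illustrative assumptions, not part of the XML extension; only xml_parser_create(), xml_set_element_handler() and xml_parse() are real functions here.

```php
<?php
// Hypothetical helper: feed an in-memory XML string to the parser piece by piece.
// $parser is left untyped because it is a resource in PHP 7 and an object in PHP 8.
function parse_in_chunks($parser, string $xml, int $chunkSize = 8192): bool
{
    $length = strlen($xml);
    $offset = 0;
    do {
        // Cast guards against substr() returning false on PHP 7 for empty input.
        $chunk = (string) substr($xml, $offset, $chunkSize);
        $offset += $chunkSize;
        // The third argument tells the parser whether this is the last piece.
        $isFinal = $offset >= $length;
        if (xml_parse($parser, $chunk, $isFinal) !== 1) {
            return false;
        }
    } while (!$isFinal);

    return true;
}

// Usage: count start tags in a generated document, fed 64 bytes at a time.
$count = 0;
$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$count) { $count++; },
    function ($p, $name) {}
);
$xml = '<root>' . str_repeat('<item>x</item>', 1000) . '</root>';
$ok = parse_in_chunks($parser, $xml, 64);
```

On failure, xml_get_error_code() and xml_error_string() can be used to find out what went wrong.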
 [2020-05-27 15:08 UTC] php4fan at gmail dot com
-Status: Feedback +Status: Open
 [2020-05-27 15:08 UTC] php4fan at gmail dot com
> The problem might be that XML_PARSE_HUGE is not set;

Yep, found out that's the problem.

Been known for like three years at least, right?

> unfortunately, it is currently not possible to pass that option
> via xml_parser_set_option() or otherwise.

Quite a disgrace.


Yep, I worked around it by parsing in chunks. (Why can't xml_parse() do it internally, by the way?)
 [2020-05-27 16:51 UTC] cmb@php.net
> Why can't xml_parse() do it internally, by the way?

Well, it could, but you'd still have to provide the full XML as a
string, which might be wasteful.  In my opinion, xml_parse_stream()
would be nice to have, but on the other hand, that can easily be
written in userland as well.
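
A rough sketch of how such a userland function might look, under stated assumptions. xml_parse_stream() does not exist in PHP, so the helper is named userland_xml_parse_stream() here; the name, the signature and the 8192-byte default are all hypothetical.

```php
<?php
// Hypothetical userland stand-in for the xml_parse_stream() idea above.
// Reads from an already opened stream so the full document never sits in one string.
function userland_xml_parse_stream($parser, $stream, int $chunkSize = 8192): bool
{
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false) {
            return false; // read error
        }
        // Skip empty reads; pass is_final = false because more data may follow.
        if ($chunk !== '' && xml_parse($parser, $chunk, false) !== 1) {
            return false;
        }
    }

    // Signal end of input with an empty final chunk.
    return xml_parse($parser, '', true) === 1;
}

// Usage with an in-memory stream; a file handle from fopen() works the same way.
$stream = fopen('php://memory', 'r+');
fwrite($stream, '<a><b>one</b><b>two</b></a>');
rewind($stream);

$text = '';
$parser = xml_parser_create();
xml_set_character_data_handler(
    $parser,
    function ($p, $data) use (&$text) { $text .= $data; }
);
$ok = userland_xml_parse_stream($parser, $stream, 4);
fclose($stream);
```

Taking a stream rather than a file name keeps the helper usable with files, sockets and php:// wrappers alike; the caller remains responsible for opening and closing it.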
 
Last updated: Wed Oct 21 08:01:23 2020 UTC