Bug #81481 xml_get_current_byte_index limited to 32-bit numbers on 64-bit builds
Submitted: 2021-09-26 15:41 UTC
Modified: 2021-09-27 10:23 UTC
Votes: 1
Avg. Score: 3.0 ± 0.0
Reproduced: 0 of 0 (0.0%)
From: dev at b65sol dot com
Assigned:
Status: Verified
Package: XML Reader
PHP Version: 8.0.11
OS: 64-bit Linux
Private report: No
CVE-ID: None
 [2021-09-26 15:41 UTC] dev at b65sol dot com
Description:
------------
xml_get_current_byte_index() is limited to 32-bit integers even on 64-bit builds. Once 2 GiB or more of XML text has been parsed, it returns negative or incorrectly small values, even though such offsets are far below PHP_INT_MAX.

To get the "Expected" results listed below, I changed the return type of XML_GetCurrentByteIndex from int to long, but I'm unsure whether that is a truly portable fix.

Test script:
---------------
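<?php
$parser = xml_parser_create('UTF-8');
// Only start tags matter here; no end-element handler is registered.
xml_set_element_handler( $parser, 'startelement', null );

// 1 MiB of whitespace, fed repeatedly to push the byte offset past 2^31.
$emptylong = str_repeat(' ', 1024*1024);
xml_parse($parser, '<root><i></i><b/><ext>Hello</ext>', false);
// 2896 chunks of 1 MiB are roughly 2.83 GiB of input.
for($i = 0; $i < 2896; $i++) {
    xml_parse($parser, $emptylong, false);
}
xml_parse($parser, '<ext></ext><ext></ext></root>', false);

// Print the parser's byte index each time an <ext> start tag is seen.
// Element names arrive upper-cased because case folding is on by default.
function startelement($parser, $name, $attribute) {
    if ( $name == 'EXT' ) { echo "Byte Index:".xml_get_current_byte_index($parser)."\n"; }
}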

Expected result:
----------------
Byte Index:21
Byte Index:3036676133
Byte Index:3036676144

Actual result:
--------------
Byte Index:21
Byte Index:-1258291163
Byte Index:-1258291152
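
The negative values are what you would expect if a 64-bit byte offset were squeezed through a 32-bit signed int. A minimal C sketch of that arithmetic follows; truncate_to_int32 is a hypothetical helper written for illustration, not code from php-src or expat.

```c
#include <stdint.h>

/* Hypothetical helper (not from php-src): model what happens when a
 * 64-bit byte offset is returned through a 32-bit signed int. */
int32_t truncate_to_int32(uint64_t consumed)
{
    /* Keep only the low 32 bits of the offset. */
    uint32_t low = (uint32_t)(consumed & 0xFFFFFFFFu);

    /* Reinterpret those bits as two's complement without relying on
     * implementation-defined narrowing conversions. */
    if (low <= INT32_MAX) {
        return (int32_t)low;
    }
    return (int32_t)((int64_t)low - 4294967296LL);
}
```

Feeding it the expected offsets 3036676133 and 3036676144 yields -1258291163 and -1258291152, matching the actual output above, while 21 passes through unchanged.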

History

 [2021-09-27 10:23 UTC] cmb@php.net
-Status: Open
+Status: Verified
 [2021-09-27 10:23 UTC] cmb@php.net
For compatibility with expat 1, XML_GetCurrentByteIndex[1] should
indeed return long[2].  Care should be taken that no wrap around
occurs, since parser->parser->input->consumed is an unsigned long.
Anyhow, returning long wouldn't make a difference on Windows
(LLP64).  The situation is different for expat 2, which returns
XML_Index[3] which depends on a configuration value[4].  I'm not
sure, however, whether expat 2 is supposed to be supported at all.
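
One possible way to avoid the wrap around when converting an unsigned long counter to long is to saturate at LONG_MAX. The sketch below is only an illustration under that assumption; consumed_to_long is a hypothetical name, and saturating (rather than, say, reporting an error) is a design choice not taken from php-src.

```c
#include <limits.h>

/* Hypothetical helper (not from php-src): convert an unsigned long
 * counter to long, saturating at LONG_MAX instead of wrapping to a
 * negative value. */
long consumed_to_long(unsigned long consumed)
{
    if (consumed > (unsigned long)LONG_MAX) {
        return LONG_MAX;
    }
    return (long)consumed;
}
```

On LP64 platforms such as 64-bit Linux, long is 64 bits wide, so offsets like 3036676133 would pass through unchanged. On LLP64 (Windows), long is still 32 bits, so this helper would cap at 2147483647, which is why returning long alone makes no difference there.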

We should also have a look at XML_GetCurrentByteCount[5], which is
currently the same as XML_GetCurrentByteIndex, but is supposed to
return the number of bytes in the current event[6].

[1] <https://github.com/php/php-src/blob/php-7.4.24/ext/xml/compat.c#L705-L710>
[2] <https://github.com/libexpat/libexpat/blob/V1_0/expat/xmlparse/xmlparse.h#L364>
[3] <https://github.com/libexpat/libexpat/blob/R_2_4_1/expat/lib/expat.h#L932>
[4] <https://github.com/libexpat/libexpat/blob/R_2_4_1/expat/lib/expat_external.h#L153-L159>
[5] <https://github.com/php/php-src/blob/php-7.4.24/ext/xml/compat.c#L712-L719>
[6] <https://github.com/libexpat/libexpat/blob/R_2_4_1/expat/doc/reference.html#L2043-L2054>
 