Bug #81481 xml_get_current_byte_index limited to 32-bit numbers on 64-bit builds
Submitted: 2021-09-26 15:41 UTC
Modified: 2021-09-27 10:23 UTC
Votes: 2
Avg. Score: 4.0 ± 1.0
Reproduced: 0 of 1 (0.0%)
From: dev at b65sol dot com
Assigned:
Status: Closed
Package: XML Reader
PHP Version: 8.0.11
OS: 64-bit Linux
Private report: No
CVE-ID: None

 

 [2021-09-26 15:41 UTC] dev at b65sol dot com
Description:
------------
xml_get_current_byte_index() is limited to 32-bit integers even on 64-bit builds. After parsing 2 GiB or more of XML text, which is well below PHP_INT_MAX, it returns negative or incorrectly small values.

To get the "Expected" results listed below, I changed the return type of XML_GetCurrentByteIndex from int to long, but I'm not sure that is a truly portable fix.

Test script:
---------------
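<?php
$parser = xml_parser_create('UTF-8');
// Only a start-element handler is needed; it reports the byte index.
xml_set_element_handler($parser, 'startelement', null);

// 1 MiB of whitespace, fed repeatedly to push the offset past 2 GiB.
$emptylong = str_repeat(' ', 1024 * 1024);
xml_parse($parser, '<root><i></i><b/><ext>Hello</ext>', false);
// 2896 MiB of padding: more than 2^31 bytes, but still less than 2^32.
for ($i = 0; $i < 2896; $i++) {
    xml_parse($parser, $emptylong, false);
}
xml_parse($parser, '<ext></ext><ext></ext></root>', false);

// Element names arrive upper-cased because case folding is on by default.
function startelement($parser, $name, $attribute) {
    if ($name === 'EXT') {
        echo "Byte Index:" . xml_get_current_byte_index($parser) . "\n";
    }
}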

Expected result:
----------------
Byte Index:21
Byte Index:3036676133
Byte Index:3036676144

Actual result:
--------------
Byte Index:21
Byte Index:-1258291163
Byte Index:-1258291152
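
The negative values are what one would expect if the offset were squeezed through a 32-bit signed int on a two's-complement machine. The following C sketch reproduces that arithmetic. The helper name wrap_to_int32 is hypothetical and is not taken from php-src; it only illustrates the wrap-around, not the actual code path in ext/xml.

```c
#include <stdint.h>

/* Hypothetical helper (not from php-src): model what a 64-bit byte offset
 * looks like after being stored in a 32-bit signed integer. */
int64_t wrap_to_int32(int64_t offset)
{
    /* Keep only the low 32 bits; conversion to unsigned is well-defined. */
    uint32_t low = (uint32_t)offset;

    /* Values above INT32_MAX map to the negative half of the int32 range. */
    if (low > (uint32_t)INT32_MAX) {
        return (int64_t)low - 4294967296LL; /* subtract 2^32 */
    }
    return (int64_t)low;
}
```

Applied to the expected values above, wrap_to_int32(3036676133) gives -1258291163 and wrap_to_int32(3036676144) gives -1258291152, matching the actual results, while 21 is left unchanged.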

History
 [2021-09-27 10:23 UTC] cmb@php.net
-Status: Open +Status: Verified
 [2021-09-27 10:23 UTC] cmb@php.net
For compatibility with expat 1, XML_GetCurrentByteIndex[1] should
indeed return long[2].  Care should be taken that no wrap around
occurs, since parser->parser->input->consumed is an unsigned long.
Anyhow, returning long wouldn't make a difference on Windows
(LLP64).  The situation is different for expat 2, which returns
XML_Index[3] which depends on a configuration value[4].  I'm not
sure, however, whether expat 2 is supposed to be supported at all.

We should also have a look at XML_GetCurrentByteCount[5], which is
currently the same as XML_GetCurrentByteIndex, but is supposed to
return the number of bytes in the current event[6].

[1] <https://github.com/php/php-src/blob/php-7.4.24/ext/xml/compat.c#L705-L710>
[2] <https://github.com/libexpat/libexpat/blob/V1_0/expat/xmlparse/xmlparse.h#L364>
[3] <https://github.com/libexpat/libexpat/blob/R_2_4_1/expat/lib/expat.h#L932>
[4] <https://github.com/libexpat/libexpat/blob/R_2_4_1/expat/lib/expat_external.h#L153-L159>
[5] <https://github.com/php/php-src/blob/php-7.4.24/ext/xml/compat.c#L712-L719>
[6] <https://github.com/libexpat/libexpat/blob/R_2_4_1/expat/doc/reference.html#L2043-L2054>
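
The wrap-around concern raised in the comment above (an unsigned long counter returned as a long) can be illustrated with a small C sketch. This is only one possible policy, assumed here for illustration: it saturates at LONG_MAX rather than letting the value go negative. The helper name consumed_to_long is hypothetical, and this is not claimed to be what the eventual fix does.

```c
#include <limits.h>

/* Hypothetical helper (not from php-src): convert an unsigned byte counter
 * to long without wrapping into negative values. Saturation at LONG_MAX is
 * an assumed policy chosen for this sketch. */
long consumed_to_long(unsigned long consumed)
{
    if (consumed > (unsigned long)LONG_MAX) {
        return LONG_MAX; /* clamp instead of wrapping */
    }
    return (long)consumed;
}
```

On LP64 platforms such as 64-bit Linux, long is 64 bits wide, so the clamp only triggers for counters above 2^63 - 1. On LLP64 platforms such as 64-bit Windows, long is 32 bits wide, so this helper would still cap at 2^31 - 1 there, which is why returning long alone does not help on Windows.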
 [2024-07-06 16:34 UTC] git@php.net
Automatic comment on behalf of nielsdos (author) and web-flow (committer)
Revision: https://github.com/php/php-src/commit/b41e90c6f9ea0a6ee225f9d6f111ddf62fd782e2
Log: Fix bug #81481 (xml_get_current_byte_index limited to 32-bit numbers on 64-bit builds) (#14845)
 [2024-07-06 16:34 UTC] git@php.net
-Status: Verified +Status: Closed
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Mon Sep 09 05:01:27 2024 UTC