Bug #81481 xml_get_current_byte_index limited to 32-bit numbers on 64-bit builds
Submitted: 2021-09-26 15:41 UTC Modified: 2021-09-27 10:23 UTC
From: dev at b65sol dot com Assigned:
Status: Verified Package: XML Reader
PHP Version: 8.0.11 OS: 64-bit Linux
Private report: No CVE-ID: None

 [2021-09-26 15:41 UTC] dev at b65sol dot com
xml_get_current_byte_index() is limited to 32-bit integers even on 64-bit builds. Once 2 GiB or more of XML text has been parsed, it returns negative or incorrectly small values, even though such offsets are well below PHP_INT_MAX.

To get the "Expected" results listed below, I changed the definition of XML_GetCurrentByteIndex to use long instead of int, but I'm unsure whether that is a truly portable fix.

Test script:
<?php
$parser = xml_parser_create('UTF-8');
xml_set_element_handler($parser, 'startelement', null);

// 1 MiB of whitespace, fed repeatedly to push the byte offset past 2 GiB.
$emptylong = str_repeat(' ', 1024*1024);
xml_parse($parser, '<root><i></i><b/><ext>Hello</ext>', false);
for ($i = 0; $i < 2896; $i++) {
    xml_parse($parser, $emptylong, false);
}
xml_parse($parser, '<ext></ext><ext></ext></root>', false);

// Print the parser's byte index each time an <ext> element starts.
function startelement($parser, $name, $attribute) {
    if ( $name == 'EXT' ) { echo "Byte Index:".xml_get_current_byte_index($parser)."\n"; }
}

Expected result:
Byte Index:21
Byte Index:3036676133
Byte Index:3036676144

Actual result:
Byte Index:21
Byte Index:-1258291163
Byte Index:-1258291152
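
The actual values look like the expected ones truncated to a signed 32-bit integer (for example, 3036676133 - 2^32 = -1258291163). A minimal sketch that reproduces this arithmetic in userland, assuming a 64-bit PHP build; wrap_to_int32 is a hypothetical helper written for illustration and is not part of ext/xml:

```php
<?php
// Hypothetical helper: reinterpret the low 32 bits of $value as a signed
// 32-bit integer, mimicking what happens when a 64-bit offset is stored
// in a C int. Assumes a 64-bit PHP build so the constants stay integers.
function wrap_to_int32(int $value): int
{
    $low = $value & 0xFFFFFFFF;
    return $low >= 0x80000000 ? $low - 0x100000000 : $low;
}

// The three "Expected" byte indexes from the report.
foreach ([21, 3036676133, 3036676144] as $expected) {
    echo "Byte Index:" . wrap_to_int32($expected) . "\n";
}
```

On a 64-bit build this should print the three lines from the "Actual result" section, which is consistent with the offset being narrowed to int somewhere before it reaches PHP.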


 [2021-09-27 10:23 UTC]
-Status: Open +Status: Verified
 [2021-09-27 10:23 UTC]
For compatibility with expat 1, XML_GetCurrentByteIndex[1] should
indeed return long[2].  Care should be taken that no wrap-around
occurs, since parser->parser->input->consumed is an unsigned long.
Anyhow, returning long wouldn't make a difference on Windows
(LLP64, where long is 32 bits wide).  The situation is different
for expat 2, which returns
XML_Index[3] which depends on a configuration value[4].  I'm not
sure, however, whether expat 2 is supposed to be supported at all.

We should also have a look at XML_GetCurrentByteCount[5], which is
currently the same as XML_GetCurrentByteIndex, but is supposed to
return the number of bytes in the current event[6].

[1] <>
[2] <>
[3] <>
[4] <>
[5] <>
[6] <>