Bug #47502 xml_get_current_byte_index inside character data handler returns wrong offset
Submitted: 2009-02-25 15:56 UTC Modified: 2015-04-12 04:22 UTC
Avg. Score:3.5 ± 1.5
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: grodny at oneclick dot sk Assigned: rrichards (profile)
Status: No Feedback Package: XML related
PHP Version: 5.3CVS-2009-02-25 (snap) OS: Windows
Private report: No CVE-ID: None

 [2009-02-25 15:56 UTC] grodny at oneclick dot sk
The byte index returned by xml_get_current_byte_index() inside a character data handler points to different locations in the XML source, depending on the character data being parsed.

If the string passed as the second argument to the handler starts with a non-whitespace ASCII character, the byte index points to the location before the parsed string.

If the string starts with whitespace or a multi-byte UTF-8 character, the byte index points to the location after the parsed string.

For consistency with the other handlers, it should return the offset of the location after the parsed string in all cases.

Reproduce code:
$xml = '<R><N>before</N><N>'
	.html_entity_decode('&sect;', ENT_COMPAT, 'UTF-8')
	.'after</N><N> after</N>before </R>';

function cdata ($p, $cdata) {
  global $xml;

  $off = xml_get_current_byte_index($p);

  echo 'CDATA: "',
    htmlentities($cdata, ENT_COMPAT, 'UTF-8'), '"', PHP_EOL,
    'AFTER-INDEX: "',
    htmlentities(substr($xml, $off), ENT_COMPAT, 'UTF-8'), '"', PHP_EOL;
}

$p = xml_parser_create('UTF-8');
xml_set_character_data_handler($p, 'cdata');
xml_parse($p, $xml, true);

Expected result:
CDATA: "before"
AFTER-INDEX: "</N><N>?after</N><N> after</N>before </R>"
CDATA: "?after"
AFTER-INDEX: "</N><N> after</N>before </R>"
CDATA: " after"
AFTER-INDEX: "</N>before </R>"
CDATA: "before "
AFTER-INDEX: "</R>"

Actual result:
CDATA: "before"
AFTER-INDEX: "before</N><N>?after</N><N> after</N>before </R>"
CDATA: "?after"
AFTER-INDEX: "</N><N> after</N>before </R>"
CDATA: " after"
AFTER-INDEX: "</N>before </R>"
CDATA: "before "
AFTER-INDEX: "before </R>"


 [2009-02-26 09:13 UTC] grodny at oneclick dot sk
A possible solution would be to introduce a second optional argument, extending the current functionality while keeping backward compatibility.

xml_get_current_byte_index (resource $parser [, int $position=0])

  If 0, keep current behaviour for backward compatibility.
  If -1, return index of first byte of node
    (for start element or PI it is '<', for text node it is first byte of $data string passed to handler, etc.)
  If 1, return index after last byte of node.
    (for start element or PI byte index after '>', for text node after last byte of $data string passed to handler)
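
Until something like the proposed $position argument exists, the same effect can be approximated in userland for text nodes. The following is only a minimal sketch under stated assumptions: text_chunk_bounds() and normalized_byte_index() are hypothetical helper names, not part of ext/xml; the whole document must be available as one string; and the $data passed to the handler must match the source bytes exactly (no expanded entities, no transcoding). It uses PHP 7.1+ type declarations.

```php
<?php
// Hypothetical helper, not part of ext/xml: given the document source, the
// offset reported inside the character data handler, and the $data string
// passed to that handler, work out where the chunk starts and ends.
// Returns [start, end] as byte offsets, or null when the chunk cannot be
// located unambiguously.
function text_chunk_bounds(string $xml, int $off, string $data): ?array
{
    $len = strlen($data);
    if ($len === 0 || $off < 0) {
        return null;
    }

    // Case 1: the reported offset points at the first byte of the chunk.
    $before = substr($xml, $off, $len) === $data;
    // Case 2: the reported offset points just past the last byte of the chunk.
    $after = $off >= $len && substr($xml, $off - $len, $len) === $data;

    if ($before === $after) {
        // Neither matched, or both did (repeated text): refuse to guess.
        return null;
    }

    $start = $before ? $off : $off - $len;
    return [$start, $start + $len];
}

// Mirrors the proposed $position argument: 0 keeps the reported offset,
// -1 gives the first byte of the chunk, 1 gives the byte after its last byte.
function normalized_byte_index(string $xml, int $off, string $data, int $position): ?int
{
    if ($position === 0) {
        return $off;
    }
    $bounds = text_chunk_bounds($xml, $off, $data);
    if ($bounds === null) {
        return null;
    }
    return $position < 0 ? $bounds[0] : $bounds[1];
}
```

Inside the handler from the reproduce code this would be called as normalized_byte_index($xml, xml_get_current_byte_index($p), $cdata, 1). A null result means the chunk could not be matched against the source at the reported offset, so the caller has to fall back to the raw index.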
 [2010-06-20 21:31 UTC]
-Status: Open
+Status: Assigned
-Assigned To:
+Assigned To: rrichards
 [2015-03-30 17:20 UTC]
-Status: Assigned
+Status: Feedback
 [2015-03-30 17:20 UTC]
Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves. 

A proper reproducing script starts with <?php and ends with ?>,
is max. 10-20 lines long and does not require any external 
resources such as databases, etc. If the script requires a 
database to demonstrate the issue, please make sure it creates 
all necessary tables, stored procedures etc.

Please avoid embedding huge scripts into the report.

The following script works as expected:

    function parseCData($parser, $data)
    {
        echo xml_get_current_byte_index($parser);
    }
    $xml = '<foo>bar</foo>';
    $parser = xml_parser_create();
    xml_set_character_data_handler($parser, 'parseCData');
    xml_parse($parser, $xml);
 [2015-04-12 04:22 UTC] php-bugs at lists dot php dot net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Re-Opened". Thank you.