Bug #47502 xml_get_current_byte_index inside character data handler returns wrong offset
Submitted: 2009-02-25 15:56 UTC Modified: 2015-04-12 04:22 UTC
Votes:2
Avg. Score:3.5 ± 1.5
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: grodny at oneclick dot sk Assigned: rrichards (profile)
Status: No Feedback Package: XML related
PHP Version: 5.3CVS-2009-02-25 (snap) OS: Windows
Private report: No CVE-ID: None
 [2009-02-25 15:56 UTC] grodny at oneclick dot sk
Description:
------------
The byte index returned by xml_get_current_byte_index() inside a character data handler points to different locations in the XML source, depending on the character data being parsed.

If the string passed as the second argument to the handler starts with a non-whitespace ASCII character, the byte index points to the location before the parsed string.

If the string starts with whitespace or a non-ASCII UTF-8 character, the byte index points to the location after the parsed string.

For consistency with the other handlers, it should return the offset of the location after the parsed string in all cases.


Reproduce code:
---------------
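<?php
// Four text nodes: plain ASCII, a leading non-ASCII UTF-8 character
// (section sign), leading white space, and trailing white space.
$xml = '<R><N>before</N><N>'
	.html_entity_decode('&sect;', ENT_COMPAT, 'UTF-8')
	.'after</N><N> after</N>before </R>';

// Character data handler: print the text passed to the handler, then the
// part of the source that starts at the reported byte index.
function cdata ($p, $cdata) {
  global $xml;

  $off = xml_get_current_byte_index($p);

  echo 'CDATA: "',
    htmlentities($cdata, ENT_COMPAT, 'UTF-8'), '"', PHP_EOL,
    'AFTER-INDEX: "',
    htmlentities(substr($xml, $off), ENT_COMPAT, 'UTF-8'), '"',
    PHP_EOL;
}

$p = xml_parser_create('UTF-8');
xml_set_character_data_handler($p, 'cdata');
xml_parse($p, $xml, true); // true marks this as the final chunk of input
xml_parser_free($p);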


Expected result:
----------------
CDATA: "before"
AFTER-INDEX: "</N><N>?after</N><N> after</N>before </R>"
CDATA: "?after"
AFTER-INDEX: "</N><N> after</N>before </R>"
CDATA: " after"
AFTER-INDEX: "</N>before </R>"
CDATA: "before "
AFTER-INDEX: "</R>"


Actual result:
--------------
CDATA: "before"
AFTER-INDEX: "before</N><N>?after</N><N> after</N>before </R>"
CDATA: "?after"
AFTER-INDEX: "</N><N> after</N>before </R>"
CDATA: " after"
AFTER-INDEX: "</N>before </R>"
CDATA: "before "
AFTER-INDEX: "before </R>"

 [2009-02-26 09:13 UTC] grodny at oneclick dot sk
A possible solution would be to introduce a second optional argument, enhancing the current functionality while keeping backward compatibility.

xml_get_current_byte_index (resource $parser [, int $position=0])

position:
  If 0, keep the current behaviour for backward compatibility.
  If -1, return the index of the first byte of the node
    (for a start element or PI that is the '<'; for a text node it is the first byte of the $data string passed to the handler, etc.)
  If 1, return the index after the last byte of the node
    (for a start element or PI the byte index after the '>'; for a text node the index after the last byte of the $data string passed to the handler).
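
Until something like this exists, the start and end offsets of a text node can be approximated in userland. The following is only a minimal sketch: text_node_span() is a hypothetical helper, not part of ext/xml, and it assumes the text appears byte-for-byte in the source (no entity references, no CDATA sections, no encoding conversion). It checks whether the reported index sits at the first byte of the handler's $data or just past its last byte, and returns both offsets either way.

```php
<?php
// Hypothetical helper (not part of ext/xml): normalise the index reported
// inside a character data handler to a [start, end) byte span.
// Assumption: $data occurs verbatim in $source, i.e. the text contains no
// entity references and the source encoding matches the target encoding.
function text_node_span(string $source, int $index, string $data): ?array
{
    $len = strlen($data);

    // Case 1: the reported index points at the first byte of the text.
    if (substr($source, $index, $len) === $data) {
        return [$index, $index + $len];
    }

    // Case 2: the reported index points just past the last byte of the text.
    if ($index >= $len && substr($source, $index - $len, $len) === $data) {
        return [$index - $len, $index];
    }

    // Neither position matches, so the span cannot be determined this way.
    return null;
}
```

Inside the handler this could be called as text_node_span($xml, xml_get_current_byte_index($p), $cdata). The check is a heuristic: for repetitive content both cases can match, in which case the first one wins.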
 [2010-06-20 21:31 UTC] felipe@php.net
-Status: Open
+Status: Assigned
-Assigned To:
+Assigned To: rrichards
 [2015-03-30 17:20 UTC] cmb@php.net
-Status: Assigned
+Status: Feedback
 [2015-03-30 17:20 UTC] cmb@php.net
Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves. 

A proper reproducing script starts with <?php and ends with ?>,
is max. 10-20 lines long and does not require any external 
resources such as databases, etc. If the script requires a 
database to demonstrate the issue, please make sure it creates 
all necessary tables, stored procedures etc.

Please avoid embedding huge scripts into the report.

The following script works as expected:

    <?php
    
    function parseCData($parser, $data)
    {
        echo xml_get_current_byte_index($parser);
    }
    
    $xml = '<foo>bar</foo>';
    $parser = xml_parser_create();
    xml_set_character_data_handler($parser, 'parseCData');
    xml_parse($parser, $xml);
 [2015-04-12 04:22 UTC] php-bugs at lists dot php dot net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Re-Opened". Thank you.
Last updated: Thu Dec 26 19:01:30 2024 UTC