[2009-02-25 15:56 UTC] grodny at oneclick dot sk
Description:
------------
The byte index returned by xml_get_current_byte_index() inside a character data handler points to different locations in the XML source, depending on the character data being parsed.
If the string passed as the second argument to the handler starts with a non-whitespace ASCII character, the byte index points to the location before that string.
If the string starts with whitespace or a non-ASCII UTF-8 character, the byte index points to the location after that string.
For consistency with the other handlers, it should return the offset of the location after the parsed string in all cases.
Reproduce code:
---------------
<?php
$xml = '<R><N>before</N><N>'
    . html_entity_decode('&sect;', ENT_COMPAT, 'UTF-8')
    . 'after</N><N> after</N>before </R>';

// Character data handler: print the text, then the source from the reported byte index onward.
function cdata($p, $cdata) {
    global $xml;
    $off = xml_get_current_byte_index($p);
    echo 'CDATA: "',
        htmlentities($cdata, ENT_COMPAT, 'UTF-8'), '"', PHP_EOL,
        'AFTER-INDEX: "',
        htmlentities(substr($xml, $off), ENT_COMPAT, 'UTF-8'), '"',
        PHP_EOL;
}

$p = xml_parser_create('UTF-8');
xml_set_character_data_handler($p, 'cdata');
xml_parse($p, $xml, true);
xml_parser_free($p);
Expected result:
----------------
CDATA: "before"
AFTER-INDEX: "</N><N>§after</N><N> after</N>before </R>"
CDATA: "§after"
AFTER-INDEX: "</N><N> after</N>before </R>"
CDATA: " after"
AFTER-INDEX: "</N>before </R>"
CDATA: "before "
AFTER-INDEX: "</R>"
Actual result:
--------------
CDATA: "before"
AFTER-INDEX: "before</N><N>§after</N><N> after</N>before </R>"
CDATA: "§after"
AFTER-INDEX: "</N><N> after</N>before </R>"
CDATA: " after"
AFTER-INDEX: "</N>before </R>"
CDATA: "before "
AFTER-INDEX: "before </R>"
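
Until the index is consistent, a userland workaround along the following lines might normalise it to "after the text" in both cases. This is only a sketch: text_end_offset() is a hypothetical helper, not part of ext/xml. It assumes the handler still has the unmodified source string, that the text contains no entity references (so $data matches the source bytes exactly), and that the same text is not repeated immediately after itself.

```php
<?php
// Hypothetical helper (not part of ext/xml). Given the XML source, the index
// reported by xml_get_current_byte_index() inside a character data handler,
// and the $data string passed to that handler, return the offset just past
// the text.
function text_end_offset(string $xml, int $index, string $data): int
{
    $len = strlen($data);
    // If the bytes at $index are exactly $data, the parser reported the
    // position before the text, so skip over it.
    if ($len > 0 && substr($xml, $index, $len) === $data) {
        return $index + $len;
    }
    // Otherwise assume the index already points past the text.
    return $index;
}
```

In the reproduce script, the handler would call text_end_offset($xml, $off, $cdata) and pass the result to substr() instead of $off.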
A possible solution would be to introduce a second, optional argument, enhancing the current functionality while keeping backward compatibility:

xml_get_current_byte_index (resource $parser [, int $position=0])

position:
If 0, keep the current behaviour for backward compatibility.
If -1, return the index of the first byte of the node (for a start element or PI this is '<'; for a text node it is the first byte of the $data string passed to the handler, etc.).
If 1, return the index after the last byte of the node (for a start element or PI, the byte index after '>'; for a text node, after the last byte of the $data string passed to the handler).

Thank you for this bug report. To properly diagnose the problem, we need a short but complete example script to be able to reproduce this bug ourselves.

A proper reproducing script starts with <?php and ends with ?>, is max. 10-20 lines long and does not require any external resources such as databases, etc. If the script requires a database to demonstrate the issue, please make sure it creates all necessary tables, stored procedures etc.

Please avoid embedding huge scripts into the report.

The following script works as expected:

<?php
function parseCData($parser, $data) {
    echo xml_get_current_byte_index($parser);
}
$xml = '<foo>bar</foo>';
$parser = xml_parser_create();
xml_set_character_data_handler($parser, 'parseCData');
xml_parse($parser, $xml);