Doc Bug #36596 xml_get_current_byte_index reports in relation to utf8
Submitted: 2006-03-03 00:12 UTC
Modified: 2006-03-07 02:17 UTC
From: preben at infofab dot no
Assigned:
Status: Closed
Package: Documentation problem
PHP Version: Irrelevant
OS: Linux FC4
Private report: No
CVE-ID: None
 [2006-03-03 00:12 UTC] preben at infofab dot no
Description:
------------
The function xml_get_current_byte_index() returns the byte index as counted in UTF-8 encoded text, even when the input is in ISO-8859-1. If you want to extract a chunk of data from an XML file using that index, you need to convert the input to UTF-8 first to get correct offsets.

Reproduce code:
---------------
$data = "<?xml version='1.0' encoding='iso8859-1'?>\n<data>\n<dummy>????</dummy>\n<row>???data</row>\n</data>\n";
function startElement($parser, $name, $attrs)
{
   global $data;
   if($name != "ROW") return;
   $byte = xml_get_current_byte_index($parser) + 41;
   $ch = substr($data, $byte + 41, 7);
   # The following calculates correct
   # $ch = utf8_decode(substr(utf8_encode($data), $byte, 7));
   echo "$name ($byte): $ch\n";
   exit();
}
function endElement($parser, $name)
{
}
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_parse($xml_parser, $data, 1);

Expected result:
----------------
ROW (79): ???d

Actual result:
--------------
ROW (79):

History

 [2006-03-03 12:37 UTC] preben at infofab dot no
Sorry, there was an error in the substr() statement. The corrected version:

$ch = substr($data, $byte, 7);

The "bug" still stands though...
 [2006-03-07 02:17 UTC] vrana@php.net
This bug has been fixed in the documentation's XML sources. Since the
online and downloadable versions of the documentation need some time
to get updated, we would like to ask you to be a bit patient.

Thank you for the report, and for helping us make our documentation better.

"This function returns byte index according to UTF-8 encoded text disregarding if input is in another encoding."
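
As an alternative to re-encoding the whole document, a UTF-8 byte offset can be mapped back to an offset in the ISO-8859-1 original. The following is only a minimal sketch, not part of the XML extension: the helper name utf8_offset_to_latin1() is hypothetical, and it assumes the input really is ISO-8859-1 and that the offset falls on a character boundary. It relies on the fact that every ISO-8859-1 byte in the range 0x80-0xFF becomes two bytes in UTF-8, while bytes below 0x80 stay one byte.

```php
<?php
// Hypothetical helper (not provided by PHP): translate a byte offset counted
// in the UTF-8 form of a string into a byte offset in the ISO-8859-1 original.
function utf8_offset_to_latin1(string $latin1, int $utf8Offset): int
{
    $seen = 0;                 // UTF-8 bytes accounted for so far
    $len  = strlen($latin1);
    for ($i = 0; $i < $len; $i++) {
        if ($seen >= $utf8Offset) {
            return $i;         // reached the requested UTF-8 position
        }
        // ISO-8859-1 bytes 0x80-0xFF occupy two bytes in UTF-8.
        $seen += (ord($latin1[$i]) < 0x80) ? 1 : 2;
    }
    return $len;               // offset lies at or past the end of the string
}

// Sample ISO-8859-1 data: three non-ASCII bytes (0xE6, 0xF8, 0xE5).
$latin1 = "<dummy>\xE6\xF8\xE5</dummy><row>";

// In UTF-8 the "<row>" tag starts at byte 7 + 3*2 + 8 = 21,
// but in the ISO-8859-1 original it starts at byte 7 + 3 + 8 = 18.
$offset = utf8_offset_to_latin1($latin1, 21);
echo $offset, ": ", substr($latin1, $offset, 5), "\n"; // prints "18: <row>"
```

For pure ASCII input the two offsets are identical, so the helper returns the offset unchanged.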
 
PHP Copyright © 2001-2025 The PHP Group
All rights reserved.
Last updated: Sat Jul 12 08:01:30 2025 UTC