Bug #40762 xml_parser failed to parse mixed coding file
Submitted: 2007-03-08 21:56 UTC
Modified: 2007-03-09 08:05 UTC
Votes: 1
Avg. Score: 5.0 ± 0.0
Reproduced: 0 of 0 (0.0%)
From: forward at hongyu dot org
Assigned:
Status: Not a bug
Package: *XML functions
PHP Version: 5.2.1
OS: Linux and Windows
Private report: No
CVE-ID: None

 [2007-03-08 21:56 UTC] forward at hongyu dot org
Description:
------------
My RSS parser started failing after I upgraded PHP on my server from 4.x to 5.2. When I debugged the code, I found that xml_parse() fails to parse the UTF-8 encoded RSS feed, which was converted from a GB18030 string.

The error message looks like: 
"Warning: xml_parse() [function.xml-parse]: input conversion failed due to input error, bytes 0x9B 0xE6 ..."

The original GB encoded string consists of Chinese characters, and I converted it to UTF-8 with iconv(). The converted string displays correctly in web browsers, so the conversion itself appears to be correct. I therefore believe the failure comes from xml_parse().

For testing, an example of the original GB18030 string can be downloaded from http://www.la-chinese.com/forum/rss.php?f=19

Thanks!



Reproduce code:
---------------
// variable $gb contains the GB encoded string, e.g., from
// web address http://www.la-chinese.com/forum/rss.php?f=19

// variable $utf contains the UTF-8 string converted from 
// the original GB encoded string

        $utf = iconv('GB18030','UTF-8', $gb);

// functions feed_start_element, feed_end_element etc. are from
// the package Magpierss http://magpierss.sourceforge.net/

        xml_set_object( $parser, $this );
        xml_set_element_handler($parser, 
                'feed_start_element', 'feed_end_element' );
                        
        xml_set_character_data_handler( $parser, 'feed_cdata' ); 
    
        $status = xml_parse( $parser, $utf );


Expected result:
----------------
No error message.

Actual result:
--------------
Warning: xml_parse() [function.xml-parse]: input conversion failed due to input error, bytes 0x9B 0xE6 ...


 [2007-03-09 06:11 UTC] chregu@php.net
You also have to change the encoding declared in this line of the XML, which still names the original encoding after the conversion:

<?xml version="1.0" encoding="gb2312" ?>
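
A minimal sketch of that fix, assuming the feed begins with an XML declaration like the one above. The helper name gb_feed_to_utf8 and the inline sample feed are hypothetical and only for illustration; they are not part of the report or of Magpierss. After iconv() converts the bytes, the declaration is rewritten so that the encoding it names matches the actual encoding of the data passed to xml_parse():

```php
<?php
// Hypothetical helper (not from the report): convert a GB-encoded feed to
// UTF-8 and make the XML declaration name the new encoding.
function gb_feed_to_utf8($gb)
{
    $utf = iconv('GB18030', 'UTF-8', $gb);
    if ($utf === false) {
        return false; // iconv() could not convert the input
    }
    // Rewrite only the encoding="..." value inside the leading <?xml ... ?>.
    return preg_replace(
        '/^(\s*<\?xml\b[^>]*?\bencoding\s*=\s*)(["\'])[^"\']*\2/i',
        '$1$2UTF-8$2',
        $utf,
        1
    );
}

// Stand-in for the real feed; ASCII only, so the bytes are valid GB18030.
$gb = '<?xml version="1.0" encoding="gb2312" ?><rss><title>test</title></rss>';
$utf = gb_feed_to_utf8($gb);

$parser = xml_parser_create('UTF-8');
$status = xml_parse($parser, $utf, true); // third argument marks the final chunk
```

If the feed has no XML declaration at all, the pattern does not match and the converted string is returned unchanged.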


 [2007-03-09 08:05 UTC] forward at hongyu dot org
Exactly what you said. Thanks a lot!
 