Bug #74909 XMLReader reports "Input is not proper UTF-8" on long Chinese strings
Submitted: 2017-07-12 09:40 UTC
Modified: 2018-04-03 06:36 UTC
From: lilydjwg at gmail dot com
Assigned: cmb
Status: Closed
Package: XML Reader
PHP Version: 7.1.7
OS: Arch Linux x86_64
Private report: No
CVE-ID: None
 [2017-07-12 09:40 UTC] lilydjwg at gmail dot com
Description:
------------
The parser reports the error at the last character of the text, the full-width comma (\xef\xbc\x8c in UTF-8). If I make the string longer, the reported position doesn't change. If I make it one character shorter than the example, the error disappears.

I've read bug #72181 but can't reproduce that one.

My libxml2 version is 2.9.4+96+gfb56f80e-1

The XML doc for testing:

<?xml version="1.0" encoding="UTF-8"?>
<properties>
制馬吃富不國助病以孩上容留消生強德,準大指總產如戰上是個!沒目人起如物對了話外多年至性己外頭出假。在館演量難調代什如常千苦友議,學字而的響過紙看正家!玩車一,是華是體過記用說感公媽間城石也都於教行,獲驚稱重眼發工,層上由的間近要過將去資球我老來許推、種得海在中個產程的治教他院一;持一建情。你室隊花……放成動,
</properties>

Test script:
---------------
<?php
$reader = new XMLReader();
$reader->open("a.xml");
while($reader->read()) {
}
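
A variant of the test script that collects the parser errors as structured data instead of PHP warnings can make the failing position easier to compare across versions. This is only a sketch: `collectReaderErrors()` is a hypothetical helper name, not something from the report, and it assumes the libxml extension's `libxml_use_internal_errors()` and `libxml_get_errors()` are available alongside XMLReader.

```php
<?php
// Hypothetical helper (not part of the original report): read a file with
// XMLReader and return libxml's error messages as an array of strings.
function collectReaderErrors(string $path): array
{
    // Buffer libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $messages = [];
    $reader = new XMLReader();
    if ($reader->open($path)) {
        while ($reader->read()) {
            // Walk every node; parse errors accumulate in libxml's buffer.
        }
        $reader->close();
    } else {
        $messages[] = "could not open $path";
    }

    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf(
            'line %d, column %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }

    // Restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}
```

Calling `collectReaderErrors("a.xml")` on an affected setup should return the "Input is not proper UTF-8" message with its line and column; an empty array means the reader reached the end of the file without complaints.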

Actual result:
--------------
PHP Warning:  XMLReader::read(): /home/lilydjwg/tmp/a.xml:3: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xEF 0xBC EOF in /home/lilydjwg/.cache/tmp/a.php on line 4
PHP Warning:  XMLReader::read(): 在中個產程的治教他院一;持一建情。你室隊花……放成動 in /home/lilydjwg/tmp/a.php on line 4
PHP Warning:  XMLReader::read():   
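
The bytes in the warning (0xEF 0xBC followed by EOF) are the first two bytes of the three-byte sequence \xef\xbc\x8c, which would be consistent with the parser seeing a truncated character rather than a badly encoded file. A quick way to rule out the file itself is to validate its bytes independently of libxml. The sketch below assumes a hypothetical helper name, `isValidUtf8File()`, and relies on `preg_match()` with the `/u` modifier returning `false` for subjects that are not valid UTF-8.

```php
<?php
// Hypothetical helper (not part of the original report): returns true when
// the whole file decodes as valid UTF-8, without involving libxml.
function isValidUtf8File(string $path): bool
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }

    // With the /u modifier, PCRE rejects subjects that are not valid UTF-8,
    // so preg_match() returns false instead of 1.
    return preg_match('//u', $bytes) === 1;
}

// Only run the check when the test document is present.
if (is_file('a.xml')) {
    echo isValidUtf8File('a.xml') ? "a.xml is valid UTF-8\n" : "a.xml is NOT valid UTF-8\n";
}
```

If this reports the file as valid while `XMLReader::read()` still warns, the problem is more likely in how the input is buffered or decoded than in the document.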

History

 [2018-04-02 16:06 UTC] cmb@php.net
-Status: Open
+Status: Feedback
-Assigned To:
+Assigned To: cmb
 [2018-04-02 16:06 UTC] cmb@php.net
I cannot reproduce this issue with libxml2 2.9.4+dfsg1-2.2+deb9u1.
Could you please provide the XML document for download?
 [2018-04-03 06:31 UTC] lilydjwg at gmail dot com
I guess this has been fixed already.

This error has disappeared since php 7.1.9-1 and libxml2 2.9.5rc2+0+g69936b12-1. The last version I have that can reproduce this is php 7.1.8-1 and libxml2 2.9.4+99+g27f310d4-1.
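
Since the behaviour differs between libxml2 builds, it may help to record the versions from inside PHP when comparing setups. A minimal sketch, assuming a hypothetical helper name `xmlEnvironment()`: `PHP_VERSION` and `LIBXML_DOTTED_VERSION` are standard constants, and `LIBXML_LOADED_VERSION` is read only if it is defined. Distribution package suffixes such as `+99+g27f310d4-1` are not visible from PHP and still need to be taken from the package manager.

```php
<?php
// Hypothetical helper (not part of the original report): gather the version
// information relevant to reproducing the issue.
function xmlEnvironment(): array
{
    return [
        'php' => PHP_VERSION,
        // Version of the libxml2 headers PHP was compiled against.
        'libxml_compiled' => LIBXML_DOTTED_VERSION,
        // Version string of the libxml2 library loaded at runtime, if exposed.
        'libxml_loaded' => defined('LIBXML_LOADED_VERSION')
            ? constant('LIBXML_LOADED_VERSION')
            : null,
    ];
}

foreach (xmlEnvironment() as $name => $version) {
    printf("%s: %s\n", $name, $version ?? 'unknown');
}
```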

(I lost my password to edit this bug.)
 [2018-04-03 06:36 UTC] requinix@php.net
-Status: Feedback
+Status: Closed
 