Bug #71805 XML files can generate UTF-8 error even if they are UTF-8
Submitted: 2016-03-11 21:22 UTC Modified: 2016-05-15 04:38 UTC
Votes:2
Avg. Score:4.0 ± 1.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:0 (0.0%)
From: goozak at gmail dot com Assigned: ab (profile)
Status: Closed Package: XML Reader
PHP Version: 5.5Git-2016-03-11 (snap) OS: Win7 SP1
Private report: No CVE-ID: None
 [2016-03-11 21:22 UTC] goozak at gmail dot com
Description:
------------
XMLReader generates the error
"Warning: XMLReader::readOuterXml(): .....:5: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xC3 0xA9 0x78 0x20"
even though the file is properly encoded in UTF-8.

After a small change to the XML content (removing a tag or even just a few characters), the file is read properly.
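
As a quick sanity check on the claim above, the four bytes quoted in the warning can be decoded on their own. This is only a minimal sketch in Python, not part of the original test script; the helper name `is_valid_utf8` is made up for illustration. It shows that the byte sequence is well-formed UTF-8, and it says nothing about why the parser rejected it. The lone lead byte at the end is included purely for contrast.

```python
# Bytes quoted in the warning message: 0xC3 0xA9 0x78 0x20
reported = bytes([0xC3, 0xA9, 0x78, 0x20])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# 0xC3 0xA9 is the two-byte encoding of U+00E9, followed by "x" and a space.
decoded = reported.decode("utf-8")
print(is_valid_utf8(reported))
print(repr(decoded))

# For contrast only: a lead byte with no continuation byte is not valid UTF-8.
print(is_valid_utf8(b"\xc3"))
```

The same check can be run over the whole file contents to confirm the file itself is valid UTF-8 before suspecting the parser.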

Test script:
---------------
https://gist.github.com/goozak/aa3f5bd5a146a51ddd75

The demo uses two files. The result is the same if they are downloaded and read locally.


 [2016-03-14 15:01 UTC] ab@php.net
-Status: Open +Status: Verified -Assigned To: +Assigned To: ab
 [2016-03-14 15:01 UTC] ab@php.net
Thanks for the report. This looks like a regression in libxml 2.9.3, see https://bugzilla.gnome.org/show_bug.cgi?id=760183 . No fix is currently available in the libxml2 repo. I'll keep an eye on it and will apply the patch to our dependency base as soon as it's available.

Thanks.
 [2016-05-03 16:10 UTC] ab@php.net
-Status: Verified +Status: Feedback
 [2016-05-03 16:10 UTC] ab@php.net
Dependency packages are upgraded. Please check the upcoming RCs.

Thanks.
 [2016-05-03 18:23 UTC] goozak at gmail dot com
I'm on Windows, so I'll check the 5.6 RC when available.  The 5.5 one seems stuck on 5.5.27RC1 (2015-Jun-24) http://windows.php.net/qa/
 [2016-05-13 14:10 UTC] goozak at gmail dot com
Bug fixed in PHP 5.6 (5.6.22RC1) VC11 x86 Thread Safe (2016-May-12 21:22:39).
No RC for PHP 5.5.
 [2016-05-15 04:22 UTC] php-bugs at lists dot php dot net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Re-Opened". Thank you.
 [2016-05-15 04:38 UTC] requinix@php.net
-Status: No Feedback +Status: Closed
 [2016-05-15 04:38 UTC] requinix@php.net
Nothing for 5.5 because it's only receiving security fixes.
 [2016-05-15 09:58 UTC] goozak at gmail dot com
I did provide feedback on [2016-05-13 14:10 UTC] - the bug is fixed in the latest 5.6 RC.

And I do hope that this regression, introduced in 5.5.32 (one of the recent security fixes), will still get fixed...
 [2016-05-26 15:17 UTC] goozak at gmail dot com
For what it's worth (since the bug was closed for nebulous reasons and I can't reopen it), the bug is also fixed in the new 5.5.36 Windows build.
 