Bug #47066 Entity reference parsing broken since libxml 2.5.x
Submitted: 2009-01-11 09:37 UTC Modified: 2009-01-11 11:52 UTC
From: tstarling at wikimedia dot org Assigned:
Status: Not a bug Package: XML related
PHP Version: 5.2.8 OS: Linux
Private report: No CVE-ID: None
 [2009-01-11 09:37 UTC] tstarling at wikimedia dot org
The xml extension (xml_parser_create() etc.) has totally broken entity reference parsing when compiled with a modern libxml2 library. This appears to be due to some clueless code inserted in compat.c 1.37 (November 2004):

	parser->parser->wellFormed = 0;

parser->wellFormed is set to 1 by libxml's xmlInitParserCtxt(), and then to 0 by all the error cases that make the document not well-formed. Setting wellFormed=0, before parsing even begins, means that all input is unconditionally considered to be not well formed. This probably causes all sorts of bugs, but the present one is an interaction with libxml2 r1177:

At line 5174 on the right, entities such as "&lt;" encountered while wellFormed==0 are ignored.

Simply removing the quoted line from compat.c fixes the bug, without breaking any unit tests.

Reported to Wikimedia at

Reproduce code:
Compile libxml2 from source, version 2.5 or later, and link it to PHP with --with-libxml-dir.


function ping($parser, $s) {
  echo "$s\n";
}

$parser = xml_parser_create();
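
The reproduce code above is cut off in this copy of the report, so the following is only a minimal sketch of what a complete check might look like, not the reporter's original script. The sample document, the handler name collect() and the $chunks buffer are assumptions made for illustration; xml_set_character_data_handler(), xml_parse() and LIBXML_DOTTED_VERSION are existing PHP names.

```php
<?php
// Hypothetical input: the document used in the original report was not preserved.
$xml = '<?xml version="1.0"?><root>&lt;Foo&gt;</root>';

// Buffer for every piece of character data the parser hands to the handler.
$chunks = array();

// Character data handler (name is an assumption). The parser may call it
// several times for one text node, so the pieces are collected and joined later.
function collect($parser, $data) {
    global $chunks;
    $chunks[] = $data;
}

$parser = xml_parser_create();
xml_set_character_data_handler($parser, 'collect');
$ok = xml_parse($parser, $xml, true);

$text = implode('', $chunks);

// LIBXML_DOTTED_VERSION is only defined when the libxml extension is loaded.
if (defined('LIBXML_DOTTED_VERSION')) {
    echo 'libxml version: ', LIBXML_DOTTED_VERSION, "\n";
}
echo 'xml_parse() returned: ', $ok, "\n";
echo 'character data: ', $text, "\n";
echo $text === '<Foo>' ? "entities decoded\n" : "entities lost or altered\n";
```

On an unaffected build the final line should report that the entities were decoded. Going by the description above, an affected build would be expected to drop the "&lt;" and "&gt;" references and leave only the plain text.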

Expected result:

Actual result:


 [2009-01-11 11:52 UTC]
Please do not submit the same bug more than once. An existing
bug report already describes this very problem. Even if you feel
that your issue is somewhat different, the resolution is likely
to be the same. 

Thank you for your interest in PHP.

Dupe of bug #45996