Bug #45996 libxml2 2.7.1 causes breakage with character data in xml_parse()
Submitted: 2008-09-04 17:29 UTC Modified: 2011-01-04 22:08 UTC
Votes:166
Avg. Score:4.8 ± 0.5
Reproduced:148 of 151 (98.0%)
Same Version:122 (82.4%)
Same OS:16 (10.8%)
From: phpbugs at colin dot guthr dot ie Assigned: rrichards (profile)
Status: Closed Package: XML related
PHP Version: 5.2.6 OS: Mandriva Linux
Private report: No CVE-ID: None

 

 [2008-09-04 17:29 UTC] phpbugs at colin dot guthr dot ie
Description:
------------
With libxml2 2.7.1, when using the expat-style XML parsing routines in PHP, the character data handler seems to silently drop any entity-encoded text, e.g. &gt;, &lt; and friends.

Please see Mandriva bug for details:
https://qa.mandriva.com/show_bug.cgi?id=43486

And also please note the thread on the libxml mailing list:
http://thread.gmane.org/gmane.comp.gnome.lib.xml.general/14610

And most notably the reply to the above thread:
<quote>
Can you report this as a PHP bug? It looks like some really old hack 
code in the PHP extension in order to mimic some specific expat 
functionality. The behavior change you see, though resulting from a code change in libxml2, is really due to the hackish code in the extension doing things it wasn't meant to be doing.
</quote>

Reproduce code:
---------------
Please see this code:
https://qa.mandriva.com/attachment.cgi?id=10757

Expected result:
----------------
<
foo
>
wibble
<
/foo
>


Actual result:
--------------
foo
wibble
/foo
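
For readers who cannot reach the linked attachment, the following is a minimal sketch of this kind of test. It is not the original reproduce code: the input string and variable names are assumptions made for illustration. It registers a character data handler and collects every chunk the parser hands over. How the text is split into chunks can differ between parser backends, so only the concatenated result is meaningful.

```php
<?php
// Hypothetical input, chosen to resemble the expected/actual results above.
$xml = '<doc>&lt;foo&gt;wibble&lt;/foo&gt;</doc>';

$chunks = [];
$parser = xml_parser_create('UTF-8');

// Collect each piece of character data passed to the handler.
xml_set_character_data_handler($parser, function ($parser, $data) use (&$chunks) {
    $chunks[] = $data;
});

if (!xml_parse($parser, $xml, true)) {
    throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
}
unset($parser);

// On an affected build the entity-encoded characters are missing from $text.
$text = implode('', $chunks);
echo $text, "\n";
```

On a working build the concatenated text contains the angle brackets; on an affected build only "foo", "wibble" and "/foo" survive.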


 [2008-09-06 15:43 UTC] jani@php.net
Assigned to the maintainer (Rob, don't forget to change status too when you assign something to yourself :)
 [2008-09-09 23:06 UTC] phpbugs at colin dot guthr dot ie
Comments by Daniel Veillard on the libxml ML:

  The only thing I can think of is that libxml2 no longer asks
through a SAX callback when looking for entity references if they
are in the predefined set. This comes in essence from an old decision
by the XML working group stating that user definitions for those 5
entities could not override the default predefined ones. So I guess
that change is logical. Now what is done on top of SAX to result
in that bug, I don't really know  :-\
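
As an illustration of the "predefined set" mentioned above: XML 1.0 predefines exactly five entities. The sketch below lists them and decodes them in a single pass. The constant and function names are hypothetical and are not part of ext/xml or libxml2.

```php
<?php
// The five entities predefined by XML 1.0 (names below are illustrative only).
const PREDEFINED_XML_ENTITIES = [
    '&lt;'   => '<',
    '&gt;'   => '>',
    '&amp;'  => '&',
    '&apos;' => "'",
    '&quot;' => '"',
];

// strtr() with an array replaces in one pass, so "&amp;lt;" becomes "&lt;"
// and is not decoded a second time.
function decodePredefinedEntities(string $text): string
{
    return strtr($text, PREDEFINED_XML_ENTITIES);
}

echo decodePredefinedEntities('&lt;a href=&quot;http://www.google.com&quot;&gt;'), "\n";
```

These are the references that the affected builds drop from the character data.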
 [2008-10-07 11:19 UTC] ptn at post dot cz
This bug seems to be fixed in libxml2-2.7.2:

http://svn.gnome.org/viewvc/libxml2?view=revision&revision=3798
 [2008-10-08 09:18 UTC] uraes at hot dot ee
just tried libxml2-2.7.2 and 5.2.6-pl7-gentoo and it is still broken:

Example PHP code:
<?php
$data="<?xml version = '1.0' encoding = 'UTF-8'?>
<rss version=\"2.0\" >
  <channel>
    <item>
      <description>&lt;a href=&quot;http://www.google.com&quot;>Google&lt;/a></description>
    </item>
  </channel>
</rss>
";

$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
xml_parse_into_struct($parser, $data, $vals, $index);
xml_parser_free($parser);

echo "<pre>";

echo "<b>Original XML:</b><br>".htmlentities($data);

echo "<br><br><b>Parsed struct:</b><br>";
print_r($vals);
?>

.. parsed result is "a href=http://www.google.com>Google/a>"
 [2008-10-08 09:50 UTC] phpbugs at colin dot guthr dot ie
Yes, I suspect that the comments left by ptn at post dot cz are incorrect when they say it is fixed in libxml. rrichards has given a very complete explanation of the problem and it is more fundamental than a simple bug.

Compiling PHP with libexpat is the correct workaround for now.
 [2008-10-15 00:04 UTC] mike at kogan dot org
I have also run into this. We had some legacy PHP code using the xml_parser functions that was fine on some CentOS 4 servers with PHP 4 and 5 running Apache 1.3. We've been debugging this failure for a day now on our new CentOS 5 server running PHP 5 and libxml2 2.7.2, and we confirm the same problem. The characterHandler is not called for the known entities, so scripts depending on this (RSS feed converters etc.) emit flawed HTML. I agree there are much better ways to parse XML, but this is legacy stuff that's somewhat pervasive, and we didn't choose what these folks used for their apps.

I'd love to rebuild their server with an older libxml2 but am not sure how to go backwards without causing some other problem. The customer has cPanel/WHM and all that hooey, and I'd rather not create a mess on their new server.

Hope y'all fix this soon, as it is an issue for the cPanel folks, who have 2.7.2 in their stable branch for CentOS 5 and are spreading it through their updater.

If someone can give me a pointer confirming that a straight-up build and install of the old release code won't make things worse, I'll take a crack at moving their server back.
 [2008-10-15 09:02 UTC] phpbugs at colin dot guthr dot ie
Mike, it's fairly easy to recompile PHP with the libexpat library for the legacy XML parsing functions while keeping libxml2 for the more modern ones.

We did that in the Mandriva package for our 2009.0 release after I reported the bug.

See the SPEC file here:
http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/updates/2009.0/php/current/SPECS/php.spec?revision=291141&view=markup

The particular change that worked around it is here:
http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/updates/2009.0/php/current/SPECS/php.spec?r1=278891&r2=281822

I'm sure you can work out how to get the needed patch that is mentioned by navigating the webcvs :) You should be able to use this to recompile the CentOS PHP package accordingly.

Hope this helps.

Col
 [2008-10-16 02:01 UTC] mike at Kogan dot org
Thanks Col - unfortunately after thrashing on this for a day I either have gotten libexpat built in and it hasn't worked, or my efforts to build it into Apache2 have not worked. How can I tell once I've rebuilt apache whether it's in or not? Will it show up on phpinfo? I see libexpat.so on the system and configured apache --with-expat=builtin and tried using the expat on the system but I'm not sure if it's actually getting there. Sorry in advance if this is not the place to ask such a question but googling libexpat has not been fruitful.
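
One way to check which backend ext/xml was built against is to look at the extension's phpinfo() section. The sketch below assumes that the section contains a row labelled "EXPAT Version" for an expat build and "libxml2 Version" for a libxml2 build; those labels are taken from memory of the extension's info output and should be verified against your own phpinfo() page. The function name is hypothetical.

```php
<?php
// Returns 'expat', 'libxml2', 'unknown', or 'not loaded'.
// Assumption: the info section of ext/xml names the backend in a version row.
function detectXmlBackend(): string
{
    if (!extension_loaded('xml')) {
        return 'not loaded';
    }

    // ReflectionExtension::info() prints the phpinfo() snippet, so capture it.
    ob_start();
    (new ReflectionExtension('xml'))->info();
    $info = (string) ob_get_clean();

    if (stripos($info, 'EXPAT Version') !== false) {
        return 'expat';
    }
    if (stripos($info, 'libxml2 Version') !== false) {
        return 'libxml2';
    }
    return 'unknown';
}

echo detectXmlBackend(), "\n";
```

Run it through the same SAPI that serves the site (for example via Apache rather than the CLI), since the two can be built differently.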
 [2008-10-16 14:56 UTC] mike at kogan dot org
Never mind, we got it - libexpat is in and the workaround is fine. Thanks again and happy coding!
 [2008-10-17 13:40 UTC] jorton@php.net
So far as I can tell the only problem here is that libxml2 is bailing in parser.c:xmlParseReference() because the ctxt->wellFormed flag is cleared.  It is set to zero by compat.c, which seems simply wrong; xmlCreate*ParserContext initializes it to non-zero.

If I apply this patch it works fine:

http://people.apache.org/~jorton/php-5.2.6-xmlwformed.patch

Rob?
 [2008-10-17 14:08 UTC] rrichards@php.net
Changing the flag fixes internally defined entities, but breaks the rest 
of the entity handling.
 [2008-10-22 11:39 UTC] markus dot gevers at contenit dot de
Hello,

is there a solution yet?

I have the same problem on Fedora Core 9.
I also have this problem using libxml2-2.6.32

Can anyone help me?

Best regards,
Markus
 [2008-10-24 20:47 UTC] sanepit at o2 dot pl
I have the same problem on Gentoo with new libxml2-2.6.32
On the Moodle platform this problem makes it impossible to restore course backups.
 [2008-10-25 18:12 UTC] alykhanii at yahoo dot com
This bug has messed up the XMLRPC upload capabilities in WordPress; hopefully a solution is coming soon.
 [2008-10-30 18:19 UTC] phpbugs at hm2k dot org
This problem is also documented here...

http://social.microsoft.com/Forums/en-US/writerbeta/thread/62ad697b-ed19-4b0b-ae6a-32bec06b142b/
 [2008-10-31 14:45 UTC] sunil at truesparrow dot com
I am also facing the same problem on a Red Hat 5 server. I have PHP 5.2.6 and libxml 2.7.2.


Any updates on the bug?


Thanks,
Sunil
 [2008-11-03 23:16 UTC] hjthring at lavabit dot com
Any suggestions on how one could go about upgrading CMS code to avoid this issue?

Hayden.
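
One possible direction, offered only as a sketch: code that can move off xml_parse() could read the document with SimpleXML, which is assumed here not to go through the expat emulation layer in ext/xml. The feed below is adapted from the example earlier in this report; whether this fits a given CMS is an assumption, not something confirmed in this thread.

```php
<?php
// Sample feed adapted from the earlier example in this report.
$data = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <item>
      <description>&lt;a href=&quot;http://www.google.com&quot;>Google&lt;/a></description>
    </item>
  </channel>
</rss>
XML;

$rss = simplexml_load_string($data);
if ($rss === false) {
    throw new RuntimeException('Could not parse the feed');
}

// Casting the element to string yields its decoded text content.
$description = (string) $rss->channel->item->description;
echo $description, "\n";
```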
 [2008-11-04 20:35 UTC] rolysatch at hotmail dot com
yes i've been struggling with this issue for some time, has anyone discovered a solution yet?
 [2008-11-04 21:00 UTC] rrichards@php.net
This is still being worked on.
Currently the only options are to build PHP with libxml2 <= 2.6.32 or 
build ext/xml with expat.
 [2008-11-05 16:13 UTC] jorton@php.net
Rob, do you have a test case for the regression caused by my test patch, so I can look at that further?  
 [2008-11-05 16:30 UTC] rrichards@php.net
Just use the example in the manual. Your change simply gets predefined 
entities working but breaks all other cases. I'm currently looking at 
getting the change in libxml2 that caused this reverted. The problem 
stems from the magic needed to emulate expat behavior, as it's not native 
in libxml, so non-standard use of the libxml SAX hooks was needed.
 [2008-11-20 22:37 UTC] rcasagraude at interfaces dot fr
Confirmed: building ext/xml against libexpat works around this bug.
 [2008-11-21 15:31 UTC] ajay12006 at gmail dot com
I already have libxml 2.7.2

Can anyone tell me how to "build PHP with libxml2 <= 2.6.32 or 
build ext/xml with expat"? Is there a link on how to do it?

Thanks.
 [2008-11-21 15:32 UTC] phpbugs at colin dot guthr dot ie
To build with expat just read my earlier posts on this bug where I've shown how we did this in our recent Mandriva 2009 release.
 [2008-12-03 17:46 UTC] hoffie at gentoo dot org
What is the current status of a fix (in contrast to a work around)? This issue breaks lots of web apps (Wordpress as we heard here, Typo3 as we heard in our bug tracker at Gentoo) and considering that the last release of 5.2.x is supposed to be released tomorrow, this sounds rather urgent...

Downstream bug for Gentoo: http://bugs.gentoo.org/show_bug.cgi?id=249703
(Gentoo users: Please report such bugs to us (as well), especially in case of such library interaction issues)
 [2008-12-07 13:08 UTC] rolysatch at hotmail dot com
I did apply a patch to PHP 5.2.6 and got this bug with libxml2 fixed, but I just upgraded PHP to 5.2.7 CLI (where I thought this bug had been fixed) and I've got the same problem back again with angle brackets being stripped. Has anyone else upgraded and still got this issue?
 [2008-12-14 22:17 UTC] alykhanii at yahoo dot com
This bug has been open for 4 months now.  Any chance of this getting fixed anytime soon?
 [2008-12-17 00:09 UTC] joshua at joshuascott dot net
Any update on resolving this issue?  All of my web based RSS aggregator sites are broken, I am unable to post using any desktop clients and none of the workarounds seem to work or can't be implemented due to various reasons.  

Please advise.  This bug has been open for a long time and it seems to be affecting a lot of people.
 [2008-12-17 00:49 UTC] scottmac@php.net
Recompile with libexpat or downgrade libxml. One of those will sort you out.

A change to libxml will be in the next release, though the patch isn't in their repository yet.
 [2008-12-17 00:56 UTC] rrichards@php.net
Does no one pay attention to the comments? The fix for this needs to 
happen on the libxml2 layer. The next version of libxml2 should resolve 
this and this bug is being kept open only in case some modification to 
the extension also needs to occur with the fix.
 [2008-12-17 15:18 UTC] valli at icsurselva dot ch
Where can I find the libxml2 patch?
Does someone have a link to the libxml2 bug reporting tool
which describes this issue?
 [2008-12-25 02:41 UTC] gordon at kanazawa-gu dot ac dot jp
This issue also affects Moodle backup+restore and HotPot module
http://tracker.moodle.org/browse/MDL-17136
 [2008-12-28 03:55 UTC] alex at magnet dot ru
4 months!!! Is this serious?!
 [2008-12-28 13:37 UTC] phpbugs at colin dot guthr dot ie
While I do not doubt that this is an annoying bug, there are very simple and clear workarounds. I'm really not seeing why everyone is so upset about this. I reported the bug and we fixed the Mandriva package almost immediately with the work around and I'm not aware of any fallout from that change.
 [2008-12-29 17:02 UTC] jeffrey dot roberts at ibsgroup dot org
For those as challenged as me at determining how to either downgrade libxml2 or recompile with libexpat while using cPANEL (easyapache), here is a link to the details.  I confirm that after recompiling it is a workaround for the problem.

http://blog.code-head.com/fixing-libxml-php-bug-and-issues-with-html-entities-libexpat

First you will need to find out where libexpat is located on your server, it's probably here:
/usr/lib

To find out for sure, open this folder (/usr/lib) and look for the file:
libexpat.so

If you can't find it, log in as root via SSH and enter:
whereis libexpat.so

This should list the folder in which libexpat is located.

After all this you will need to compile PHP to use libexpat instead of libxml, so go to:
/var/cpanel/easy/apache/rawopts/

And create a file and name it "all_php5" (no quotes), if there is a file with this name edit it and add these lines to the end of it:
--with-expat=builtin
--with-libexpat-dir=/usr/lib

(both lines start with two dashes "--")
Remember that depending on where libexpat is located on your server you might need to edit the second line.

Now compile Apache and everything should work fine!
 [2008-12-31 00:56 UTC] mark at mcclusky dot com
I understand that there are workarounds to this bug, but for those of us on shared hosts that we can't recompile, this is a colossally frustrating state of affairs.

Does anyone in PHP land have any timeline for a fix?
 [2008-12-31 13:37 UTC] phpbugs at colin dot guthr dot ie
If you are on a shared host with a fixed PHP version then really the host should be responsible for deploying a well packaged version of PHP. If they are not doing that it is the host that is at fault and you should raise the issue with them.
 [2008-12-31 14:35 UTC] hougiwro at guerrillamail dot org
Why not just fix the bug so that existing installs can pick up a standard update without having to rebuild the program.

Not everyone using PHP is necessarily comfortable with recompiling it in order to implement a workaround for a bug.
 [2008-12-31 15:22 UTC] scottmac@php.net
Guys please READ the report, this is a bug in libxml NOT a bug in PHP.
 [2009-01-01 19:31 UTC] alex at peoples dot ru
Thanks for the advice, but I'm no Linux guru, and I don't have cPanel on my server. I tried to 'yum remove' libxml2 and install an older one, but of course that was stupid and didn't work. I liked Linux as the easiest and most powerful system, but now I'm stuck. I have no idea how to remove libxml2 and rebuild the system with the old version. One idea is to change the system to Fedora 9, because FC 10 has the same libxml2 bug. Sorry, I spent 8 hours at the data center and had problems with the servers running the new system. I don't like updates now... they have bugs everywhere, and I'm tired of resolving these bugs. Sorry, and have a Happy New Year. I will never again update my systems more often than every half year.
 [2009-01-01 20:09 UTC] phpbugs at colin dot guthr dot ie
If the Fedora packages do not work then this is a RedHat packaging problem and you should complain to them/open a bug etc. etc.

Like I say, in Mandriva we made sure we provided packages that worked because they were compiled with expat.
 [2009-01-02 23:03 UTC] geoffers+phpbugs at gmail dot com
What is the recommended advice for PHP software that relies upon the XML 
extension? It'd be easier to say that libxml 2.7.0 to 2.7.2 wasn't supported 
if it weren't for the fact that I've had at least one user who had 
LIBXML_VERSION equal to 20632 with this issue. We can't just add a 
LIBXML_VERSION-based workaround, not just because the constant doesn't 
exist on 4.3.0, but also because it seemingly isn't reliable.
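
Since a version check appears unreliable, one alternative is to test the behavior directly at runtime. The sketch below is an assumption about how such a probe could look, not an officially recommended approach: it parses a tiny document and checks whether the predefined entities reach the character data handler. The function name is hypothetical, and the closure syntax requires PHP 5.3 or later, so it would not run on 4.3.0.

```php
<?php
// Probe: does xml_parse() deliver predefined entities to the character
// data handler on this installation? (Hypothetical helper name.)
function xmlParserKeepsPredefinedEntities(): bool
{
    $seen = '';
    $parser = xml_parser_create('UTF-8');

    xml_set_character_data_handler($parser, function ($parser, $data) use (&$seen) {
        $seen .= $data;
    });

    $ok = xml_parse($parser, '<t>&lt;&gt;&amp;</t>', true);
    unset($parser);

    // An affected build parses without error but hands over no text here.
    return (bool) $ok && $seen === '<>&';
}

var_dump(xmlParserKeepsPredefinedEntities());
```

An application could run this once and fall back to another parsing path, or show a warning, when it returns false.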
 [2009-01-03 04:03 UTC] david+phpbugs at midrange dot com
Ok, I'm going to try and rebuild the Fedora 8 source RPM to avoid the libxml2 bug ... but I'm not all that familiar with how PHP is built ... and could use a pointer or two on what to change on the configure command line.

Any suggestions?
 [2009-01-11 12:06 UTC] rrichards@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.

This fix also requires the soon to be released libxml2-2.7.3 or higher 
when using 2.7.x.
 [2010-04-19 00:05 UTC] nick dot phillips at otago dot ac dot nz
Just FYI - I have been seeing what would appear to be this issue with libxml2 2.7.6 (with Moodle), so it seems that "2.7.3 or higher" doesn't cut it. Reverting to 2.6.32 solved the problem for me.
 [2010-07-07 23:17 UTC] i_cypher at hotmail dot com
I am seeing this error in 2.7.3 as well, so the fix does not seem to be working.
 [2011-01-04 19:50 UTC] lwc at mailmetrash dot com
It still happens to me even in libxml v2.7.7.

Please take it seriously. At least mention in which scenarios it happens and whether it has an entry in libxml's own bug tracker.
 [2011-01-04 20:04 UTC] rrichards@php.net
To hopefully close this once and for all:
Compile PHP (5.2.9 or higher) against the newer libxml (2.7.3 or higher).
PHP code change:
http://svn.php.net/viewvc/php/php-src/branches/PHP_5_2/ext/xml/compat.c?r1=272374&r2=273286
Libxml code change:
http://svn.gnome.org/viewvc/libxml2/trunk/parser.c?r1=3803&r2=3807
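
The version rule stated in this comment can be written down as a small check. This is only a sketch of the stated rule (PHP 5.2.9 or higher together with libxml2 2.7.3 or higher, with libxml2 older than 2.7.0 treated as unaffected); the function name and return values are hypothetical. Other comments in this report describe failures that contradict the rule, so the result is a hint rather than proof. Note that LIBXML_DOTTED_VERSION reflects the libxml2 version PHP was compiled against.

```php
<?php
// Classify a PHP/libxml2 version pair according to the rule quoted above.
function expectedEntityHandling(string $phpVersion, string $libxmlVersion): string
{
    if (version_compare($libxmlVersion, '2.7.0', '<')) {
        return 'unaffected';   // e.g. 2.6.32, the suggested downgrade target
    }
    if (version_compare($libxmlVersion, '2.7.3', '<')) {
        return 'affected';     // 2.7.0 to 2.7.2
    }
    // libxml2 2.7.3+ still needs the matching change in ext/xml.
    return version_compare($phpVersion, '5.2.9', '>=') ? 'fixed' : 'affected';
}

if (defined('LIBXML_DOTTED_VERSION')) {
    echo expectedEntityHandling(PHP_VERSION, LIBXML_DOTTED_VERSION), "\n";
}
```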
 [2011-01-04 20:55 UTC] lwc at mailmetrash dot com
Again, I use libxml v2.7.7 (with PHP v5.2.14).
 [2011-01-04 21:54 UTC] lwc at mailmetrash dot cm
Regarding your e-mail message, please let's continue this discussion publicly. It's established that this bug can happen with any (HTML) input, so there's no point exposing my own. Check out http://www.google.com/search?q=libxml+%2Bbrackets for tons of example code.

But most importantly, is it possible that all this time no one ever posted a bug about it in libxml's own tracker? If so, I think I'll post something there soon.
 [2011-01-04 22:08 UTC] rrichards@php.net
HTML is not XML, so this is a non-issue: the SAX parser is not meant to handle HTML.
If PHP is not linked against 2.7.3 or higher it won't work either.
So back to "Compile PHP (5.2.9 or higher) against the newer libxml (2.7.3 or 
higher)", as most of those search results even tell you. Without phpinfo and your 
sample code I would say the problem is on your end.
 [2011-03-24 02:21 UTC] david at moodle dot com
In Moodle (http://moodle.org), this issue causes trouble in several areas, including SSO support via MNet. Today, I was debugging an SSO problem in Moodle/Mahara and tracked it down to this xml_parse() issue. I have published a patch with a workaround for the SSO in Moodle at http://moodle.org/mod/forum/discuss.php?d=101459#p753158

In this particular case, the combination was PHP 5.2.13-pl0-gentoo and libxml 2.7.3 (note that this combo is supposed to be fixed already).
 