PHP Bugs  

Bug #45996 libxml2 2.7.1 causes breakage with character data in xml_parse()
Submitted: 4 Sep 2008 5:29pm UTC
Modified: 11 Jan 12:06pm UTC
From: phpbugs at colin dot guthr dot ie
Assigned to: rrichards
Status: Closed
Category: XML related
Version: 5.2.6
OS: Mandriva Linux
Votes: 166
Avg. Score: 4.8 ± 0.5
Reproduced: 148 of 151 (98.0%)
Same Version: 122 (82.4%)
Same OS: 16 (10.8%)

[4 Sep 2008 5:29pm UTC] phpbugs at colin dot guthr dot ie
Description:
------------
With libxml2 2.7.1, when using the expat-style XML parsing routines in
PHP, the character data handler seems to silently drop any entity-encoded
text, e.g. &gt;, &lt; and friends.

Please see Mandriva bug for details:
https://qa.mandriva.com/show_bug.cgi?id=43486

And also please note the thread on the libxml mailing list:
http://thread.gmane.org/gmane.comp.gnome.lib.xml.general/14610

And most notably the reply to the above thread:
<quote>
Can you report this as a PHP bug? It looks like some really old hack 
code in the PHP extension in order to mimic some specific expat 
functionality. The behavior change you see, though resulting from a code
change in libxml2, is really due to the hackish code in the extension
doing things it wasn't meant to be doing.
</quote>

Reproduce code:
---------------
Please see this code:
https://qa.mandriva.com/attachment.cgi?id=10757

Expected result:
----------------
<
foo
>
wibble
<
/foo
>

Actual result:
--------------
foo
wibble
/foo
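The attached reproduction script is not included in this report. As a stand-in, here is a minimal PHP sketch that feeds a hypothetical document to xml_parse() and records every call to the character-data handler; the character data &lt;foo&gt;wibble&lt;/foo&gt; is inferred from the expected result above, not taken from the attachment. It assumes a current PHP (closures, type declarations), and collect_character_data() is a hypothetical helper name.

```php
<?php
// Stand-in for the reproduction script (hypothetical input and helper name).
// Every call to the character-data handler is recorded as one chunk.
function collect_character_data(string $xml): array
{
    $chunks = [];
    $parser = xml_parser_create('UTF-8');
    xml_set_character_data_handler($parser, function ($parser, string $data) use (&$chunks) {
        $chunks[] = $data;
    });
    // Parse the whole document in one call (true = this is the final chunk).
    xml_parse($parser, $xml, true);
    return $chunks;
}

$chunks = collect_character_data('<root>&lt;foo&gt;wibble&lt;/foo&gt;</root>');

// One line per handler call, in the same shape as the expected result.
foreach ($chunks as $chunk) {
    echo $chunk, "\n";
}

// Code that needs the full text must join the chunks, not keep only the last one.
echo implode('', $chunks), "\n";
```

On an unaffected build the joined chunks read <foo>wibble</foo>; on an affected build the chunks for the entity references never reach the handler, leaving only foo, wibble and /foo as in the actual result.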

[6 Sep 2008 3:43pm UTC] jani@php.net
Assigned to the maintainer (Rob, don't forget to change status too when
you assign something to yourself :)
[9 Sep 2008 11:06pm UTC] phpbugs at colin dot guthr dot ie
Comments by Daniel Veillard on the libxml ML:

  The only thing I can think of is that libxml2 no longer asks
through a SAX callback when looking up entity references if they
are in the predefined set. This stems from an old decision
by the XML working group stating that user definitions of those 5
entities could not override the default predefined ones. So I guess
that change is logical. Now what is done on top of SAX to result
in that bug, I don't really know  :-\
[7 Oct 2008 11:19am UTC] ptn at post dot cz
This bug seems to be fixed in libxml2-2.7.2:

http://svn.gnome.org/viewvc/libxml2?view=revision&revision=3798
[8 Oct 2008 9:18am UTC] uraes at hot dot ee
Just tried libxml2-2.7.2 with PHP 5.2.6-pl7-gentoo and it is still broken:

Example PHP code:
<?php
// The <description> element contains entity-encoded HTML.
$data = "<?xml version = '1.0' encoding = 'UTF-8'?>
<rss version=\"2.0\" >
  <channel>
    <item>
      <description>&lt;a href=&quot;http://www.google.com&quot;>Google&lt;/a></description>
    </item>
  </channel>
</rss>
";

$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
// Parse into a flat array of elements; decoded text ends up in the 'value' keys.
xml_parse_into_struct($parser, $data, $vals, $index);
xml_parser_free($parser);

echo "<pre>";

echo "<b>Original XML:</b><br>" . htmlentities($data);

echo "<br><br><b>Parsed struct:</b><br>";
// Expected 'value' for DESCRIPTION: <a href="http://www.google.com">Google</a>
print_r($vals);
?>

The parsed result is "a href=http://www.google.com>Google/a>": the &lt; and &quot; entities are silently dropped.
[8 Oct 2008 9:50am UTC] phpbugs at colin dot guthr dot ie
Yes, I suspect that the comments left by ptn at post dot cz are
incorrect when they say it is fixed in libxml. rrichards has given a
very complete explanation of the problem and it is more fundamental than
a simple bug.

Compiling PHP with libexpat is the correct workaround for now.
[15 Oct 2008 12:04am UTC] mike at kogan dot org
I have also run into this. We had some legacy PHP code using
xml_parser that was fine on some CentOS 4 servers with PHP 4 and 5
running Apache 1.3. We've been debugging this failure for a day now on
our new CentOS 5 server running PHP 5 and libxml2 2.7.2, and we confirm
the same problem. The characterHandler is not called for the known
entities, so scripts depending on this (RSS feed converters etc.) emit
flawed HTML. I agree there are much better ways to parse XML, but this is
legacy stuff that's somewhat pervasive, and we didn't choose what these
folks used for their apps.

I'd love to rebuild their server with an older libxml2 but am not sure
how to go backwards without causing some other problem. Customer has
cpanel/whm and all that hooey and I'd rather not create a mess on their
new server.

Hope y'all fix this soon, as it is an issue for cPanel users: cPanel has
2.7.2 in its stable branch for CentOS 5, and their updater is spreading
it.

If someone can give me a pointer that a straight-up build and install of
the old release won't make things worse, I'll take a crack at moving
their server back.
[15 Oct 2008 9:02am UTC] phpbugs at colin dot guthr dot ie
Mike, it's fairly easy to recompile PHP with the libexpat library for
the legacy XML parsing functions while keeping libxml2 for the more
modern ones.

We did that in the Mandriva package for our 2009.0 release after I
reported the bug.

See the SPEC file here:
http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/updates/2009.0/php/current/SPECS/php.spec?revision=291141&view=markup

The particular change that worked around it is here:
http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/updates/2009.0/php/current/SPECS/php.spec?r1=278891&r2=281822

I'm sure you can work out how to get the needed patch that is mentioned
by navigating the webcvs :) You should be able to use this to recompile
the CentOS PHP package accordingly.

Hope this helps.

Col
[16 Oct 2008 2:01am UTC] mike at Kogan dot org
Thanks Col - unfortunately after thrashing on this for a day I either
have gotten libexpat built in and it hasn't worked, or my efforts to
build it into Apache2 have not worked. How can I tell once I've rebuilt
apache whether it's in or not? Will it show up on phpinfo? I see
libexpat.so on the system and configured apache --with-expat=builtin and
tried using the expat on the system but I'm not sure if it's actually
getting there. Sorry in advance if this is not the place to ask such a
question but googling libexpat has not been fruitful.
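One way to check from PHP itself is to look at the xml module section of phpinfo(). In PHP's ext/xml source, that section prints an "EXPAT Version" row when the extension is built against expat and a "libxml2 Version" row when it uses libxml2; treat those exact labels as an assumption about your PHP build. A minimal sketch, with xml_extension_backend() as a hypothetical helper name:

```php
<?php
// Sketch: report which library ext/xml was built against by scanning the
// module section of phpinfo(). Assumes the row labels "EXPAT Version" and
// "libxml2 Version" (case-sensitive; the libxml extension itself prints
// "libXML ..." rows, which do not match).
function xml_extension_backend(): string
{
    ob_start();
    phpinfo(INFO_MODULES);
    $info = (string) ob_get_clean();

    if (strpos($info, 'EXPAT Version') !== false) {
        return 'expat';
    }
    if (strpos($info, 'libxml2 Version') !== false) {
        return 'libxml2';
    }
    return 'unknown'; // e.g. ext/xml not loaded, or different labels
}

echo xml_extension_backend(), "\n";
```

The same strings appear in the HTML table that phpinfo() renders through a web server, so the check works from a browser as well as from the CLI.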
[16 Oct 2008 2:56pm UTC] mike at kogan dot org
Never mind, we got it: libexpat is in and the workaround is fine. Thanks
again and happy coding!
[17 Oct 2008 1:40pm UTC] jorton@php.net
So far as I can tell the only problem here is that libxml2 is bailing in
parser.c:xmlParseReference() because the ctxt->wellFormed flag is
cleared.  It is set to zero by compat.c, which seems simply wrong;
xmlCreate*ParserContext initialize it to non-zero.

If I apply this patch it works fine:

http://people.apache.org/~jorton/php-5.2.6-xmlwformed.patch

Rob?
[17 Oct 2008 2:08pm UTC] rrichards@php.net
Changing the flag fixes internally defined entities, but breaks the rest
of the entity handling.
[22 Oct 2008 11:39am UTC] markus dot gevers at contenit dot de
Hello,

is there a solution yet?

I have the same problem on Fedora Core 9.
I also have this problem using libxml2-2.6.32

Can anyone help me?

Best regards,
Markus
[24 Oct 2008 8:47pm UTC] sanepit at o2 dot pl
I have the same problem on Gentoo with the new libxml2-2.6.32.
On the Moodle platform, this problem makes it impossible to restore course backups.
[25 Oct 2008 6:12pm UTC] alykhanii at yahoo dot com
This bug has messed up the XML-RPC upload capabilities in WordPress;
hopefully a solution is coming soon.
[30 Oct 2008 6:19pm UTC] phpbugs at hm2k dot org
This problem is also documented here...

http://social.microsoft.com/Forums/en-US/writerbeta/thread/62ad697b-ed19-4b0b-ae6a-32bec06b142b/
[31 Oct 2008 2:45pm UTC] sunil at truesparrow dot com
I am also facing the same problem on a Red Hat 5 server. I have PHP 5.2.6
and libxml 2.7.2.

Any updates on the bug?

Thanks,
Sunil
[3 Nov 2008 11:16pm UTC] hjthring at lavabit dot com
Any suggestions on how one could go about updating CMS code to avoid
this issue?
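One application-level workaround, sketched here under an assumption rather than offered as an official fix: rewrite the five predefined entity references as numeric character references before parsing. It assumes the affected libxml2 versions still deliver numeric character references (such as &#60;) to the character-data handler; that is not confirmed in this report, so test it on your own build. predefined_entities_to_numeric() is a hypothetical helper name.

```php
<?php
// Workaround sketch (hypothetical helper). Assumption: affected libxml2
// builds drop only the predefined named entities, not numeric character
// references, so &lt; -> &#60; etc. keeps the text intact.
function predefined_entities_to_numeric(string $xml): string
{
    // strtr() performs all replacements in a single pass over the string.
    return strtr($xml, [
        '&lt;'   => '&#60;',
        '&gt;'   => '&#62;',
        '&amp;'  => '&#38;',
        '&quot;' => '&#34;',
        '&apos;' => '&#39;',
    ]);
}

// Usage: apply to the raw document just before parsing it.
$xml = predefined_entities_to_numeric('<d>&lt;a href=&quot;x&quot;&gt;</d>');
$parser = xml_parser_create('UTF-8');
xml_parse_into_struct($parser, $xml, $vals);
echo $vals[0]['value'], "\n";
```

Note that this also rewrites these sequences inside CDATA sections and comments, where they are literal text, so it only suits documents that do not rely on them there.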

Hayden.
[4 Nov 2008 8:35pm UTC] rolysatch at hotmail dot com
Yes, I've been struggling with this issue for some time. Has anyone
discovered a solution yet?
[4 Nov 2008 9:00pm UTC] rrichards@php.net
This is still being worked on.
Currently the only options are to build PHP with libxml2 <= 2.6.32 or 
build ext/xml with expat.
[5 Nov 2008 4:13pm UTC] jorton@php.net
Rob, do you have a test case for the regression caused by my test patch,
so I can look at that further?  
[5 Nov 2008 4:30pm UTC] rrichards@php.net
Just use the example in the manual. Your change simply gets predefined 
entities working but breaks all other cases. I'm currently looking at 
getting the change in libxml2 that caused this reverted. The problem 
stems from the magic needed to emulate expat behavior: as it's not native
in libxml, non-standard use of the libxml SAX hooks was needed.
[20 Nov 2008 10:37pm UTC] rcasagraude at interfaces dot fr
Confirmed: building ext/xml against libexpat works around this bug.
[21 Nov 2008 3:31pm UTC] ajay12006 at gmail dot com
I already have libxml 2.7.2

Can anyone help me how to "build PHP with libxml2 <= 2.6.32 or 
build ext/xml with expat."? Any link on how to do it?

Thanks.
[21 Nov 2008 3:32pm UTC] phpbugs at colin dot guthr dot ie
To build with expat just read my earlier posts on this bug where I've
shown how we did this in our recent Mandriva 2009 release.
[3 Dec 2008 5:46pm UTC] hoffie at gentoo dot org
What is the current status of a fix (as opposed to a workaround)? This
issue breaks lots of web apps (WordPress as we heard here, Typo3 as we
heard in our bug tracker at Gentoo), and considering that the next 5.2.x
release is due tomorrow, this sounds rather urgent...

Downstream bug for Gentoo:
http://bugs.gentoo.org/show_bug.cgi?id=249703
(Gentoo users: Please report such bugs to us (as well), especially in
case of such library interaction issues)
[7 Dec 2008 1:08pm UTC] rolysatch at hotmail dot com
I did apply a patch to PHP 5.2.6 and got this libxml2 bug fixed,
but I just upgraded PHP to 5.2.7 CLI (where I thought this bug had been
fixed) and I've got the same problem back again, with angle brackets
being stripped. Has anyone else upgraded and still got this issue?
[14 Dec 2008 10:17pm UTC] alykhanii at yahoo dot com
This bug has been open for 4 months now.  Any chance of this getting
fixed anytime soon?
[17 Dec 2008 12:09am UTC] joshua at joshuascott dot net
Any update on resolving this issue? All of my web-based RSS aggregator
sites are broken, I am unable to post using any desktop clients, and the
workarounds either don't seem to work or can't be implemented for
various reasons.

Please advise.  This bug has been open for a long time and it seems to
be affecting a lot of people.
[17 Dec 2008 12:49am UTC] scottmac@php.net
Recompile with libexpat or downgrade libxml. One of those will sort you
out.

A change to libxml will be in the next release, though the patch isn't
in their repository yet.
[17 Dec 2008 12:56am UTC] rrichards@php.net
Does no one pay attention to the comments? The fix for this needs to 
happen in the libxml2 layer. The next version of libxml2 should resolve
this, and this bug is being kept open only in case some modification to 
the extension also needs to occur with the fix.
[17 Dec 2008 3:18pm UTC] valli at icsurselva dot ch
Where can I find the libxml2 patch?
Does someone have a link to the libxml2 bug reporting tool
which describes this issue?
[25 Dec 2008 2:41am UTC] gordon at kanazawa-gu dot ac dot jp
This issue also affects Moodle backup+restore and HotPot module
http://tracker.moodle.org/browse/MDL-17136
[28 Dec 2008 3:55am UTC] alex at magnet dot ru
Four months!!! Is this not serious?!
[28 Dec 2008 1:37pm UTC] phpbugs at colin dot guthr dot ie
While I do not doubt that this is an annoying bug, there are very simple
and clear workarounds. I'm really not seeing why everyone is so upset
about this. I reported the bug and we fixed the Mandriva package almost
immediately with the work around and I'm not aware of any fallout from
that change.
[29 Dec 2008 5:02pm UTC] jeffrey dot roberts at ibsgroup dot org
For those as challenged as me at determining how to either downgrade
libxml2 or recompile with libexpat while using cPANEL (easyapache), here
is a link to the details.  I confirm that after recompiling it is a
workaround for the problem.

http://blog.code-head.com/fixing-libxml-php-bug-and-issues-with-html-entities-libexpat

First you will need to find out where libexpat is located on your
server; it's probably here:
/usr/lib

To find out for sure, open this folder (/usr/lib) and look for the
file:
libexpat.so

If you can't find it, log in as root via SSH and enter:
whereis libexpat.so

This should list the folder in which libexpat is located.

After all this you will need to compile PHP to use libexpat instead of
libxml, so go to:
/var/cpanel/easy/apache/rawopts/

And create a file named "all_php5" (no quotes); if a file with this
name already exists, edit it instead. Add these lines to the end of it:
--with-expat=builtin
--with-libexpat-dir=/usr/lib

(Each line starts with two dashes.)
Remember that depending on where libexpat is located on your server you
might need to edit the second line.

Now compile Apache and everything should work fine!
[31 Dec 2008 12:56am UTC] mark at mcclusky dot com
I understand that there are workarounds to this bug, but for those of us
on shared hosts that we can't recompile, this is a colossally
frustrating state of affairs.

Does anyone in PHP land have any timeline for a fix?
[31 Dec 2008 1:37pm UTC] phpbugs at colin dot guthr dot ie
If you are on a shared host with a fixed PHP version then really the
host should be responsible for deploying a well packaged version of PHP.
If they are not doing that it is the host that is at fault and you
should raise the issue with them.
[31 Dec 2008 2:35pm UTC] hougiwro at guerrillamail dot org
Why not just fix the bug so that existing installs can pick up a
standard update without having to rebuild the program?

Not everyone using PHP is necessarily comfortable with recompiling it in
order to implement a workaround for a bug.
[31 Dec 2008 3:22pm UTC] scottmac@php.net
Guys please READ the report, this is a bug in libxml NOT a bug in PHP.
[1 Jan 7:31pm UTC] alex at peoples dot ru
Thanks for the advice, but I'm no Linux guru, and I don't have cPanel on
my server. I tried using 'yum remove' on libxml2 and installing a new
one, but of course this was a bad idea and didn't work. I liked Linux as
an easy and powerful system, but now I'm stuck. I have no idea how to
remove libxml2 and rebuild the system with the old one. One idea is to
switch the system to Fedora 9, because FC 10 has the same bug with the
fucking libxml2. Sorry, I spent 8 hours at the data center having
problems with servers running the new system. I don't like updates
now... they have bugs everywhere, and I'm tired of resolving these bugs.
Sorry, and Happy New Year. I'll never again install updates that are
less than half a year old.
[1 Jan 8:09pm UTC] phpbugs at colin dot guthr dot ie
If the Fedora packages do not work then this is a RedHat packaging
problem and you should complain to them/open a bug etc. etc.

Like I say, in Mandriva we made sure we provided packages that worked
because they were compiled with expat.
[2 Jan 11:03pm UTC] geoffers+phpbugs at gmail dot com
What is the recommended advice for PHP software that relies upon the XML
extension? It'd be easier to say that libxml2 2.7.0–2.7.2 isn't supported
if it weren't for the fact that at least one user has come to me with
this issue while LIBXML_VERSION was 20632. We can't just add a
LIBXML_VERSION-based workaround, not only because the constant doesn't
exist on PHP 4.3.0, but also because it seemingly isn't reliable.
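Since LIBXML_VERSION alone may not be trustworthy, one option is feature detection: parse a tiny document at runtime and check whether the character-data handler actually receives the text behind predefined entity references. This is a minimal sketch for a current PHP; on PHP 4 or 5.2 the closure would have to become a named callback function, and xml_parser_keeps_predefined_entities() is a hypothetical name.

```php
<?php
// Feature-detection sketch (hypothetical helper name): returns true when the
// XML extension passes the text of predefined entities (&lt; &amp; &gt;) to
// the character-data handler, false when that text is silently dropped.
function xml_parser_keeps_predefined_entities(): bool
{
    $text = '';
    $parser = xml_parser_create('UTF-8');
    xml_set_character_data_handler($parser, function ($parser, string $data) use (&$text) {
        $text .= $data; // the text may arrive in several chunks
    });
    $ok = xml_parse($parser, '<t>&lt;&amp;&gt;</t>', true);
    return $ok === 1 && $text === '<&>';
}

if (!xml_parser_keeps_predefined_entities()) {
    // Affected build: warn the user or switch to an application-level workaround.
    echo "Warning: this XML extension drops predefined entities.\n";
}
```

Checking behavior rather than a version number also catches builds where the version constant and the runtime library disagree.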
[3 Jan 4:03am UTC] david+phpbugs at midrange dot com
Ok, I'm going to try and rebuild the Fedora 8 source RPM to avoid the
libxml2 bug ... but I'm not all that familiar with how PHP is built ...
and could use a pointer or two on what to change on the configure
command line.

Any suggestions?
[11 Jan 12:06pm UTC] rrichards@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.

This fix also requires the soon-to-be-released libxml2-2.7.3 or higher
when using 2.7.x.


PHP Copyright © 2001-2009 The PHP Group
All rights reserved.
Last updated: Sat Nov 21 10:30:49 2009 UTC