| Bug #45996 | libxml2 2.7.1 causes breakage with character data in xml_parse() | ||||
|---|---|---|---|---|---|
| Submitted: | 4 Sep 2008 5:29pm UTC | Modified: | 11 Jan 12:06pm UTC | ||
| From: | phpbugs at colin dot guthr dot ie | Assigned to: | rrichards | ||
| Status: | Closed | Category: | XML related | ||
| Version: | 5.2.6 | OS: | Mandriva Linux | ||
| Votes: | 166 | Avg. Score: | 4.8 ± 0.5 | Reproduced: | 148 of 151 (98.0%) |
| Same Version: | 122 (82.4%) | Same OS: | 16 (10.8%) | ||
[4 Sep 2008 5:29pm UTC] phpbugs at colin dot guthr dot ie
[6 Sep 2008 3:43pm UTC] jani@php.net
Assigned to the maintainer (Rob, don't forget to change status too when you assign something to yourself :)
[9 Sep 2008 11:06pm UTC] phpbugs at colin dot guthr dot ie
Comments by Daniel Veillard on the libxml ML: The only thing I can think of is that libxml2 no longer asks through a SAX callback when looking for entity references if they are in the predefined set. This stems from an old decision by the XML working group stating that user definitions for those 5 entities could not override the default predefined ones. So I guess that change is logical. Now what is done on top of SAX to result in that bug, I don't really know :-\
[7 Oct 2008 11:19am UTC] ptn at post dot cz
This bug seems to be fixed in libxml2-2.7.2: http://svn.gnome.org/viewvc/libxml2?view=revision&revision=3798
[8 Oct 2008 9:18am UTC] uraes at hot dot ee
just tried libxml2-2.7.2 and 5.2.6-pl7-gentoo and it is still broken:
Example PHP code:
<?php
// The markup inside <description> is entity-escaped, so the parser should hand it back as literal text.
$data = "<?xml version = '1.0' encoding = 'UTF-8'?>
<rss version=\"2.0\" >
<channel>
<item>
<description>&lt;a href=&quot;http://www.google.com&quot;&gt;Google&lt;/a&gt;</description>
</item>
</channel>
</rss>
";
$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
xml_parse_into_struct($parser, $data, $vals, $index);
xml_parser_free($parser);
echo "<pre>";
echo "<b>Original XML:</b><br>".htmlentities($data);
echo "<br><br><b>Parsed struct:</b><br>";
print_r($vals);
?>
The parsed result is "a href=http://www.google.com>Google/a>".
[8 Oct 2008 9:50am UTC] phpbugs at colin dot guthr dot ie
Yes, I suspect that the comments left by ptn at post dot cz are incorrect when they say it is fixed in libxml. rrichards has given a very complete explanation of the problem and it is more fundamental than a simple bug. Compiling PHP with libexpat is the correct workaround for now.
[15 Oct 2008 12:04am UTC] mike at kogan dot org
I have also run into this. We had some legacy PHP code using the xml_parser functions that was fine on some CentOS 4 servers with PHP 4 and 5 running Apache 1.3. We've been debugging this failure for a day now on our new CentOS 5 server running PHP 5 and libxml2 2.7.2, and we confirm the same problem. The characterHandler is not called for the known entities, so scripts depending on this (RSS feed converters etc.) emit flawed HTML. I agree there are much better ways to parse XML, but this is legacy stuff that's somewhat pervasive and we didn't choose what these folks used for their apps. I'd love to rebuild their server with an older libxml2 but am not sure how to go backwards without causing some other problem. The customer has cPanel/WHM and all that hooey and I'd rather not create a mess on their new server. Hope y'all fix this soon, as it is an issue for the cPanel folks: they have 2.7.2 in their stable branch for CentOS 5, and it is being spread by their updater. If someone can give me a pointer that a straight-up build and install of the old release code won't make things worse, I'll take a crack at moving their server back.
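To illustrate the symptom described in the comment above, here is a minimal sketch that records every chunk ext/xml passes to a character data handler. The class `ChunkCollector` and its method `onText` are hypothetical names invented for this example; only the `xml_*` functions are part of PHP.

```php
<?php
// Hypothetical helper for illustration only; not part of PHP or of any code in this report.
class ChunkCollector
{
    public $chunks = array();

    // Signature expected by xml_set_character_data_handler(): parser first, then the text.
    public function onText($parser, $text)
    {
        $this->chunks[] = $text;
    }
}

$collector = new ChunkCollector();
$parser = xml_parser_create('UTF-8');
xml_set_character_data_handler($parser, array($collector, 'onText'));
xml_parse($parser, '<description>&lt;b&gt;bold&lt;/b&gt;</description>', true);
xml_parser_free($parser);

// Join the chunks to see what a script relying on the handler would emit.
$joined = implode('', $collector->chunks);
echo $joined, "\n";
```

If the handler is called for every predefined entity, the joined text is the literal string `<b>bold</b>`. Going by the parsed result reported earlier in this thread, an affected build would be expected to lose at least the `<` characters.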
[15 Oct 2008 9:02am UTC] phpbugs at colin dot guthr dot ie
Mike, it's fairly easy to recompile PHP with the libexpat library for the legacy XML parsing functions while keeping libxml2 for the more modern ones. We did that in the Mandriva package for our 2009.0 release after I reported the bug. See the SPEC file here: http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/updates/2009.0/php/current/SPECS/php.spec?revision=291141&view=markup The particular change that worked around it is here: http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/updates/2009.0/php/current/SPECS/php.spec?r1=278891&r2=281822 I'm sure you can work out how to get the needed patch that is mentioned by navigating the webcvs :) You should be able to use this to recompile the CentOS PHP package accordingly. Hope this helps. Col
[16 Oct 2008 2:01am UTC] mike at Kogan dot org
Thanks Col - unfortunately after thrashing on this for a day I either have gotten libexpat built in and it hasn't worked, or my efforts to build it into Apache2 have not worked. How can I tell once I've rebuilt apache whether it's in or not? Will it show up on phpinfo? I see libexpat.so on the system and configured apache --with-expat=builtin and tried using the expat on the system but I'm not sure if it's actually getting there. Sorry in advance if this is not the place to ask such a question but googling libexpat has not been fruitful.
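One way to approach the "will it show up on phpinfo?" question is to scan the module listing for the version row that ext/xml prints. The sketch below is an assumption-laden illustration: the function name `xml_backend_from_phpinfo` is hypothetical, and the row labels it searches for ("EXPAT Version" and "libxml2 Version") are what phpinfo() output typically shows for the xml section, not something confirmed in this report.

```php
<?php
// Hypothetical helper: guesses which parser library ext/xml was built against
// by scanning the phpinfo() module listing. The row labels searched for are
// an assumption based on typical phpinfo() output.
function xml_backend_from_phpinfo()
{
    ob_start();
    phpinfo(INFO_MODULES);
    $info = ob_get_clean();

    if (stripos($info, 'EXPAT Version') !== false) {
        return 'expat';
    }
    if (stripos($info, 'libxml2 Version') !== false) {
        return 'libxml2';
    }
    return 'unknown';
}

echo xml_backend_from_phpinfo(), "\n";
```

Run it through the web server rather than the command line if the two use different PHP builds, since only the build that serves the pages matters here.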
[16 Oct 2008 2:56pm UTC] mike at kogan dot org
Nevermind we got it - libexpat is in and workaround is fine. Thanks again and happy coding!
[17 Oct 2008 1:40pm UTC] jorton@php.net
So far as I can tell the only problem here is that libxml2 is bailing in parser.c:xmlParseReference() because the ctxt->wellFormed flag is cleared. It is set to zero by compat.c, which seems simply wrong; xmlCreate*ParserContext initialize it to non-zero. If I apply this patch it works fine: http://people.apache.org/~jorton/php-5.2.6-xmlwformed.patch Rob?
[17 Oct 2008 2:08pm UTC] rrichards@php.net
Changing the flag fixes internally defined entities, but breaks the rest of the entity handling.
[22 Oct 2008 11:39am UTC] markus dot gevers at contenit dot de
Hello, is there a solution yet? I have the same problem on Fedora Core 9. I also have this problem using libxml2-2.6.32. Can anyone help me? Best regards, Markus
[24 Oct 2008 8:47pm UTC] sanepit at o2 dot pl
I have the same problem on Gentoo with the new libxml2-2.6.32. On the Moodle platform this problem makes it impossible to restore course backups.
[25 Oct 2008 6:12pm UTC] alykhanii at yahoo dot com
This bug has messed up the XML-RPC upload capabilities in WordPress. Hopefully a solution is coming soon.
[30 Oct 2008 6:19pm UTC] phpbugs at hm2k dot org
This problem is also documented here... http://social.microsoft.com/Forums/en-US/writerbeta/thread/62ad697b-ed19-4b0b-ae6a-32bec06b142b/
[31 Oct 2008 2:45pm UTC] sunil at truesparrow dot com
I am also facing the same problem on a Red Hat 5 server. I have PHP 5.2.6 and libxml 2.7.2. Any updates on the bug? Thanks, Sunil
[3 Nov 2008 11:16pm UTC] hjthring at lavabit dot com
any suggestions on how one could go about upgrading cms code to avoid this issue ?? Hayden.
[4 Nov 2008 8:35pm UTC] rolysatch at hotmail dot com
yes i've been struggling with this issue for some time, has anyone discovered a solution yet?
[4 Nov 2008 9:00pm UTC] rrichards@php.net
This is still being worked on. Currently the only options are to build PHP with libxml2 <= 2.6.32 or build ext/xml with expat.
[5 Nov 2008 4:13pm UTC] jorton@php.net
Rob, do you have a test case for the regression caused by my test patch, so I can look at that further?
[5 Nov 2008 4:30pm UTC] rrichards@php.net
Just use the example in the manual. Your change simply gets predefined entities working but breaks all other cases. I'm currently looking at getting the change in libxml2 that caused this reverted. The problem stems from the magic needed to emulate expat behavior: since it's not native in libxml, non-standard use of the libxml SAX hooks was needed.
[20 Nov 2008 10:37pm UTC] rcasagraude at interfaces dot fr
Confirmed: building xml against libexpat works around this bug.
[21 Nov 2008 3:31pm UTC] ajay12006 at gmail dot com
I already have libxml 2.7.2. Can anyone help me with how to "build PHP with libxml2 <= 2.6.32 or build ext/xml with expat."? Any link on how to do it? Thanks.
[21 Nov 2008 3:32pm UTC] phpbugs at colin dot guthr dot ie
To build with expat just read my earlier posts on this bug where I've shown how we did this in our recent Mandriva 2009 release.
[3 Dec 2008 5:46pm UTC] hoffie at gentoo dot org
What is the current status of a fix (in contrast to a work around)? This issue breaks lots of web apps (Wordpress as we heard here, Typo3 as we heard in our bug tracker at Gentoo) and considering that the last release of 5.2.x is supposed to be released tomorrow, this sounds rather urgent... Downstream bug for Gentoo: http://bugs.gentoo.org/show_bug.cgi?id=249703 (Gentoo users: Please report such bugs to us (as well), especially in case of such library interaction issues)
[7 Dec 2008 1:08pm UTC] rolysatch at hotmail dot com
I applied a patch to PHP 5.2.6 and got this libxml2 bug fixed, but I just upgraded PHP to 5.2.7 CLI (where I thought this bug had been fixed) and the same problem is back, with angle brackets being stripped. Has anyone else upgraded and still got this issue?
[14 Dec 2008 10:17pm UTC] alykhanii at yahoo dot com
This bug has been open for 4 months now. Any chance of this getting fixed anytime soon?
[17 Dec 2008 12:09am UTC] joshua at joshuascott dot net
Any update on resolving this issue? All of my web based RSS aggregator sites are broken, I am unable to post using any desktop clients and none of the workarounds seem to work or can't be implemented due to various reasons. Please advise. This bug has been open for a long time and it seems to be affecting a lot of people.
[17 Dec 2008 12:49am UTC] scottmac@php.net
Recompile with libexpat or downgrade libxml. One of those will sort you out. A change to libxml will be in the next release, though the patch isn't in their repository yet.
[17 Dec 2008 12:56am UTC] rrichards@php.net
Does no one pay attention to the comments? The fix for this needs to happen on the libxml2 layer. The next version of libxml2 should resolve this and this bug is being kept open only in case some modification to the extension also needs to occur with the fix.
[17 Dec 2008 3:18pm UTC] valli at icsurselva dot ch
Where can I find the libxml2 patch? Does someone have a link to the libxml2 bug reporting tool which describes this issue?
[25 Dec 2008 2:41am UTC] gordon at kanazawa-gu dot ac dot jp
This issue also affects Moodle backup+restore and HotPot module http://tracker.moodle.org/browse/MDL-17136
[28 Dec 2008 3:55am UTC] alex at magnet dot ru
4 month!!! This is not serious?!
[28 Dec 2008 1:37pm UTC] phpbugs at colin dot guthr dot ie
While I do not doubt that this is an annoying bug, there are very simple and clear workarounds. I'm really not seeing why everyone is so upset about this. I reported the bug and we fixed the Mandriva package almost immediately with the work around and I'm not aware of any fallout from that change.
[29 Dec 2008 5:02pm UTC] jeffrey dot roberts at ibsgroup dot org
For those as challenged as me at determining how to either downgrade libxml2 or recompile with libexpat while using cPanel (EasyApache), here is a link to the details. I confirm that recompiling works around the problem. http://blog.code-head.com/fixing-libxml-php-bug-and-issues-with-html-entities-libexpat
First you will need to find out where libexpat is located on your server; it's probably in /usr/lib. To find out for sure, open this folder (/usr/lib) and look for the file libexpat.so. If you can't find it, log in as root via SSH and enter: whereis libexpat.so This should list the folder in which libexpat is located.
After all this you will need to compile PHP to use libexpat instead of libxml, so go to /var/cpanel/easy/apache/rawopts/ and create a file named "all_php5" (no quotes). If a file with this name already exists, edit it and add these lines to the end of it:
--with-expat=builtin
--with-libexpat-dir=/usr/lib
(Each line starts with two dashes.) Remember that depending on where libexpat is located on your server you might need to edit the second line. Now compile Apache and everything should work fine!
[31 Dec 2008 12:56am UTC] mark at mcclusky dot com
I understand that there are workarounds to this bug, but for those of us on shared hosts that we can't recompile, this is a colossally frustrating state of affairs. Does anyone in PHP land have any timeline for a fix?
[31 Dec 2008 1:37pm UTC] phpbugs at colin dot guthr dot ie
If you are on a shared host with a fixed PHP version then really the host should be responsible for deploying a well packaged version of PHP. If they are not doing that it is the host that is at fault and you should raise the issue with them.
[31 Dec 2008 2:35pm UTC] hougiwro at guerrillamail dot org
Why not just fix the bug so that existing installs can pick up a standard update without having to rebuild the program. Not everyone using PHP is necessarily comfortable with recompiling it in order to implement a workaround for a bug.
[31 Dec 2008 3:22pm UTC] scottmac@php.net
Guys please READ the report, this is a bug in libxml NOT a bug in PHP.
[1 Jan 7:31pm UTC] alex at peoples dot ru
Thanks for the advice, but I'm no Linux guru, and I don't have cPanel on my server. I tried to 'yum remove' libxml2 and install an older one, but of course that was stupid and doesn't work. I liked Linux as the easiest and most powerful system, but now I'm stuck. I have no idea how to remove libxml2 and rebuild the system with the old one. One idea is to switch to Fedora 9, because FC 10 has the same bug with fucking libxml2. Sorry, I was at the data center for 8 hours and had problems with the servers on the new system. I don't like updates now... they have bugs everywhere, and I'm tired of resolving these bugs. Sorry, and have a Happy New Year. I will never again update my systems sooner than half a year after a release.
[1 Jan 8:09pm UTC] phpbugs at colin dot guthr dot ie
If the Fedora packages do not work then this is a RedHat packaging problem and you should complain to them/open a bug etc. etc. Like I say, in Mandriva we made sure we provided packages that worked because they were compiled with expat.
[2 Jan 11:03pm UTC] geoffers+phpbugs at gmail dot com
What is the recommended advice for PHP software that relies upon the XML extension? It'd be easier to say that libxml2 2.7.0 to 2.7.2 wasn't supported if it weren't for the fact that at least one user has come to me with this issue while LIBXML_VERSION was equal to 20632. We can't just add a LIBXML_VERSION-based workaround, not only because the constant doesn't exist on 4.3.0, but also because it seemingly isn't reliable.
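Since the comment above notes that a LIBXML_VERSION check is not reliable, one alternative is to test the behaviour directly at runtime. The following is only a sketch under that assumption: `xml_predefined_entities_intact` is a hypothetical function name, and the probe simply parses the five predefined entities and checks whether they come back as literal characters.

```php
<?php
// Hypothetical probe, not an official API: returns true when all five
// predefined entities survive a round trip through ext/xml.
function xml_predefined_entities_intact()
{
    $parser = xml_parser_create('UTF-8');
    $vals = array();
    xml_parse_into_struct($parser, '<r>&lt;&gt;&amp;&quot;&apos;</r>', $vals);
    xml_parser_free($parser);

    // The character data of the single element ends up in its 'value' key.
    $value = isset($vals[0]['value']) ? $vals[0]['value'] : '';
    return $value === '<>&"\'';
}

if (!xml_predefined_entities_intact()) {
    // Application-specific fallback goes here, e.g. warn the administrator.
    echo "ext/xml drops predefined entities on this build\n";
}
```

The probe avoids closures and newer constants so that it could, in principle, also run on older PHP versions. It reports only whether the symptom is present, not which library version causes it.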
[3 Jan 4:03am UTC] david+phpbugs at midrange dot com
Ok, I'm going to try and rebuild the Fedora 8 source RPM to avoid the libxml2 bug ... but I'm not all that familiar with how PHP is built ... and could use a pointer or two on what to change on the configure command line. Any suggestions?
[11 Jan 12:06pm UTC] rrichards@php.net
This bug has been fixed in CVS. Snapshots of the sources are packaged every three hours; this change will be in the next snapshot. You can grab the snapshot at http://snaps.php.net/. Thank you for the report, and for helping us make PHP better. This fix also requires the soon to be released libxml2-2.7.3 or higher when using 2.7.x.
