Request #53950 Add a way to configure where libxml searches for Catalog Files
Submitted: 2011-02-07 17:16 UTC Modified: 2017-10-24 06:14 UTC
Votes:105
Avg. Score:3.2 ± 0.5
Reproduced:11 of 105 (10.5%)
Same Version:8 (72.7%)
Same OS:7 (63.6%)
From: gordon at onlinehome dot de Assigned:
Status: Open Package: *XML functions
PHP Version: Irrelevant OS: any
Private report: No CVE-ID: None

 [2011-02-07 17:16 UTC] gordon at onlinehome dot de
Description:
------------
Libxml can use catalog files as a local cache, so that the entities associated with public identifiers or remote resources can be loaded from local copies. There is currently no way to configure the catalog file path from PHP. Configuring the path in libxml itself seems possible:

> The user can change the default catalog behaviour by redirecting queries to its own set of catalogs. This can be done by setting the XML_CATALOG_FILES environment variable to a list of catalogs, an empty one should deactivate loading the default /etc/xml/catalog default catalog.

It would be nice if PHP's libxml extension provided a way to set the path. This would be helpful when validating documents with remote System Identifiers, such as any HTML DTD, or simply for bundling files with an application.

Related Resources:

- http://xmlsoft.org/catalog.html
- http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic
- http://bugs.php.net/48080
- http://bugs.php.net/32426


History

 [2011-02-08 19:14 UTC] cataphract@php.net
I know very little about libxml2, but as far as I can infer from http://xmlsoft.org/catalog.html and from the code in catalog.c, there is no way to specify a global catalog in a thread-local manner. This, by itself, makes it infeasible to write a PHP function that allows replacing the default catalog, unless some operation (probably expensive, since a catalog file must be read) is performed on request startup/shutdown.

libxml2 supports per-document catalogs, but from what I see, your document must contain an oasis-xml-catalog processing instruction, like this:

<?xml version="1.0"?>
<?oasis-xml-catalog catalog="http://example.com/catalog.xml"?>
<!DOCTYPE doc PUBLIC "-//Example//DTD Document V1.0//EN"
                     "http://www.example.com/schema/doc.dtd">

There is another alternative, which is using an external entity loader (see http://xmlsoft.org/html/libxml-parser.html#xmlSetExternalEntityLoader ), but this is less convenient, though probably we could also expose some functions to deal with catalogs for use in the user-supplied callback.

I could implement this, but I'm hoping someone else with more libxml2 knowledge could confirm whether my analysis is correct.
 [2011-02-08 19:18 UTC] cataphract@php.net
> This, by itself, makes it infeasible to write a PHP function that allows
> replacing the default catalog, unless some operation (probably expensive,
> since a catalog file must be read) is performed on request startup/shutdown.

The "unless" part doesn't actually make sense in multi-threaded builds of PHP.
 [2011-02-08 19:27 UTC] rrichards@php.net
-Assigned To: +Assigned To: rrichards
 [2011-02-08 19:27 UTC] rrichards@php.net
Assigning to myself, as I asked him to open this.
 [2011-12-06 17:34 UTC] Marko dot Voss at fiz-Karlsruhe dot de
> The user can change the default catalog behaviour by redirecting queries to its own set of catalogs. This can be done by setting the XML_CATALOG_FILES environment variable to a list of catalogs, an empty one should deactivate loading the default /etc/xml/catalog default catalog.

If I understand this right, you can use this code to set your own catalog:

setenv("XML_CATALOG_FILES=/path/to/catalog.xml");

I tried that because I have to use an XML catalog somehow, without overriding the http stream wrapper, which is not a thread-safe way to do this. However, the catalog.xml I specified using setenv does not seem to be used by libxml2.
 [2013-01-18 21:32 UTC] hanskrentel at yahoo dot de
setenv("XML_CATALOG_FILES=/path/to/catalog.xml");

should normally work (depending on which dialect that is) if it's set before PHP
starts and loads the extension, and the path to the catalog XML file does not
contain any spaces (which have a special meaning).
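
To check both conditions, a small diagnostic can show what the running PHP process actually sees. This is only a sketch: the helper name splitCatalogFiles() is hypothetical, and it assumes that entries in XML_CATALOG_FILES are separated by whitespace, as the note about spaces suggests. It only inspects the environment and does not prove that libxml2 loaded the catalogs.

```php
<?php
/**
 * Split an XML_CATALOG_FILES-style value into individual entries.
 * Assumption: entries are separated by whitespace, which is why a
 * path containing a space would be read as two entries.
 */
function splitCatalogFiles(string $value): array
{
    return preg_split('/\s+/', trim($value), -1, PREG_SPLIT_NO_EMPTY);
}

// Report what this PHP process inherited from its environment.
$value = getenv('XML_CATALOG_FILES');
if ($value === false) {
    echo "XML_CATALOG_FILES is not set for this PHP process\n";
} else {
    foreach (splitCatalogFiles($value) as $path) {
        printf("%s: %s\n", $path, is_readable($path) ? 'readable' : 'not readable');
    }
}
```

If the variable is reported as not set, it was probably exported in a different environment from the one the web server or PHP process was started from.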

See as well:

http://www.cafeconleche.org/books/effectivexml/chapters/47.html
http://www.chlab.ch/blog/archives/php/soap/cache-soap-envelope-schema-schema-validation
http://wp.me/pLEEp-nlj

To change things at runtime, make use of libxml_set_external_entity_loader and
implement a catalog parser of your own. You can then use catalogs as you wish at
runtime.
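
A minimal sketch of that approach, with a hard-coded lookup table standing in for a real catalog parser. The $localCatalog array, the resolveEntity() helper and the dtd/ paths are hypothetical placeholders; only libxml_set_external_entity_loader() itself is part of PHP (5.4 and later).

```php
<?php
// Hypothetical mapping from public/system identifiers to bundled local copies.
// A real implementation would build this by parsing an XML catalog file.
$localCatalog = [
    '-//W3C//DTD XHTML 1.0 Strict//EN'
        => __DIR__ . '/dtd/xhtml1-strict.dtd',
    'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'
        => __DIR__ . '/dtd/xhtml1-strict.dtd',
];

/**
 * Look up the public identifier first, then the system identifier.
 * Returns a local path, or null when the catalog has no entry.
 */
function resolveEntity(array $catalog, ?string $publicId, ?string $systemId): ?string
{
    foreach ([$publicId, $systemId] as $id) {
        if ($id !== null && isset($catalog[$id])) {
            return $catalog[$id];
        }
    }
    return null;
}

// Guarded so the sketch still runs where the libxml extension is unavailable.
if (function_exists('libxml_set_external_entity_loader')) {
    libxml_set_external_entity_loader(
        function ($publicId, $systemId, $context) use ($localCatalog) {
            // Returning null means the entity is not loaded at all,
            // so unknown identifiers are never fetched over the network.
            return resolveEntity($localCatalog, $publicId, $systemId);
        }
    );
}
```

The loader is set for the current PHP request only. Returning null for unknown identifiers is a design choice in this sketch; a loader could instead return the original system identifier to fall back to the default behaviour.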
 [2017-10-24 06:14 UTC] kalle@php.net
-Status: Assigned +Status: Open -Assigned To: rrichards +Assigned To:
 
Last updated: Sat Dec 14 18:01:24 2019 UTC