| Bug #44367 | DOMDocument::baseURI parsing is out of whack | ||||
|---|---|---|---|---|---|
| Submitted: | 8 Mar 2008 5:09am UTC | Modified: | 12 Mar 2008 10:30pm UTC | ||
| From: | daniel dot oconnor at gmail dot com | Assigned to: | rrichards | ||
| Status: | Bogus | Category: | DOM XML related | ||
| Version: | 5.2.5 | OS: | Windows | ||
[8 Mar 2008 5:09am UTC] daniel dot oconnor at gmail dot com
[8 Mar 2008 10:20pm UTC] johannes@php.net
Rob, please take a look
[10 Mar 2008 2:09pm UTC] rrichards@php.net
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://www.php.net/manual/ and the instructions on how to report a bug at http://bugs.php.net/how-to-report.php

I don't know about GRDDL, but for DOM trees, the base URI of a DOMDocument is the URI it was loaded from (or, for a memory-based tree, the current directory). You need to check the document element to get the base URI you are looking for.
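A minimal sketch of the distinction drawn in this comment, assuming a current PHP with the DOM extension. The sample XML and its xml:base value are made up for illustration and are not taken from the GRDDL test file.

```php
<?php
// Hypothetical in-memory document; the xml:base value is an assumption.
$xml = '<foo xml:base="http://www.example.org/"><bar/></foo>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// Base URI of the document node: tied to where the document was loaded from.
// For an in-memory tree the value depends on the environment.
var_dump($doc->baseURI);

// Base URI of the document element: takes the xml:base attribute into account.
var_dump($doc->documentElement->baseURI);
```

The second call is the one the comment recommends when the value declared by xml:base is wanted.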
[11 Mar 2008 12:03am UTC] daniel dot oconnor at gmail dot com
See http://www.w3.org/TR/grddl/#base_misc and http://www.apps.ietf.org/rfc/rfc3986.html#sec-5.1

The way to determine baseURI is:
1. Look for it on the root document element (HTML: <base>, XML: <foo xml:base="">).
2. Couldn't find that? Use the URL we retrieved the document with.
   * And make sure we follow redirects!
3. Couldn't find that? Application specific (but we don't really have a setBaseURI()).

So, condition #1 is broken in 5.2.5 when you do:

<?php
$doc = DOMDocument::load('http://www.w3.org/2001/sw/grddl-wg/td/inline-rdf6.xml');
var_dump($doc->baseURI); // Expected http://wwww.example.org/

produces:

string(53) "http://www.w3.org/2001/sw/grddl-wg/td/inline-rdf6.xml"
[12 Mar 2008 5:16pm UTC] rrichards@php.net
Still bogus, as what you are describing pertains only to GRDDL, not DOM. When working with GRDDL and DOM, you need to check the base URI of the document element, not the DOMDocument. DOM determines the base URI using the XML Base spec:

"The base URI of a document entity or an external entity is determined by RFC 2396 rules, namely, that the base URI is the URI used to retrieve the document entity or external entity."

This is not just how it is implemented in PHP, as the other major DOM parsers implement it the same way.
[12 Mar 2008 10:30pm UTC] daniel dot oconnor at gmail dot com
:S I hate being pushy / argumentative, sorry if it's coming across that way.

RFC 2396 is "Uniform Resource Identifiers (URI): Generic Syntax". Section 5.1, "Establishing a Base URI", describes what I've been trying to say, probably a little more clearly.

The XML Base spec at http://www.w3.org/TR/xmlbase/#rfc2396 says:

Determine a baseURI:
1. The base URI is embedded in the document's content.
2. The base URI is that of the encapsulating entity (message, document, or none).
3. The base URI is the URI used to retrieve the entity.
4. The base URI is defined by the context of the application.

> This is not just how it is implemented in PHP as the other major DOM parsers implement it the same way

... and that's why the xml:base GRDDL tests were written: to clarify correct behaviour / check implementations.
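The precedence order listed above could be sketched in userland roughly as follows. This is an illustration only, not how the DOM extension works: pickBaseUri() and its parameters are hypothetical names, step 2 (encapsulating entity) is skipped, and a relative xml:base is returned as-is rather than resolved against the retrieval URI.

```php
<?php
// Hypothetical helper: returns the first base URI found, in precedence order.
function pickBaseUri(DOMDocument $doc, ?string $retrievalUri = null, ?string $appDefault = null): ?string
{
    // The "xml" prefix is always bound to this namespace.
    $xmlNs = 'http://www.w3.org/XML/1998/namespace';

    // 1. Base URI embedded in the document's content (xml:base on the root).
    $root = $doc->documentElement;
    if ($root !== null) {
        $embedded = $root->getAttributeNS($xmlNs, 'base');
        if ($embedded !== '') {
            return $embedded;
        }
    }

    // 3. URI used to retrieve the entity (the caller must supply the final
    //    URI after any redirects; this sketch does not fetch anything).
    if ($retrievalUri !== null && $retrievalUri !== '') {
        return $retrievalUri;
    }

    // 4. Defined by the context of the application (may be null).
    return $appDefault;
}
```

A caller would pass the URL it actually fetched as $retrievalUri, so the embedded value wins only when the root element declares one.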
