Request #65364 In doc not loaded from a URL, baseURI should still be a real URI
Submitted: 2013-07-31 05:28 UTC
Modified: 2013-07-31 05:42 UTC
From: mike at skew dot org
Assigned:
Status: Open
Package: DOM XML related
PHP Version: 5.4.17
OS: FreeBSD 8.4-RELEASE
Private report: No
CVE-ID: None

 [2013-07-31 05:28 UTC] mike at skew dot org
Description:
------------
When loading an XML document from memory (string, file, whatever), such as via
DOMDocument::loadXML(), PHP tells libxml to use the current working directory,
or I think sometimes the script's folder path, as the base URI. libxml (and
PHP's thin API) exposes this value as the baseURI property in the DOM, overriding
it only as needed when xml:base is in scope.

As pointed out in the comments on Bug #44367, neither PHP nor libxml claims to
support DOM Core Level 3, so maybe this PHP/libxml "baseURI" property shouldn't
be expected to behave as it would in an L3-compliant DOM. Nevertheless, I think
it's reasonable to expect that something called "baseURI" would in fact be
formatted as a URI, thus making it useful as an actual base URI for resolving
URI references like href attribute values in XHTML documents and Atom feeds.

So, my request is that the path be prefixed with "file://", at the very least. 
This will have the added benefit of making things forward-compatible with DOM 
Core L3, if such support is intended for the future.
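
Until something like this is built in, a userland workaround might look like the sketch below. The helper name path_to_file_uri() is hypothetical and not part of PHP's DOM extension, and the sketch assumes an absolute POSIX-style path such as the one shown in the actual result; Windows paths are not handled.

```php
<?php
// Hypothetical helper (not a PHP built-in): turn an absolute POSIX path
// into a file URI so it can serve as a real base URI.
function path_to_file_uri(string $path): string
{
    // Already has a scheme of two or more characters ("file:", "http:")?
    // Then it is left unchanged.
    if (preg_match('/^[A-Za-z][A-Za-z0-9+.\-]+:/', $path)) {
        return $path;
    }
    // Percent-encode each path segment but keep the "/" separators.
    $segments = array_map('rawurlencode', explode('/', $path));
    return 'file://' . implode('/', $segments);
}

echo path_to_file_uri('/usr/home/x/'), "\n"; // file:///usr/home/x/
```

The value passed in would normally be whatever $doc->baseURI currently returns, so the same call keeps working if the property later starts returning a proper URI.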

Also, my experience with XML apps suggests that a more typical default would be 
a file URI corresponding to the script itself, rather than the cwd or script 
folder. This would have no effect on resolution of relative references (except 
an empty string), so it seems harmless in that regard. However, I would 
understand if there's a desire to avoid disclosing the script's file name in the 
DOM.
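
To illustrate the point about relative references, here is a deliberately simplified resolver. It is a sketch, not a full RFC 3986 implementation: resolve_simple() is a hypothetical name, it handles only an empty reference and plain relative paths (no "../", queries, or fragments), and the script file name is made up.

```php
<?php
// Simplified, hypothetical resolver: only an empty reference and plain
// relative paths are handled.
function resolve_simple(string $base, string $ref): string
{
    if ($ref === '') {
        return $base; // an empty reference refers to the base document itself
    }
    // Keep the base up to and including its last "/", then append the reference.
    return substr($base, 0, strrpos($base, '/') + 1) . $ref;
}

$folder_base = 'file:///usr/home/x/';
$script_base = 'file:///usr/home/x/script.php'; // made-up script name

// Same result for a relative path, whichever base is used.
var_dump(resolve_simple($folder_base, 'feed.atom') === resolve_simple($script_base, 'feed.atom')); // bool(true)
// Different result only for the empty reference.
var_dump(resolve_simple($folder_base, '') === resolve_simple($script_base, '')); // bool(false)
```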

Test script:
---------------
<?php
$xml_string = "<greeting>hello world</greeting>";
$doc = new DOMDocument();
$doc->loadXML($xml_string);
$document_base = $doc->baseURI;
$document_element_base = $doc->documentElement->baseURI;

// A URI, as opposed to a URI reference, is always absolute.
// That is, it has a scheme/protocol (something before ":").
function describe_base($base) {
    // A ":" that is missing, or at position 0, means there is no scheme.
    return ((int) strpos((string) $base, ":") > 0) ? " (probably OK!)" : " (not a real URI!)";
}
echo "  DOMDocument->baseURI = " . $document_base . describe_base($document_base) . "\n";
echo "  DocumentElement->baseURI = " . $document_element_base . describe_base($document_element_base) . "\n";
?>

Expected result:
----------------
  DOMDocument->baseURI = file:///usr/home/x/ (probably OK!)
  DocumentElement->baseURI = file:///usr/home/x/ (probably OK!)

Actual result:
--------------
  DOMDocument->baseURI = /usr/home/x/ (not a real URI!)
  DocumentElement->baseURI = /usr/home/x/ (not a real URI!)

History

 [2013-07-31 05:42 UTC] mike at skew dot org
-Summary: baseURI should always be a real URI
+Summary: In doc not loaded from a URL, baseURI should still be a real URI
 [2013-07-31 05:42 UTC] mike at skew dot org
Made summary (request title) more descriptive.
 