Bug #55459 Unable to differentiate between end of file or read error
Submitted: 2011-08-19 09:56 UTC Modified: 2011-08-19 10:46 UTC
From: scope at planetavent dot de Assigned:
Status: Wont fix Package: XML Reader
PHP Version: 5.3.7 OS: Ubuntu 10.04 LTS
Private report: No CVE-ID: None
 [2011-08-19 09:56 UTC] scope at planetavent dot de
Description:
------------
We were forced into using xmlreader the other day due to xml file sizes.

Usually we use DOM to check for well-formedness and schema validity.

It seems that it is currently not possible to differentiate between a read error (e.g. the document is not well-formed) and "end of file" using XMLReader.

Unfortunately, it is not possible to call XMLReader::isValid() after an unsuccessful read when using a schema. Of course, a document that is not well-formed cannot be valid. But isValid() just calls xmlTextReaderIsValid(), which seems to work only if the node pointer was able to advance correctly (which is not the case for this kind of error). So this is not an option.

From ext/xmlreader/php_xmlreader.c:806
retval = xmlTextReaderRead(intern->ptr);
if (retval == -1) {
    php_error_docref(NULL TSRMLS_CC, E_WARNING, "An Error Occured while reading");
    RETURN_FALSE;
} else {
    RETURN_BOOL(retval);
}

and

From ext/xmlreader/php_xmlreader.c:849
if (retval == -1) {
    php_error_docref(NULL TSRMLS_CC, E_WARNING, "An Error Occured while reading");
    RETURN_FALSE;
} else {
    RETURN_BOOL(retval);
}

According to libxml, the result of xmlTextReaderRead() is
"1 if the node was read successfully, 0 if there is no more nodes to read, or -1 in case of error".

Therefore PHP should return true, int(0) or false, so that callers can check for read errors.

I'm not quite sure whether this can be considered a bug. To my mind it is, because it prevents me from using XMLReader properly and, as a result, makes it very difficult to work on huge XML files.

libxml is able to distinguish between both conditions, so PHP should be able to as well.
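
Until read() itself exposes the difference, the two conditions can probably be told apart in userland by inspecting the libxml error buffer once read() has returned false. The following is only a minimal sketch: `xmlreader_drain_status()` is a hypothetical helper, not part of XMLReader, and it assumes that a failed read always leaves an entry of level LIBXML_ERR_FATAL in the buffer, which holds for well-formedness errors but has not been checked for every kind of failure.

```php
<?php
// Collect libxml errors in a buffer instead of emitting them as warnings.
libxml_use_internal_errors(true);

/**
 * Hypothetical helper, not part of XMLReader: reads every node and reports
 * how the loop ended, following libxml's convention
 * (0 = no more nodes to read, -1 = error).
 */
function xmlreader_drain_status($xml)
{
    libxml_clear_errors();              // start with an empty error buffer

    $reader = new XMLReader();
    $reader->XML($xml);                 // in-memory input keeps the sketch self-contained

    while ($reader->read()) {
        // process the current node here
    }

    // read() returned false: assume a fatal entry in the buffer means failure.
    $status = 0;
    foreach (libxml_get_errors() as $error) {
        if ($error->level === LIBXML_ERR_FATAL) {
            $status = -1;
            break;
        }
    }

    $reader->close();
    libxml_clear_errors();
    return $status;
}

$wellFormedStatus = xmlreader_drain_status('<root><a/></root>');
$malformedStatus  = xmlreader_drain_status('<root><a></b></root>');

var_dump($wellFormedStatus);
var_dump($malformedStatus);
```

Only fatal entries are counted because warnings and recoverable errors can accumulate in the buffer without stopping the reader. The same check should work after `$reader->open($file)` in place of `XML()`.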

Test script:
---------------
<?php

libxml_use_internal_errors( true );

$file = "input.xml";

$xr = new XMLReader();
$xr->open( $file );

while ( true )
{
    $success = $xr->read();
    
    if ( !$success )
    {
        # end of file or error?
        var_dump( $success );
        echo "no success\n";
        break;
    }
    else
    {
        echo "read success\n";
    }
}


Expected result:
----------------
Ability to distinguish int(0) (no more nodes to read) from false (error during read).

Actual result:
--------------
[...]
read success
read success
read success

Warning: XMLReader::read(): An Error Occured while reading in X:\xmlreader.php on line 12
bool(false)
no success

 [2011-08-19 10:46 UTC] bjori@php.net
-Status: Open +Status: Wont fix
 [2011-08-19 10:46 UTC] bjori@php.net
If an error occurred, libxml_get_errors() will be populated with details on the error.
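
As a rough illustration of that suggestion, the sketch below drains a reader and then formats whatever libxml_get_errors() returns. `describe_libxml_errors()` is a hypothetical helper introduced here, and the in-memory input is deliberately not well-formed; the `line`, `column` and `message` properties are those of PHP's LibXMLError class.

```php
<?php
// Collect libxml errors in a buffer instead of emitting them as warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: turns LibXMLError objects into one-line messages.
function describe_libxml_errors(array $errors)
{
    $lines = array();
    foreach ($errors as $e) {
        $lines[] = sprintf(
            'line %d, column %d: %s',
            $e->line,
            $e->column,
            trim($e->message)           // libxml messages end with a newline
        );
    }
    return $lines;
}

libxml_clear_errors();                  // forget errors from earlier operations

$xr = new XMLReader();
$xr->XML('<root><a></b></root>');       // deliberately not well-formed

while ($xr->read()) {
    // process the current node here
}

// read() returned false: an empty buffer suggests end of file,
// a populated one carries the details of the failure.
$report = describe_libxml_errors(libxml_get_errors());
libxml_clear_errors();
$xr->close();

echo implode("\n", $report), "\n";
```

Clearing the buffer before the loop matters: libxml_get_errors() returns everything collected since the last libxml_clear_errors(), so stale entries from an earlier document would otherwise be mistaken for a read error.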
 