Doc Bug #76341 Tidy class doc: missing errors and special returned values
Submitted: 2018-05-15 10:56 UTC Modified: 2019-01-14 13:37 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: salsi at icosaedro dot it Assigned:
Status: Open Package: Tidy (PECL)
PHP Version: Irrelevant OS:
Private report: No CVE-ID: None

 [2018-05-15 10:56 UTC] salsi at icosaedro dot it
Description:
------------
To complement the documentation of the tidy class, here is what I have found so far by testing with actual code. Please consider merging it into the current documentation, as it contains several detailed annotations:

class tidy {

	/**
	 * Informational, warning and error messages collected while parsing.
	 * Lines are separated by '\r\n', and for some reason a trailing '\r' is
	 * also added. A NULL value means either that no parsing has been performed
	 * yet or that the parsing succeeded.
	 * @var string 
	 */
	public $errorBuffer;

	/**
	 * Initializes a new parser and optionally parses the specified file.
	 * Hint: to parse a string, do not pass any argument and then call the
	 * parseString() method instead.
	 * @param string $filename File to parse. If this argument is provided, it
	 * must be a valid existing file name.
	 * @param mixed $config Name of a configuration file, or associative array
	 * of the options. Pass an empty array here if you only need to set the
	 * following arguments.
	 * @param string $encoding Encoding for input and output documents.
	 * Note that the tidy library still does not detect the encoding from an
	 * HTML META element, so setting this argument is advised.
	 * Default input encoding: unspecified, possibly unexpected.
	 * Default output encoding: UTF-8.
	 * BEWARE: "UTF-8" is not a recognized encoding; "UTF8" is the right spelling.
	 * @param boolean $use_include_path If TRUE, a relative $filename path is
	 * resolved first against the directories listed in the include_path php.ini
	 * directive, and then against the current directory. Default: FALSE.
	 * @return void
	 * @triggers E_WARNING Failed reading $filename. Failed reading the $config
	 * file when this argument is a string. Invalid $encoding.
	 * @triggers E_NOTICE Unknown or invalid $config when this argument is an
	 * associative array.
	 */
	function __construct($filename = NULL, $config = NULL, $encoding = NULL,
			$use_include_path = FALSE){}

	/**
	 * Returns the node starting with the BODY element.
	 * @return tidyNode The BODY element, either parsed or added by this library.
	 * Returns NULL if no parsing has been performed yet.
	 */
	function body(){}
	

	// Other methods: cleanRepair(), diagnose(), getConfig(), getHtmlVer()

	/**
	 * Returns the value of the specified option.
	 * @param string $option Name of the option to retrieve.
	 * @return mixed Value of the option.
	 * @triggers E_WARNING Unknown option name.
	 */
	function getOpt($option){}
	
	/**
	 * Returns the documentation for the given option name.
	 * @param string $optname
	 * @return string
	 * @deprecated This method is not available; tested with the tidy library
	 * release 2017/03/01 (as reported by getRelease()) on WampServer with PHP 7.1.9.
	 */
	function getOptDoc($optname){}
	
	/**
	 * Returns the release date of the tidy library.
	 * @return string
	 */
	function getRelease(){}
	
	/**
	 * Returns the status of the document.
	 * @return int
	 */
	function getStatus(){}
	
	/**
	 * Returns the HEAD element.
	 * @return tidyNode The HEAD element, or NULL if not available.
	 */
	function head(){}
	
	/**
	 * Returns the HTML element.
	 * @return tidyNode The HTML element, or NULL if not available.
	 */
	function html(){}
	
	/**
	 * Tells if the document is XHTML.
	 * @return boolean
	 */
	function isXhtml(){}
	
	/**
	 * Tells if the document is XML.
	 * @return boolean
	 */
	function isXml(){}
	
	/**
	 * Parses the given file.
	 * @param string $filename File to parse.
	 * @param mixed $config Name of a configuration file, or associative array
	 * of the options. Pass an empty array here if you only need to set the
	 * following arguments.
	 * @param string $encoding Encoding for input and output documents.
	 * Note that the tidy library still does not detect the encoding from an
	 * HTML META element, so setting this argument is advised.
	 * Default input encoding: unspecified, possibly unexpected.
	 * Default output encoding: UTF-8.
	 * BEWARE: "UTF-8" is not a recognized encoding; "UTF8" is the right spelling.
	 * @param boolean $use_include_path If TRUE, a relative $filename path is
	 * resolved first against the directories listed in the include_path php.ini
	 * directive, and then against the current directory. Default: FALSE.
	 * @return boolean Always returns TRUE.
	 * @triggers E_WARNING Failed reading $filename. Failed reading the $config
	 * file when this argument is a string. Invalid $encoding.
	 * @triggers E_NOTICE Unknown or invalid $config when this argument is an
	 * associative array.
	 */
	function parseFile($filename, $config = NULL, $encoding = NULL, $use_include_path = FALSE){}
	
	/**
	 * Parses the given string.
	 * @param string $input Text to parse.
	 * @param mixed $config Name of a configuration file, or associative array
	 * of the options. Pass an empty array here if you only need to set the
	 * following arguments.
	 * @param string $encoding Encoding for input and output documents.
	 * Note that the tidy library still does not detect the encoding from an
	 * HTML META element, so setting this argument is advised.
	 * Default input encoding: unspecified, possibly unexpected.
	 * Default output encoding: UTF-8.
	 * BEWARE: "UTF-8" is not a recognized encoding; "UTF8" is the right spelling.
	 * @return boolean Always returns TRUE.
	 * @triggers E_WARNING Failed reading the $config file when this argument is
	 * a string. Invalid $encoding.
	 * @triggers E_NOTICE Unknown or invalid $config when this argument is an
	 * associative array.
	 */
	function parseString($input, $config = NULL, $encoding = NULL){}
	
	/**
	 * Parses the given file and returns the repaired result.
	 * @param string $filename File to parse. If this argument is provided, it
	 * must be a valid existing file name.
	 * @param mixed $config Name of a configuration file, or associative array
	 * of the options. Pass an empty array here if you only need to set the
	 * following arguments.
	 * @param string $encoding Encoding for input and output documents.
	 * Note that the tidy library still does not detect the encoding from an
	 * HTML META element, so setting this argument is advised.
	 * Default input encoding: unspecified, possibly unexpected.
	 * Default output encoding: UTF-8.
	 * BEWARE: "UTF-8" is not a recognized encoding; "UTF8" is the right spelling.
	 * @param boolean $use_include_path If TRUE, a relative $filename path is
	 * resolved first against the directories listed in the include_path php.ini
	 * directive, and then against the current directory. Default: FALSE.
	 * @return string Repaired contents of the file.
	 * @triggers E_WARNING Failed reading $filename. Failed reading the $config
	 * file when this argument is a string. Invalid $encoding.
	 * @triggers E_NOTICE Unknown or invalid $config when this argument is an
	 * associative array.
	 */
	function repairFile($filename, $config = NULL, $encoding = NULL, $use_include_path = FALSE){}
	
	/**
	 * Parses the given string and returns the repaired result.
	 * @param string $input Text to parse.
	 * @param mixed $config Name of a configuration file, or associative array
	 * of the options. Pass an empty array here if you only need to set the
	 * following arguments.
	 * @param string $encoding Encoding for input and output documents.
	 * Note that the tidy library still does not detect the encoding from an
	 * HTML META element, so setting this argument is advised.
	 * Default input encoding: unspecified, possibly unexpected.
	 * Default output encoding: UTF-8.
	 * BEWARE: "UTF-8" is not a recognized encoding; "UTF8" is the right spelling.
	 * @return string Repaired contents of the string.
	 * @triggers E_WARNING Failed reading the $config file when this argument is
	 * a string. Invalid $encoding.
	 * @triggers E_NOTICE Unknown or invalid $config when this argument is an
	 * associative array.
	 */
	function repairString($input, $config = NULL, $encoding = NULL){}
	
	/**
	 * Returns the root node of the parsed tree.
	 * @return tidyNode Root node of the parsed tree, possibly a tidyNode with
	 * an empty name property if nothing has been parsed yet.
	 */
	function root(){}
}
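A minimal usage sketch of the $errorBuffer and $encoding notes above, assuming the tidy extension is loaded: the encoding is passed as "utf8", the spelling the stub recommends, and the collected messages are split into lines by a helper that tolerates the '\r\n' separators and the trailing '\r' described for $errorBuffer. The helper name tidyMessages() is hypothetical and not part of the extension; the tidy part only runs when extension_loaded('tidy') is true.

```php
<?php
// Hypothetical helper (not part of ext/tidy): splits the contents of
// tidy::$errorBuffer into individual messages. Any mix of "\r\n", "\r"
// or "\n" counts as a separator; each line is trimmed of surrounding
// whitespace and empty lines (such as the one left by a trailing "\r")
// are dropped. A NULL buffer yields an empty list.
function tidyMessages(?string $buffer): array
{
    if ($buffer === null) {
        return [];
    }
    $lines = preg_split('/\r\n|\r|\n/', $buffer);
    return array_values(array_filter(
        array_map('trim', $lines),
        fn(string $line): bool => $line !== ''
    ));
}

// Usage, only attempted when the tidy extension is available.
if (extension_loaded('tidy')) {
    $tidy = new tidy();
    // Empty config array so that the encoding argument can be given,
    // spelled "utf8" as recommended above.
    if ($tidy->parseString('<p>unclosed paragraph', [], 'utf8') === false) {
        echo "parseString() returned FALSE", PHP_EOL;
    }
    foreach (tidyMessages($tidy->errorBuffer) as $message) {
        echo $message, PHP_EOL;
    }
}
```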



History

 [2018-05-15 11:04 UTC] salsi at icosaedro dot it
CORRECTION: In the pseudo-PHP code above, the default values of the optional arguments are dummy values (mostly NULL), because there is no other way to mark an argument as optional than assigning it a default value; these values should be ignored.
 [2018-05-20 15:33 UTC] salsi at icosaedro dot it
Fix: I missed that the following method exists but is not implemented:

tidy::isXhtml():
@deprecated This method is not implemented yet; it always returns false.


Most of the methods either do not let the program detect errors from the returned value or behave in a rather complex way; the only safe way to detect these errors is to map all errors (including E_NOTICE) into exceptions. Anyway, here are my findings, which correct and complete my initial post:

tidy::parseFile():
Fix: @return boolean FALSE on E_WARNING, TRUE otherwise.

tidy::parseString():
Fix: @return boolean FALSE on E_WARNING for an invalid encoding, TRUE in any other case (on success, or on E_WARNING or E_NOTICE for any other reason).

tidy::repairFile():
Fix: @return mixed A string containing the repaired contents of the file. Returns FALSE only if it fails to access the file, in which case no error is signaled; the other errors cannot be detected from the returned value.

tidy::repairString():
Fix: @return string Repaired contents of the string. Errors cannot be detected from the returned value.
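Following the advice above (map every error, E_NOTICE included, into an exception, and do not rely on the return value alone), a checked call could be sketched as below. The helper name tidyChecked() is hypothetical; it relies only on PHP's set_error_handler(), restore_error_handler() and ErrorException, and additionally throws a RuntimeException when the call returns FALSE, which according to the findings above is the only failure value of parseFile(), parseString() and repairFile(). The comparison is strict because a repaired document could be an empty string.

```php
<?php
// Hypothetical helper: runs $fn with every PHP error (warnings and notices
// included) turned into an ErrorException, then restores the previous error
// handler even if $fn throws. A FALSE result is also turned into an
// exception; the strict comparison keeps an empty string from being
// treated as a failure.
function tidyChecked(callable $fn, string $operation)
{
    set_error_handler(
        function (int $severity, string $message, string $file, int $line): bool {
            // Leave errors excluded by error_reporting() (or silenced
            // with @) to PHP's standard handler.
            if (!(error_reporting() & $severity)) {
                return false;
            }
            throw new ErrorException($message, 0, $severity, $file, $line);
        }
    );
    try {
        $result = $fn();
    } finally {
        restore_error_handler();
    }
    if ($result === false) {
        throw new RuntimeException("$operation failed");
    }
    return $result;
}
```

For example, `$html = tidyChecked(fn() => $tidy->repairFile($path, [], 'utf8'), 'repairFile');`, where `$tidy` is a tidy instance and `$path` a hypothetical file name. For repairString() only the error-to-exception part helps, since its errors do not show up in the return value.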

Last post on this topic, I promise!
 [2019-01-02 00:49 UTC] girgias@php.net
-Package: Documentation problem +Package: Tidy
 [2019-01-02 00:49 UTC] girgias@php.net
If this is a code contribution please do a pull request on the official git repository.
 [2019-01-14 13:37 UTC] salsi at icosaedro dot it
-Summary: Tidy class doc: a contribute +Summary: Tidy class doc: missing errors and special returned values
 [2019-01-14 13:37 UTC] salsi at icosaedro dot it
It's not a code contribution; it's my attempt to close the gap between the rather complex feedback and error-reporting mechanism of the tidy library and the way this behavior is currently explained in the PHP manual. In particular, the manual does not explain that many functions:

- may return special values;

- may raise E_WARNING errors that can be detected by checking the returned value;

- may raise E_WARNING and E_NOTICE errors *without* returning specific values (returning instead invalid or unexpected results, for example improperly encoded text); in these cases the $errorBuffer property should be checked.

For these reasons I submitted my findings, in the hope that the manual can be improved by listing all the possible return values, all the possible triggered errors, and the specific return value for each of these errors.
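The two checks listed above (special return values first, then the $errorBuffer property) could be combined as in the following sketch. The function tidyOutcome() and its three result labels are hypothetical. Since $errorBuffer also collects purely informational messages, a non-empty buffer only signals that tidy reported something, not necessarily that the output is wrong.

```php
<?php
// Hypothetical helper combining the two checks described above:
// 1. a FALSE return value means the call failed outright;
// 2. otherwise, a buffer with any non-whitespace content means tidy
//    reported messages (informational, warnings or errors) that the
//    return value does not reveal.
function tidyOutcome($result, ?string $errorBuffer): string
{
    if ($result === false) {
        return 'failed';
    }
    if ($errorBuffer !== null && trim($errorBuffer) !== '') {
        return 'messages';
    }
    return 'clean';
}
```

A typical call would be `tidyOutcome($tidy->parseString($html, [], 'utf8'), $tidy->errorBuffer)`; PHP evaluates arguments left to right, so the buffer is read after parsing.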
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Nov 21 17:01:32 2024 UTC