Doc Bug #77662 htmlspecialchars does not behave as documented regarding ambiguous $flags
Submitted: 2019-02-24 16:07 UTC
Modified: 2019-02-25 11:25 UTC
From: nicolas dot roeser at uni-ulm dot de
Assigned: cmb
Status: Closed
Package: Strings related
PHP Version: 7.3.2
OS: Linux, macOS
Private report: No
CVE-ID: None
 [2019-02-24 16:07 UTC] nicolas dot roeser at uni-ulm dot de
Description:
------------
---
From manual page: https://php.net/function.htmlspecialchars
---

I have found that htmlspecialchars behaves differently from what its documentation says. I do not know which behavior is intended, but either the function or the documentation must be fixed.

There are two main issues:

1) When neither ENT_COMPAT nor ENT_QUOTES nor ENT_NOQUOTES is set, the function is documented to default to ENT_COMPAT. It seems that it defaults to ENT_NOQUOTES instead.

2) When ENT_HTML401 and ENT_XML1 are set, the function is documented to give precedence to ENT_HTML401. It seems that it gives precedence to ENT_XML1 instead.

I have written readable test scripts and would like to attach them to this bug report, but since the test script field is limited to 20 lines, I had to condense my shorter script into the less readable version below.

This bug report is not really related to #61498: this bug report here is about documented vs. actual behavior.

Test script:
---------------
<?php  // The 20-line limit is crap.
// Super short test script which only tests the default flags and failing
// combinations (and only one per quot/apos group).
function hs_expect($test_num, $expected, $str, $flags=null) {
    if ($flags === null) { $result = htmlspecialchars($str); }
    else { $result = htmlspecialchars($str, $flags, 'UTF-8', TRUE); }
    echo "Test number $test_num: ";
    if ($expected === $result) { echo 'OK'; }
    else { echo "FAIL: expected: >$expected<, result: >$result<"; }
    echo "\n";
}
// ===== tests for quoting behavior =====
$s = '"';
hs_expect(-1, '&quot;', $s); hs_expect( 0, '&quot;', $s, 0);
hs_expect( 8, '&quot;', $s, ENT_HTML401); hs_expect(16, '&quot;', $s, ENT_XML1);
hs_expect(32, '&quot;', $s, ENT_XHTML); hs_expect(64, '&quot;', $s, ENT_HTML5);
// ===== tests for markup language selection =====
$s = "'";
hs_expect(-1, "'", $s);
hs_expect(26, "&#039;", $s, ENT_QUOTES | ENT_HTML401 | ENT_XML1);

Expected result:
----------------
I expect documentation and actual behaviour to be in sync. I do not care which one is fixed (or both).

Actual result:
--------------
See description above.

 [2019-02-25 00:37 UTC] requinix@php.net
-Summary: htmlspecialchars does not behave as documented regarding its $flags precedence
+Summary: htmlspecialchars does not behave as documented regarding ambiguous $flags
-Status: Open
+Status: Verified
-Type: Bug
+Type: Documentation Problem
-Package: Unknown/Other Function
+Package: Strings related
 [2019-02-25 00:37 UTC] requinix@php.net
ENT_NOQUOTES=0 and ENT_HTML401=0 so they cannot be detected, thus any mention of behavior when those are missing cannot be correct. I suspect the note was trying to explain the default if $flags is omitted.

$ php -r 'print_r(get_defined_constants());' | egrep '\bENT_'
    [ENT_COMPAT] => 2
    [ENT_QUOTES] => 3
    [ENT_NOQUOTES] => 0
    [ENT_IGNORE] => 4
    [ENT_SUBSTITUTE] => 8
    [ENT_DISALLOWED] => 128
    [ENT_HTML401] => 0
    [ENT_XML1] => 16
    [ENT_XHTML] => 32
    [ENT_HTML5] => 48

Note:
- All nonzero flags except QUOTES (3) and HTML5 (48 = XML1 | XHTML) are single bits
- NOQUOTES is implicitly the default quoting behavior flag
- HTML401 is implicitly the default doctype flag

When combining ENT_COMPAT/QUOTES/NOQUOTES,
- NOQUOTES(0) | COMPAT(2) = COMPAT(2)
- NOQUOTES(0) | QUOTES(3) = QUOTES(3)
- COMPAT(2) | QUOTES(3) = QUOTES(3)

In effect, QUOTES (double+single) takes precedence over COMPAT (double), which takes precedence over NOQUOTES (neither), which should make sense.
For the doctype flags, XML1/XHTML/HTML5 (named entity) have precedence over HTML401 (numeric entity).
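
The combination rules above are plain bitwise OR, so they can be checked directly. The following is only a sketch; it assumes the constant values from the listing above, and the `$checks` array and its labels are made up for illustration.

```php
<?php
// Sketch: confirm the combination rules with plain bitwise OR.
// Assumes the constant values listed above (ENT_COMPAT = 2, ENT_QUOTES = 3, ...).
$checks = [
    'NOQUOTES | COMPAT === COMPAT' => (ENT_NOQUOTES | ENT_COMPAT) === ENT_COMPAT,
    'NOQUOTES | QUOTES === QUOTES' => (ENT_NOQUOTES | ENT_QUOTES) === ENT_QUOTES,
    'COMPAT | QUOTES === QUOTES'   => (ENT_COMPAT | ENT_QUOTES) === ENT_QUOTES,
    // HTML401 is 0, so OR-ing it in never changes the doctype bits.
    'HTML401 | XML1 === XML1'      => (ENT_HTML401 | ENT_XML1) === ENT_XML1,
    // XML1 (16) and XHTML (32) together are indistinguishable from HTML5 (48).
    'XML1 | XHTML === HTML5'       => (ENT_XML1 | ENT_XHTML) === ENT_HTML5,
];

foreach ($checks as $label => $holds) {
    echo $label, ': ', $holds ? 'holds' : 'does not hold', "\n";
}
```

Because the zero-valued flags contribute no bits, the function has no way to tell whether ENT_NOQUOTES or ENT_HTML401 was passed at all.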

When $flags is not provided the default is COMPAT | HTML401, and the documentation is correct.

When $flags is provided,

> When neither of ENT_COMPAT, ENT_QUOTES, ENT_NOQUOTES is present, the default is ENT_COMPAT.
Incorrect. Default is ENT_NOQUOTES.

> When more than one of ENT_COMPAT, ENT_QUOTES, ENT_NOQUOTES is present, ENT_QUOTES takes the highest precedence,
> followed by ENT_COMPAT.
Correct.
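
A short way to see both quote-flag points, as a sketch only: the variable names are illustrative, and the encoding is passed explicitly so the result should not depend on ini settings.

```php
<?php
// Sketch: quote handling when $flags is passed explicitly.

// A doctype flag alone leaves the quote bits at 0, which is ENT_NOQUOTES.
$noQuoteFlag = htmlspecialchars('"', ENT_XML1, 'UTF-8');

// ENT_COMPAT | ENT_QUOTES is 2 | 3 = 3, so ENT_QUOTES wins.
$bothFlags = htmlspecialchars('"\'', ENT_COMPAT | ENT_QUOTES | ENT_HTML401, 'UTF-8');

var_dump($noQuoteFlag === '"');           // the double quote is left untouched
var_dump($bothFlags === '&quot;&#039;');  // both quote characters are encoded
```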

> When neither of ENT_HTML401, ENT_HTML5, ENT_XHTML, ENT_XML1 is present, the default is ENT_HTML401.
Correct.

> When more than one of ENT_HTML401, ENT_HTML5, ENT_XHTML, ENT_XML1 is present, ENT_HTML5 takes the highest
> precedence, followed by ENT_XHTML, ENT_HTML401.
Incorrect. ENT_XML1 is missing from this list, which suggests it has the lowest precedence and that HTML401 ranks higher; in fact HTML401 has the lowest precedence. This may be a simple accidental omission, as XHTML and XML1 are very closely related.

Note that as far as htmlspecialchars is concerned, HTML5/XHTML/XML1 are equivalent as they all encode apostrophes the same way.
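
The doctype behavior can be sketched the same way. This is an illustration, not the reporter's original test; the variable names are invented, and ENT_QUOTES is set in every call because the apostrophe is only encoded when it is present.

```php
<?php
// Sketch: how the doctype flags affect the apostrophe under ENT_QUOTES.
$s = "'";

$html401 = htmlspecialchars($s, ENT_QUOTES | ENT_HTML401, 'UTF-8');
// HTML401 is 0, so this is the same call as ENT_QUOTES | ENT_XML1.
$mixed = htmlspecialchars($s, ENT_QUOTES | ENT_HTML401 | ENT_XML1, 'UTF-8');

$others = [];
foreach (['ENT_XML1' => ENT_XML1, 'ENT_XHTML' => ENT_XHTML, 'ENT_HTML5' => ENT_HTML5] as $name => $doctype) {
    $others[$name] = htmlspecialchars($s, ENT_QUOTES | $doctype, 'UTF-8');
}

var_dump($html401 === '&#039;');  // numeric entity for HTML 4.01
var_dump($mixed === '&apos;');    // XML1 wins over HTML401
var_dump(array_unique(array_values($others)) === ['&apos;']);  // all three agree
```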
 [2019-02-25 11:24 UTC] cmb@php.net
Automatic comment from SVN on behalf of cmb
Revision: http://svn.php.net/viewvc/?view=revision&revision=346899
Log: Fix #77662: htmlspecialchars does not behave as documented regarding ambiguous $flags
 [2019-02-25 11:25 UTC] cmb@php.net
-Status: Verified
+Status: Closed
-Assigned To:
+Assigned To: cmb
 [2019-02-25 11:25 UTC] cmb@php.net
This bug has been fixed in the documentation's XML sources. Since the
online and downloadable versions of the documentation need some time
to get updated, we would like to ask you to be a bit patient.

Thank you for the report, and for helping us make our documentation better.
 [2019-02-25 11:25 UTC] salathe@php.net
Automatic comment on behalf of cmb
Revision: http://git.php.net/?p=doc/en.git;a=commit;h=497e4dc0796c3cc5083b039dcbccfd8c0a264e2d
Log: Fix #77662: htmlspecialchars does not behave as documented regarding ambiguous $flags
 [2020-02-07 06:05 UTC] phpdocbot@php.net
Automatic comment on behalf of cmb
Revision: http://git.php.net/?p=doc/en.git;a=commit;h=b3e378b59ccf68d3278457ad00f548eaf19b187d
Log: Fix #77662: htmlspecialchars does not behave as documented regarding ambiguous $flags