php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #69203 FILTER_FLAG_STRIP_HIGH doesn't strip ASCII 127
Submitted: 2015-03-09 11:12 UTC Modified: 2015-04-21 11:11 UTC
From: whatthejeff@php.net Assigned: whatthejeff (profile)
Status: Closed Package: Filter related
PHP Version: 5.5Git-2015-03-09 (Git) OS:
Private report: No CVE-ID: None
Welcome back! If you're the original bug submitter, here's where you can edit the bug or add additional notes.
If you forgot your password, you can retrieve your password here.
Password:
Status:
Package:
Bug Type:
Summary:
From: whatthejeff@php.net
New email:
PHP Version: OS:

 

 [2015-03-09 11:12 UTC] whatthejeff@php.net
Description:
------------
FILTER_FLAG_STRIP_HIGH doesn't strip ASCII 127. This is inconsistent with FILTER_FLAG_ENCODE_HIGH which encodes ASCII 127 as expected.

Test script:
---------------
// FILTER_FLAG_STRIP_HIGH
var_dump(filter_var("\x7f", FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_HIGH));
var_dump(filter_var("\x7f", FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_HIGH));
var_dump(filter_var("\x7f", FILTER_SANITIZE_ENCODED, FILTER_FLAG_STRIP_HIGH));
var_dump(filter_var("\x7f", FILTER_SANITIZE_SPECIAL_CHARS, FILTER_FLAG_STRIP_HIGH));

// FILTER_FLAG_ENCODE_HIGH
var_dump(filter_var("\x7f", FILTER_SANITIZE_STRING, FILTER_FLAG_ENCODE_HIGH));
var_dump(filter_var("\x7f", FILTER_UNSAFE_RAW, FILTER_FLAG_ENCODE_HIGH));
var_dump(filter_var("\x7f", FILTER_SANITIZE_ENCODED, FILTER_FLAG_ENCODE_HIGH));
var_dump(filter_var("\x7f", FILTER_SANITIZE_SPECIAL_CHARS, FILTER_FLAG_ENCODE_HIGH));

Expected result:
----------------
string(0) ""
string(0) ""
string(0) ""
string(0) ""
string(6) ""
string(6) ""
string(3) "%7F"
string(6) ""

Actual result:
--------------
string(1) ""
string(1) ""
string(3) "%7F"
string(1) ""
string(6) ""
string(6) ""
string(3) "%7F"
string(6) ""

Patches

Pull Requests

Pull requests:

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2015-03-14 08:20 UTC] nikic@php.net
Automatic comment on behalf of whatthejeff@gmail.com
Revision: http://git.php.net/?p=php-src.git;a=commit;h=1e51c5411e2d1bdcddb6731a502ce429605c1d08
Log: Fix #69203: FILTER_FLAG_STRIP_HIGH doesn't strip ASCII 127
 [2015-03-14 08:20 UTC] nikic@php.net
-Status: Open +Status: Closed
 [2015-04-20 11:26 UTC] dominic at mailinator dot com
Doesn't FILTER_FLAG_ENCODE_HIGH contain the error here? 127 is lower ASCII.
 [2015-04-20 11:52 UTC] whatthejeff@php.net
-Assigned To: +Assigned To: whatthejeff
 [2015-04-20 12:03 UTC] whatthejeff@php.net
ASCII only encodes 128 characters. I think LOW/HIGH was meant to reference the non-printable characters.
 [2015-04-20 12:25 UTC] dominic at mailinator dot com
That's a little confusing. Lots of people mean Extended ASCII when they refer to ASCII. Extended ASCII encodes 256 characters, with the additional 128 characters known as 'High ASCII'.

Given that the PHP documentation - http://php.net/manual/en/filter.filters.flags.php - says that these filters encode/strip characters higher than 127 (which is now no longer true?), I suspect the filter creators and the documenters were referring to High ASCII.
 [2015-04-20 14:04 UTC] dominic at mailinator dot com
It looks like FILTER_FLAG_ENCODE_HIGH's behaviour changed in commit: http://git.php.net/?p=php-src.git;a=commit;h=7d7248390cb85f61150304bbdd3eace0a2023a86

with message:

Filter fixes:
Fixed possible double encoding problem with sanitizing filters
Make use of space-strict strip_tags() function

So it seems like an unintentional side-effect. Until then, both STRIP and ENCODE were for >127.
 [2015-04-21 08:23 UTC] whatthejeff@php.net
> So it seems like an unintentional side-effect. Until then, both STRIP and ENCODE were for >127.

I think there was only one PHP release that included this extension before that commit. Hard to say if it was intentional or not, but I guess we could ask Derick or Ilia if they remember.

My assumption is that these flags were intended to optionally strip/encode characters that don't fall within the range of printable ASCII characters (20-7e). It's true that the original code didn't strip/encode the DEL control character (7f), but I'm not sure if that was intentional or an oversight.
 [2015-04-21 08:56 UTC] derick@php.net
IIRC, they should strip / encode for for >= 128 only.
 [2015-04-21 10:44 UTC] derick@php.net
BTW, that's what the docs say too: http://docs.php.net/manual/en/filter.filters.flags.php
 [2015-04-21 11:11 UTC] whatthejeff@php.net
I think the intended behavior was a little ambiguous since the docs for FILTER_FLAG_ENCODE_HIGH have been out of sync with the implementation/tests since 5.2.1 (it seems).
 [2015-04-21 11:16 UTC] whatthejeff@php.net
I guess we should revert this and fix FILTER_FLAG_ENCODE_HIGH if they're not behaving as intended.
 [2015-04-21 11:42 UTC] dominic at mailinator dot com
That would get my vote, possibly with new filters FILTER_FLAG_[ENCODE|STRIP]_CONTROL_CODES.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Nov 21 12:01:29 2024 UTC