Bug #70298 filter_var strips '/' characters, counter to docs
Submitted: 2015-08-19 02:35 UTC Modified: 2015-08-30 13:49 UTC
From: rfredlund13 at gmail dot com Assigned: cmb (profile)
Status: Closed Package: Filter related
PHP Version: 5.6.12 OS: Irrelevant
Private report: No CVE-ID: None
 [2015-08-19 02:35 UTC] rfredlund13 at gmail dot com
I was writing unit tests (using PHPUnit) for my email validation functionality, which simply returned the result of filter_var($var, FILTER_SANITIZE_EMAIL). I tested all of the characters that the documentation lists as allowed, but the / character was stripped out. I expected / to be a valid character. This is what the documentation states:

FILTER_SANITIZE_EMAIL | Remove all characters except letters, digits and !#$%&'*+-/=?^_`{|}~@.[].

Found at

I then verified that this problem exists in the latest Windows version as well (run from the command line with the -r flag).

This is either a problem with the implementation of the function, or inaccurate documentation. I briefly tried to interpret RFC 5321/RFC 5322, but couldn't find this particular case.

Test script:
echo "Output: " . filter_var('!#$%&\'*+-=?^_`{|}~@.[]', FILTER_SANITIZE_EMAIL);
echo "\nOutput: " . filter_var('!#$%&\'*+-=?^_`{|}~@.[]/', FILTER_SANITIZE_EMAIL);
echo "\nOutput: " . filter_var('/', FILTER_SANITIZE_EMAIL);

Expected result:
Output: !#$%&'*+-=?^_`{|}~@.[]
Output: !#$%&'*+-=?^_`{|}~@.[]/
Output: /

Actual result:
Output: !#$%&'*+-=?^_`{|}~@.[]
Output: !#$%&'*+-=?^_`{|}~@.[]
Output: 
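The comparison above can also be done one character at a time, which makes it obvious exactly which documented characters the filter removes. This is a minimal sketch; the `$documented` list is copied from the documentation text quoted in this report, and the variable names are only illustrative.

```php
<?php
// Characters the documentation listed as allowed at the time of this report.
$documented = str_split('!#$%&\'*+-/=?^_`{|}~@.[]');

// Collect every documented character that the filter removes.
$stripped = [];
foreach ($documented as $char) {
    if (filter_var($char, FILTER_SANITIZE_EMAIL) !== $char) {
        $stripped[] = $char;
    }
}

echo "Stripped: " . implode(' ', $stripped) . "\n";
```

With the behaviour described in this report, the only character collected in `$stripped` is `/`.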

 [2015-08-19 03:15 UTC]

Both RFCs 822 (which the code mentions) and 5322 (which obsoletes 2822, which obsoletes 822) allow slash.
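For illustration, the `atext` rule from RFC 5322 section 3.2.3 (letters, digits, and the characters !#$%&'*+-/=?^_`{|}~) can be written as a character class and compared against what the filter keeps. This is only a sketch: `is_atext()` is a hypothetical helper, not part of PHP, and it covers single characters of an unquoted atom rather than a full address.

```php
<?php
// Hypothetical helper: true if $char is a single RFC 5322 "atext" character.
function is_atext($char)
{
    // '@' is used as the regex delimiter because it is not an atext character.
    return preg_match('@^[A-Za-z0-9!#$%&\'*+/=?^_`{|}~-]$@D', $char) === 1;
}

// '/' is atext according to the RFC grammar ...
var_dump(is_atext('/'));
// ... but FILTER_SANITIZE_EMAIL removes it, leaving an empty string.
var_dump(filter_var('/', FILTER_SANITIZE_EMAIL));
```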

Code I wrote a while back to generate a regex, for reference:
 [2015-08-19 13:08 UTC]
Indeed, slashes are allowed in quoted-strings. However, the
slashes had been disallowed as the resolution for bug #49470. Not
sure what to do.

Actually, the implementation of FILTER_SANITIZE_EMAIL is way too
simplistic to even vaguely conform to one of the relevant RFCs.
Something like Damian's regex-email.php algorithm seems to be more
 [2015-08-19 13:18 UTC] rfredlund13 at gmail dot com
That makes sense. The RFCs are pretty complicated, and cover a lot of edge cases. I do not need to perfectly conform to the RFCs, so I will just remove '/' from my test case. Still, the documentation should be updated to remove '/' from the list of allowed characters.
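A rough sketch of a helper along the lines described in this thread, with the slash treated as a character that gets stripped. `clean_email()` is a hypothetical name and is not the reporter's actual code; adding a FILTER_VALIDATE_EMAIL step after sanitizing is an assumption made here for illustration.

```php
<?php
// Hypothetical wrapper, not the reporter's actual code.
function clean_email($input)
{
    // Strip characters outside the filter's allowed set (this includes '/').
    $sanitized = filter_var($input, FILTER_SANITIZE_EMAIL);

    // Return the sanitized address if it validates, or false otherwise.
    return filter_var($sanitized, FILTER_VALIDATE_EMAIL);
}

var_dump(clean_email('us/er@example.com'));
var_dump(clean_email('no-at-sign'));
```

Note that sanitizing silently changes the input, so `us/er@example.com` comes back as `user@example.com`, which is a different mailbox. Callers that care about this may prefer to reject such input instead of rewriting it.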
 [2015-08-19 13:40 UTC]
 [2015-08-19 13:41 UTC]
Fixing the docs is necessary anyway, so I've done that. I'm leaving
the ticket open for now.
 [2015-08-19 13:42 UTC]
For reference: <>.
 [2015-08-30 13:49 UTC]
Okay, no further comments so I'm closing.
