Request #50309 Add plain-text sanitizing to filter
Submitted: 2009-11-27 09:52 UTC Modified: 2021-06-10 11:58 UTC
Votes:1
Avg. Score:1.0 ± 0.0
Reproduced:0 of 1 (0.0%)
From: marcus at synchromedia dot co dot uk Assigned: cmb
Status: Wont fix Package: Filter related
PHP Version: 5.3.1 OS:
Private report: No CVE-ID: None

 

 [2009-11-27 09:52 UTC] marcus at synchromedia dot co dot uk
Description:
------------
While the FILTER_FLAG_STRIP_LOW and FILTER_FLAG_STRIP_HIGH flags used 
with FILTER_SANITIZE_STRING work fine, they are a bit too blunt to be 
useful in practice. A more typical application would be to remove any 
'funny' characters from plain text, for example allowing only chars 9, 
10, 13 and 32-126 (tab, line feed, carriage return and printable ASCII). 
This could be called FILTER_FLAG_PLAIN_TEXT. While this could be done 
with a regex, it's probably the most common use case, so it deserves a 
dedicated filter flag. Alternatively, this could be flipped around and 
called FILTER_FLAG_ALLOW_WHITESPACE, overriding STRIP_LOW to let chars 
9, 10 and 13 through.
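Until such a flag exists, the requested behaviour can be approximated in userland. The following is a minimal sketch, assuming exactly the character set listed above (9, 10, 13 and 32-126); the function name sanitize_plain_text() is hypothetical and is not part of ext/filter. Because it works on bytes, it also removes every byte of a multibyte UTF-8 character.

```php
<?php
// Hypothetical helper approximating the proposed FILTER_FLAG_PLAIN_TEXT:
// keep TAB (9), LF (10), CR (13) and printable ASCII (32-126), drop every other byte.
function sanitize_plain_text(string $text): string
{
    // No /u modifier: the pattern matches single bytes, so invalid UTF-8 cannot make it fail.
    return preg_replace('/[^\t\n\r\x20-\x7E]/', '', $text);
}

// The helper can be plugged into the filter API through FILTER_CALLBACK.
$clean = filter_var("Hi\x00 there\x07\r\n", FILTER_CALLBACK, ['options' => 'sanitize_plain_text']);
```

Wrapping it in FILTER_CALLBACK keeps call sites in the filter_var()/filter_input() style, which is roughly what a dedicated flag would offer.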

Similarly, a very common need is to normalize line breaks to a single 
style: \n, \r\n or \r.
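A hedged sketch of such a normalization in userland. The name normalize_line_breaks() is hypothetical, and the accepted target endings simply mirror the three styles listed above.

```php
<?php
// Hypothetical helper: convert CRLF, lone CR and lone LF to one chosen line ending.
function normalize_line_breaks(string $text, string $eol = "\n"): string
{
    if (!in_array($eol, ["\n", "\r\n", "\r"], true)) {
        throw new InvalidArgumentException('Unsupported line ending');
    }
    // "\r\n" is tried first so that a CRLF pair becomes one break, not two.
    return preg_replace('/\r\n|\r|\n/', $eol, $text);
}
```

The order of the alternation matters: with /\r|\n/ a CRLF pair would be turned into two line breaks. PCRE's \R is a shorter alternative, but it also matches other line-break characters such as vertical tab and form feed, which may or may not be wanted.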


History
 [2010-12-01 16:31 UTC] jani@php.net
-Package: Feature/Change Request +Package: Filter related
 [2021-06-10 11:58 UTC] cmb@php.net
-Status: Open +Status: Wont fix -Assigned To: +Assigned To: cmb
 [2021-06-10 11:58 UTC] cmb@php.net
In my opinion, calling FILTER_FLAG_STRIP_HIGH a bit blunt is a
gross understatement; it is actually unusable for UTF-8[1] (okay,
fortunately that encoding is rarely used for the Web ;)
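The problem is easy to reproduce. This snippet is a sketch and not necessarily the script behind the 3v4l link; it uses FILTER_UNSAFE_RAW because FILTER_SANITIZE_STRING is deprecated as of PHP 8.1, and both accept FILTER_FLAG_STRIP_HIGH. Every byte of a multibyte UTF-8 character is 0x80 or above, so accented letters vanish entirely instead of being kept or replaced.

```php
<?php
// FILTER_FLAG_STRIP_HIGH deletes every byte with a value >= 128.
// In UTF-8, each byte of a non-ASCII character falls in that range.
$input = "Cr\u{E8}me br\u{FB}l\u{E9}e"; // "Crème brûlée" encoded as UTF-8
$stripped = filter_var($input, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_HIGH);
var_dump($stripped); // string(9) "Crme brle"
```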

And that raises the question of what should be regarded as
whitespace; Unicode has a lot more whitespace characters than
ASCII. It's probably best to use regexps which can be tuned for
the application at hand, or a userland library.
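As an illustration of such a tunable regexp, here is a sketch of one possible policy for UTF-8 input: map Unicode horizontal spaces (other than tab) to an ASCII space, then drop control, format, private-use and unassigned code points (\p{C}) while keeping tab, LF and CR. The function name and the policy are assumptions for illustration only; removing all of \p{C}, for instance, also strips the zero-width joiner used in emoji sequences, which some applications would want to keep.

```php
<?php
// Hypothetical UTF-8 aware sanitizer; the policy below is one choice among many.
// Returns null if $text is not valid UTF-8 (preg_replace() fails under /u).
function sanitize_plain_text_utf8(string $text): ?string
{
    // Replace Unicode horizontal whitespace (NBSP, em space, ...) except TAB with ' '.
    $text = preg_replace('/(?!\t)\h/u', ' ', $text);
    if ($text === null) {
        return null;
    }
    // Remove every "Other" (\p{C}) code point except TAB, LF and CR.
    return preg_replace('/[^\P{C}\t\n\r]/u', '', $text);
}
```

Unlike FILTER_FLAG_STRIP_HIGH, this keeps letters such as "é" intact, and each of the two patterns can be adjusted independently to match what the application considers whitespace or unwanted characters.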

Anyhow, given that there has been no feedback on this ticket for
more than ten years, I'm closing as WONTFIX.

If you or anybody else is still interested in this, please pursue
the RFC process[2].

[1] <https://3v4l.org/VhvYS>
[2] <https://wiki.php.net/rfc/howto>
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sat Dec 21 14:01:32 2024 UTC