Bug #52788 preg_match not matching last backslash '\'
Submitted: 2010-09-07 07:37 UTC
Modified: 2010-09-07 09:48 UTC
From: reevsey at gmail dot com
Assigned:
Status: Not a bug
Package: *Regular Expressions
PHP Version: 5.3.3
OS: ALL
Private report: No
CVE-ID: None
 [2010-09-07 07:37 UTC] reevsey at gmail dot com
Description:
------------
preg_match() does not match the last backslash at the end of a subject string.

I'm using the pattern "#^[^0-9a-zA-Z]*$#" to filter out any characters from user input other than alphanumerics.

The problem I'm seeing is that if a backslash \ is the last character and there are valid characters before it, preg_match() does not match it.

Example input:
\    = matched.
\\   = matched.
a\a  = matched.
&@$  = matched.
a\   = not matched.
0\   = not matched.
A\   = not matched.
abc\ = not matched.
etc...

Test script:
---------------
// The backslash is an escape character inside a double-quoted string,
// so it is doubled here. $userInput holds the two characters: a\

$userInput = "a\\";

if (preg_match("#^[^0-9a-zA-Z]*$#", $userInput))
{
    echo "Invalid input";
}
else
{
    echo "Good to go!";
}

// If you negate the regular expression to "#^[0-9a-zA-Z]*$#" and change the
// test statement to be: 
// if (!preg_match("#^[0-9a-zA-Z]*$#", $userInput)) {}
// it works as expected.

Expected result:
----------------
Expected result: Invalid input

Actual result:
--------------
Actual result: Good to go!

 [2010-09-07 08:03 UTC] rasmus@php.net
-Status: Open +Status: Bogus
 [2010-09-07 08:03 UTC] rasmus@php.net
And if you replace that backslash with any other character that is not in your 
range, you will see the same thing.  It has nothing to do with the backslash.  
Your regex is simply completely wrong.

Your regex says:

From the start of the string: ^
Find me 0 or more characters that are not alphanumeric: [^0-9a-zA-Z]*
Until the end of the string: $

Your problem is that you have anchored it to the start and end of the string, 
so it only matches when every character in the string is invalid.  In each 
of your "not matched" cases, you have a valid char as the first char.

What you probably meant to write was:

'#[^0-9a-zA-Z]+#'

By not anchoring your regex to the start or end of the string, you are telling 
it to look for invalid characters anywhere in the string.  And you want + 
instead of * because you want to match at least 1 invalid char, not 0.
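To illustrate the difference, here is a minimal sketch comparing the anchored pattern from the report with the unanchored one suggested above. The function names isAllInvalid() and hasInvalidChar() are hypothetical and exist only for this example.

```php
<?php
// Hypothetical helper: the reporter's anchored pattern. It matches only
// when EVERY character is non-alphanumeric (or the string is empty).
function isAllInvalid($input)
{
    return preg_match('#^[^0-9a-zA-Z]*$#', $input) === 1;
}

// Hypothetical helper: the unanchored pattern. It matches when at least
// one non-alphanumeric character appears anywhere in the string.
function hasInvalidChar($input)
{
    return preg_match('#[^0-9a-zA-Z]+#', $input) === 1;
}

// In single quotes, '\\' is one literal backslash.
$inputs = array('\\', 'a\\', 'a&', 'abc');

foreach ($inputs as $input) {
    printf(
        "%-4s allInvalid=%s hasInvalid=%s\n",
        $input,
        var_export(isAllInvalid($input), true),
        var_export(hasInvalidChar($input), true)
    );
}
```

With the anchored pattern, 'a\' and 'a&' both fail to match because the leading 'a' is a valid character, so the backslash is not treated any differently from other invalid characters.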

By the way, there are much quicker ways of doing this if your list of valid or 
invalid chars is finite.  See strpbrk and strcspn.
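As a rough sketch of the non-regex approach mentioned above, the following uses strpbrk() and strcspn() against a finite list of invalid characters. The list in $invalid and the function names are assumptions made for illustration; they do not come from the report.

```php
<?php
// Assumed, illustrative list of invalid characters.
// In single quotes, '\\' is one literal backslash.
$invalid = '\\#$%^&@';

// Hypothetical helper: strpbrk() returns false when none of the listed
// characters occurs in the input.
function containsInvalidViaStrpbrk($input, $invalid)
{
    return strpbrk($input, $invalid) !== false;
}

// Hypothetical helper: strcspn() returns the length of the leading run
// that contains none of the listed characters. If that run is shorter
// than the whole string, an invalid character was found.
function containsInvalidViaStrcspn($input, $invalid)
{
    return strcspn($input, $invalid) < strlen($input);
}

foreach (array('a\\', 'abc', 'a&b') as $input) {
    echo $input, ': ',
        var_export(containsInvalidViaStrpbrk($input, $invalid), true), ' ',
        var_export(containsInvalidViaStrcspn($input, $invalid), true), "\n";
}
```

Both helpers only detect characters that are explicitly listed, so this fits the case of a known, finite set rather than "anything that is not alphanumeric".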
 [2010-09-07 09:48 UTC] reevsey at gmail dot com
Thanks for your speedy reply. Yes, you're absolutely right, and sorry for my mistake.

The code fragment I posted is from a larger code base and for whatever reason backslash was the only character that was not being matched. Other invalid characters '#$%^&' etc. even with valid chars beforehand were being matched, which made it seem like a bug.

Anyway, replacing the expression makes it work.

Once again, thanks for your time and for suggesting the other functions. The expression posted will eventually form part of a larger expression and will indeed need the anchoring back; my mistake.
 