Bug #49247 preg_match_all and \w issues
Submitted: 2009-08-13 23:16 UTC Modified: 2009-08-15 01:22 UTC
From: mariusads at helpedia dot com Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 5.2.10 OS: Windows 2003 Web Edition
Private report: No CVE-ID: None
 [2009-08-13 23:16 UTC] mariusads at helpedia dot com
Description:
------------
I was under the impression that \w matches any word character, i.e. a-z, A-Z, 0-9 and _ (the documentation also says "some character codes greater than 128 are used for accented letters, and these are matched by \w" when a specific code page is in use, but that is not the case here: this is a default PHP installation).

However, I have noticed the following issue, and I'm not sure whether it's a bug in PHP or an error in the way I've written the regular expression in the code below. As I understand regular expressions, the code below should not return any match, but it does.
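
A quick way to check whether the locale plays a part is sketched below. It rests on an assumption not stated in this report: that PHP's PCRE extension builds locale-specific character tables only when LC_CTYPE is something other than "C". The variable names are illustrative only.

```php
<?php
// Assumption: PCRE uses locale-specific character tables only when
// LC_CTYPE is not "C". Passing '0' queries the setting without changing it.
$before = setlocale(LC_CTYPE, '0');
$byte   = chr(255);

// Under the inherited locale the result may be 0 or 1.
$matchesBefore = preg_match('/^\w$/', $byte);

// Pin LC_CTYPE to "C" and test again; with plain ASCII tables
// chr(255) should no longer count as a word character.
setlocale(LC_CTYPE, 'C');
$matchesInC = preg_match('/^\w$/', $byte);

printf("LC_CTYPE %s: \\w matches chr(255)? %s\n",
    var_export($before, true), $matchesBefore ? 'yes' : 'no');
printf("LC_CTYPE 'C': \\w matches chr(255)? %s\n",
    $matchesInC ? 'yes' : 'no');
```

If the two lines differ, the locale in effect when the script started is the likely reason \w accepts the byte.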



Reproduce code:
---------------
$text = 'start://abcd'.chr(255).chr(255).'efgh';

$results = preg_match_all('/(start:\/\/|finish:\/\/){1}([\w\x3A - \x40]*)$/is',$text,$matches,PREG_OFFSET_CAPTURE);
var_dump($results,$matches);

Expected result:
----------------
The code above should return no matches. If the $ at the end is omitted, I think it should return "start://abcd". Replacing \w with 0-9a-zA-Z_ produces the correct output.
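
The replacement mentioned above can be sketched as follows. Only \w is swapped for the explicit ASCII class, and the rest of the character class is left exactly as written in the reproduce code. The /is flags are dropped here as a simplification, and the variable names are illustrative.

```php
<?php
// Same subject as the report: two 0xFF bytes between "abcd" and "efgh".
$text = 'start://abcd' . chr(255) . chr(255) . 'efgh';

// \w replaced by 0-9a-zA-Z_; everything else in the class is unchanged.
// The /is flags are omitted so the sketch does not depend on
// locale-based case folding.
$anchored   = '/(start:\/\/|finish:\/\/){1}([0-9a-zA-Z_\x3A - \x40]*)$/';
$unanchored = '/(start:\/\/|finish:\/\/){1}([0-9a-zA-Z_\x3A - \x40]*)/';

// With the trailing $, the 0xFF bytes block the match entirely.
$anchoredCount = preg_match_all($anchored, $text, $m1, PREG_OFFSET_CAPTURE);

// Without the trailing $, the match stops just before the first 0xFF byte.
$unanchoredCount = preg_match_all($unanchored, $text, $m2, PREG_OFFSET_CAPTURE);

var_dump($anchoredCount, $unanchoredCount, $m2[0][0]);
```

With the explicit class, the anchored pattern finds nothing and the unanchored one returns "start://abcd" at offset 0, which is the behaviour described under "Expected result".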

Actual result:
--------------
Output of the above code:

int(1)
array(3) {
  [0]=>
  array(1) {
    [0]=>
    array(2) {
      [0]=>
      string(18) "start://abcdÿÿefgh"
      [1]=>
      int(0)
    }
  }
  [1]=>
  array(1) {
    [0]=>
    array(2) {
      [0]=>
      string(8) "start://"
      [1]=>
      int(0)
    }
  }
  [2]=>
  array(1) {
    [0]=>
    array(2) {
      [0]=>
      string(10) "abcdÿÿefgh"
      [1]=>
      int(8)
    }
  }
}


 [2009-08-15 01:22 UTC] felipe@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

See bug #49036
 