Bug #49247 preg_match_all and \w issues
Submitted: 2009-08-13 23:16 UTC Modified: 2009-08-15 01:22 UTC
From: mariusads at helpedia dot com Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 5.2.10 OS: Windows 2003 Web Edition
Private report: No CVE-ID: None
 [2009-08-13 23:16 UTC] mariusads at helpedia dot com
Description:
------------
I was under the impression that \w matches any word character, that is, a-z, A-Z, 0-9 and _. The documentation also says "some character codes greater than 128 are used for accented letters, and these are matched by \w" when a specific code page is used, but that is not the case here: this is a default PHP installation.

However, I have noticed the following issue, and I'm not sure whether it is a bug in PHP or an error in the way I've written the regular expression in the code below. As I understand regular expressions, the code below should not return any match, but it does.



Reproduce code:
---------------
$text = 'start://abcd'.chr(255).chr(255).'efgh';

$results = preg_match_all('/(start:\/\/|finish:\/\/){1}([\w\x3A - \x40]*)$/is',$text,$matches,PREG_OFFSET_CAPTURE);
var_dump($results,$matches);

Expected result:
----------------
The code above should return no matches. If the $ at the end is omitted, I think it should return "start://abcd". Replacing \w with 0-9a-zA-Z_ produces the correct output.
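
For reference, the workaround mentioned above (replacing \w with 0-9a-zA-Z_) can be sketched as follows. This is a minimal illustration, not part of the original report: the variable names are made up for the example, and the rest of the pattern is copied unchanged from the reproduce code.

```php
<?php
// Same input as the reproduce code: two 0xFF bytes between "abcd" and "efgh".
$text = 'start://abcd' . chr(255) . chr(255) . 'efgh';

// \w replaced by an explicit ASCII class, so locale tables cannot widen it.
$class = '[0-9a-zA-Z_\x3A - \x40]';

// Anchored variant: the 0xFF bytes are not in the class, so $ cannot be reached.
$anchoredCount = preg_match_all(
    '/(start:\/\/|finish:\/\/){1}(' . $class . '*)$/is',
    $text,
    $anchored
);

// Unanchored variant: matching stops in front of the first 0xFF byte.
$unanchoredCount = preg_match_all(
    '/(start:\/\/|finish:\/\/){1}(' . $class . '*)/is',
    $text,
    $unanchored
);

var_dump($anchoredCount);     // int(0)
var_dump($unanchored[0][0]);  // string(12) "start://abcd"
```

With the explicit class, the result no longer depends on which bytes above 127 the PCRE character tables treat as word characters.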

Actual result:
--------------
Output of the above code:

int(1)
array(3) {
  [0]=>
  array(1) {
    [0]=>
    array(2) {
      [0]=>
      string(18) "start://abcdÿÿefgh"
      [1]=>
      int(0)
    }
  }
  [1]=>
  array(1) {
    [0]=>
    array(2) {
      [0]=>
      string(8) "start://"
      [1]=>
      int(0)
    }
  }
  [2]=>
  array(1) {
    [0]=>
    array(2) {
      [0]=>
      string(10) "abcdÿÿefgh"
      [1]=>
      int(8)
    }
  }
}


 [2009-08-15 01:22 UTC] felipe@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

See bug #49036
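
The documentation sentence quoted in the description says that \w can match character codes above 128 depending on the code page in use. A quick way to see what a given installation does is sketched below. The helper name wordMatchesByte is hypothetical, and the printed values depend on the platform and its locale settings, so no particular output is assumed here.

```php
<?php
// Hypothetical helper: reports whether \w matches a single raw byte.
function wordMatchesByte(int $byte): bool
{
    return preg_match('/^\w$/', chr($byte)) === 1;
}

// Passing '0' as the locale only reads the current LC_CTYPE setting.
echo 'LC_CTYPE: ', setlocale(LC_CTYPE, '0'), "\n";

// Environment dependent: true means \w treats byte 0xFF as a word character.
echo 'chr(255) matched by \w: ', var_export(wordMatchesByte(255), true), "\n";
```

If the second line prints true, \w on that system covers more than a-z, A-Z, 0-9 and _, and an explicit character class is the safer choice for ASCII-only matching.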
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Mon May 06 18:01:35 2024 UTC