[2009-08-15 01:22 UTC] felipe@php.net
Description:
------------
I was under the impression that \w matches any word character, that is, a-z, A-Z, 0-9 and _. (The documentation also says "some character codes greater than 128 are used for accented letters, and these are matched by \w" if a specific code page is used, but that is not the case here: this is a default PHP installation.) However, I have noticed the following issue, and I'm not sure whether it's a bug in PHP or an error in the way I've written the regular expression in the code below. As I understand regular expressions, the code below should not return any match, but it does.

Reproduce code:
---------------
$text = 'start://abcd'.chr(255).chr(255).'efgh';
$results = preg_match_all('/(start:\/\/|finish:\/\/){1}([\w\x3A - \x40]*)$/is',$text,$matches,PREG_OFFSET_CAPTURE);
var_dump($results,$matches);

Expected result:
----------------
The code above should return no matches. If the $ at the end is omitted, I think it should return "start://abcd". Replacing \w with 0-9a-zA-Z_ produces the correct output.

Actual result:
--------------
Output of the above code:

int(1)
array(3) {
  [0]=>
  array(1) {
    [0]=>
    array(2) {
      [0]=>
      string(18) "start://abcd��efgh"
      [1]=>
      int(0)
    }
  }
  [1]=>
  array(1) {
    [0]=>
    array(2) {
      [0]=>
      string(8) "start://"
      [1]=>
      int(0)
    }
  }
  [2]=>
  array(1) {
    [0]=>
    array(2) {
      [0]=>
      string(10) "abcd��efgh"
      [1]=>
      int(8)
    }
  }
}
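
Workaround sketch:
------------------
The workaround mentioned under "Expected result" (replacing \w with 0-9a-zA-Z_) can be sketched roughly as follows. This is only an illustration, not a statement about why \w matches chr(255) on the reporter's build. Two assumptions are made: the range \x3A-\x40 (":" through "@") was the intended one, so the spaces around "-" in the original class are dropped, and the redundant {1} quantifier is omitted. The names $class, $anchored and $open are illustrative and do not appear in the report.

```php
<?php
// Subject string from the report: two 0xFF bytes sit between "abcd" and "efgh".
$text = 'start://abcd' . chr(255) . chr(255) . 'efgh';

// Explicit ASCII class instead of \w. Assumption: the range \x3A-\x40
// (":" through "@") was intended, so the spaces around "-" are dropped.
$class = '[0-9a-zA-Z_\x3A-\x40]';

// Anchored at the end of the subject: 0xFF is not in the class, so nothing matches.
$anchored = preg_match_all('/(start:\/\/|finish:\/\/)(' . $class . '*)$/is', $text, $m1);

// Without the trailing $: the second group stops before the first 0xFF byte.
$open = preg_match_all('/(start:\/\/|finish:\/\/)(' . $class . '*)/is', $text, $m2);

var_dump($anchored);  // int(0)
var_dump($m2[0][0]);  // string(12) "start://abcd"
```

Because the class lists only ASCII characters, the result should not depend on which character tables or locale the PCRE build uses, which is the behaviour the report expected from \w.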