Bug #49036 preg_replace error when using \W
Submitted: 2009-07-23 17:34 UTC Modified: 2009-07-24 10:35 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: gvdefence-ncr at yahoo dot it Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 5.2.10 OS: WXP
Private report: No CVE-ID: None
 [2009-07-23 17:34 UTC] gvdefence-ncr at yahoo dot it
Description:
------------
According to regexp syntax, [\W] is identical to [^A-Za-z0-9_] (see: http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx)

But preg_replace does not seem to work the same way.


Reproduce code:
---------------
<?php
   // According to regexp syntax, [\W] is identical to [^A-Za-z0-9_] (see: http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx)
   // Note: "?????" stands for accented letters that the bug tracker's encoding mangled.
   
   $result1 = preg_replace('/[^A-Za-z0-9_]*/', '', "test ????? test");
   $result2 = preg_replace('/[\W]*/', '', "test ????? test");
      
   echo "<pre>" . $result1 . "</pre>"; // ok, it shows: "testtest"
   echo "<pre>" . $result2 . "</pre>"; // wrong, it shows: "test?????test"
?>

Expected result:
----------------
testtest

testtest

Actual result:
--------------
testtest

test?????test

 [2009-07-23 17:41 UTC] scottmac@php.net
PCRE is locale-aware, so there are some exceptions. What locale are you using?

Also, we use PCRE, which is not the Microsoft regexp syntax; I suggest you read the PHP manual instead.
 [2009-07-23 17:59 UTC] gvdefence-ncr at yahoo dot it
What's a locale?

That [\W] is identical to [^A-Za-z0-9_] is not just Microsoft's idea.

\W means "match any non-word character"; it is the same as [^\w], which is the same as [^A-Za-z0-9_]
[http://en.wikipedia.org/wiki/Regular_expression#POSIX_character_classes]

Many other websites about regular expressions say the same thing as Microsoft and Wikipedia; you can search on Google.

Sorry, but this is a bug!


BTW: the PHP manual is completely useless regarding regular expression syntax; it does not help in any way, which is why I added the Microsoft documentation link.
 [2009-07-23 18:14 UTC] derick@php.net
http://uk.php.net/manual/en/regexp.reference.backslash.php clearly explains it:

\w
    any "word" character
\W
    any "non-word" character

Each pair of escape sequences partitions the complete set of characters into two disjoint sets. Any given character matches one, and only one, of each pair.

A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w. 
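The manual passage above can be checked with a minimal sketch. It assumes a current PHP 8 with the bundled PCRE2 (not the reporter's 5.2.10), a UTF-8 encoded script, and the sample string "test àèìòù test"; the accented characters in the original report were lost to encoding, so these are stand-ins. Without the /u modifier and in the "C" locale, \w covers only ASCII letters, digits and underscore, so every byte of a UTF-8 accented letter is a non-word character. With /u, PHP also enables Unicode properties, so the same letters count as word characters.

```php
<?php
// Minimal sketch, assuming PHP 8 + bundled PCRE2 and a UTF-8 encoded file.
// "àèìòù" is an assumed sample; the report's real characters were lost.
setlocale(LC_CTYPE, 'C'); // force PCRE's default (ASCII-only) character tables

$input = "test àèìòù test";

// Byte mode, C locale: \w is just [A-Za-z0-9_], so every byte of the
// UTF-8 accented letters (all >= 0x80) is a non-word character and is removed.
$byteMode = preg_replace('/\W+/', '', $input);

// UTF-8 mode: PHP's /u also turns on Unicode properties (PCRE2_UCP),
// so \w matches any Unicode letter and the accented letters survive.
$unicodeMode = preg_replace('/\W+/u', '', $input);

echo $byteMode, "\n";    // testtest
echo $unicodeMode, "\n"; // testàèìòùtest
```

In both modes \W remains the exact complement of \w; what changes is which side of that split a given character falls on, and that is decided by the character tables (locale) or by /u rather than by the pattern text alone.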
 [2009-07-23 18:54 UTC] gvdefence-ncr at yahoo dot it
To me this only means that the PHP documentation is wrong too.

1st) There is a paradox:
if [\w] (I tested the same issue with \W) matches depending on the locale setting, then [A-Za-z0-9_] (which is the same as [\w]) should behave the same way and also match accented characters like ????? depending on the locale setting. Since that does not happen, the latter would be the bug.

2nd) I wonder how to inform every website on the internet (including Wikipedia) that PHP regular expression syntax is different from the common-sense standard used by the rest of the world!



PS
I adore PHP, just trying to help. Bye!)
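The "paradox" goes away once explicit ranges and escape sequences are told apart: a range such as [A-Za-z] is a literal list of code points, while \w is looked up in PCRE's character tables (or in Unicode properties under /u). A small sketch, assuming PHP 8 and a UTF-8 encoded script, with 'à' as an assumed sample character:

```php
<?php
// Sketch: explicit ranges never grow with the locale or /u; \w can.
// 'à' (U+00E0) is an assumed sample character.
$letter = 'à';

// U+00E0 lies outside both A-Z and a-z, so the range cannot match it.
$inRange = preg_match('/^[A-Za-z]$/u', $letter);

// Under /u PHP enables Unicode properties, and U+00E0 is a letter, so \w matches.
$isWord = preg_match('/^\w$/u', $letter);

printf("range: %d, \\w: %d\n", $inRange, $isWord); // prints "range: 0, \w: 1"
```

So [A-Za-z0-9_] and \w only coincide while PCRE is using its default ASCII tables without /u; they are not defined as synonyms.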
 [2009-07-23 20:05 UTC] scottmac@php.net
We have Perl Compatible Regular Expressions *NOT* POSIX regular expressions.
 [2009-07-24 10:35 UTC] gvdefence-ncr at yahoo dot it
Ok, I read documentation at http://perldoc.perl.org/perlre.html#Regular-Expressions

But this is still an issue (almost a bug) in a web environment,
because my regexp would change behaviour depending on the locale settings of the server.
How do I make sure all servers running my web application have the same locale settings?

Thanks anyway for the explanation.
I'm going to add a comment to the preg_replace PHP documentation.
PHP forever!)
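One way to sidestep the locale question is to not depend on the server locale at all. The sketch below is a hypothetical helper (strip_non_word() is an illustrative name, not a PHP function); it assumes PHP 8 (for preg_last_error_msg()) and UTF-8 input for the Unicode variant. The ASCII variant spells the class out explicitly; the Unicode variant relies on /u, which makes \W follow Unicode properties instead of the setlocale() tables.

```php
<?php
// Hypothetical helper: strip_non_word() is an illustrative name, not a PHP built-in.
// Neither branch depends on setlocale(); assumes PHP 8 and UTF-8 input for $unicode.
function strip_non_word(string $text, bool $unicode = false): string
{
    if ($unicode) {
        // /u: input is UTF-8 and \W means "not a Unicode letter, digit or underscore".
        $result = preg_replace('/\W+/u', '', $text);
    } else {
        // Explicit ASCII class: the same code points on every server.
        $result = preg_replace('/[^A-Za-z0-9_]+/', '', $text);
    }
    // preg_replace() returns null on failure (e.g. invalid UTF-8 with /u).
    if ($result === null) {
        throw new RuntimeException('preg_replace failed: ' . preg_last_error_msg());
    }
    return $result;
}

echo strip_non_word("test àèìòù test"), "\n";       // testtest
echo strip_non_word("test àèìòù test", true), "\n"; // testàèìòùtest
```

Which variant fits depends on whether accented letters should count as word characters; either way the choice is made in the pattern rather than left to each server's locale.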
 
PHP Copyright © 2001-2025 The PHP Group
All rights reserved.
Last updated: Wed Jan 15 06:01:30 2025 UTC