Bug #49036 preg_replace error when using \W
Submitted: 2009-07-23 17:34 UTC Modified: 2009-07-24 10:35 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: gvdefence-ncr at yahoo dot it Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 5.2.10 OS: WXP
Private report: No CVE-ID: None
 [2009-07-23 17:34 UTC] gvdefence-ncr at yahoo dot it
Description:
------------
According to regexp syntax, [\W] is identical to [^A-Za-z0-9_] (see: http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx).

But preg_replace does not seem to work the same way.


Reproduce code:
---------------
<?php
   // According to regexp syntax, [\W] is identical to [^A-Za-z0-9_] (see: http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx)
   
   $result1 = preg_replace('/[^A-Za-z0-9_]*/', '', "test ????? test");
   $result2 = preg_replace('/[\W]*/', '', "test ????? test");
      
   echo "<pre>" . $result1 . "</pre>"; //ok, it shows: "testtest"
   echo "<pre>" . $result2 . "</pre>"; // wrong, it shows: "test?????test"
?>

Expected result:
----------------
testtest

testtest

Actual result:
--------------
testtest

test?????test

 [2009-07-23 17:41 UTC] scottmac@php.net
PCRE is locale-aware, so there are some exceptions. What locale are you using?

Also, we use PCRE, which is not the Microsoft regexp syntax. I suggest you read the PHP manual instead.
 [2009-07-23 17:59 UTC] gvdefence-ncr at yahoo dot it
What's a locale?

That [\W] is identical to [^A-Za-z0-9_] is not only Microsoft's idea.

\W means "match any non-word character", which is the same as [^\w], which is the same as [^A-Za-z0-9_]
[http://en.wikipedia.org/wiki/Regular_expression#POSIX_character_classes]

Many other websites that talk about regular expressions say the same thing as Microsoft and Wikipedia; you can search on Google.

Sorry, but this is a bug!


BTW: the PHP manual is completely useless regarding regular expression syntax. It does not help in any way, which is why I added the Microsoft documentation link.
 [2009-07-23 18:14 UTC] derick@php.net
http://uk.php.net/manual/en/regexp.reference.backslash.php clearly explains it:

\w
    any "word" character
\W
    any "non-word" character

Each pair of escape sequences partitions the complete set of characters into two disjoint sets. Any given character matches one, and only one, of each pair.

A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w. 
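
A minimal sketch of the behaviour the manual describes, under stated assumptions: the subject string uses the ISO-8859-1 bytes 0xE0 and 0xE8 to stand in for accented letters (the characters in the original report were lost), and LC_CTYPE is pinned to the "C" locale, which should always be available. In that locale \W and the explicit ASCII class are expected to agree. Under a locale that treats those bytes as letters, \W may leave them in place, which would match what the report shows.

```php
<?php
// Pin character classification to the "C" locale for this sketch.
setlocale(LC_CTYPE, 'C');

// Assumed sample input: "test", two ISO-8859-1 accented bytes, "test".
$subject = "test \xE0\xE8 test";

// Explicit ASCII ranges compare byte values, so they do not consult locale tables.
$explicit = preg_replace('/[^A-Za-z0-9_]+/', '', $subject);

// \W relies on PCRE's character tables, which can be built from the current locale.
$shorthand = preg_replace('/\W+/', '', $subject);

var_dump($explicit === $shorthand);
```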
 [2009-07-23 18:54 UTC] gvdefence-ncr at yahoo dot it
To me this only means that the PHP documentation is wrong too.

1st) There is a paradox:
if [\w] (I tested it and it has the same issue as \W) matches depending on the locale setting, then [A-Za-z0-9_] (which is the same as [\w]) should behave the same way and also match accented characters like ????? depending on the locale setting. Since it does not, this last one would be the bug.

2nd) I wonder how to let all the websites on the internet (including Wikipedia) know that PHP regular expression syntax is different from the common-sense standard of the rest of the world!



PS
I adore PHP, just trying to help. Bye!)
 [2009-07-23 20:05 UTC] scottmac@php.net
We have Perl Compatible Regular Expressions *NOT* POSIX regular expressions.
 [2009-07-24 10:35 UTC] gvdefence-ncr at yahoo dot it
OK, I read the documentation at http://perldoc.perl.org/perlre.html#Regular-Expressions

But this is still an issue (almost a bug) in a web environment,
because my regexp would change behaviour depending on the locale settings of the server.
How do I make sure all servers running my web application have the same locale settings?

Thanks anyway for the explanation.
I'm going to add a comment to the preg_replace PHP documentation.
PHP forever!)
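
One possible way around the locale dependence raised above, sketched here rather than taken from the thread: avoid \w and \W and spell out the character set. The example assumes UTF-8 input (the bytes \xC3\xA0 and \xC3\xA8 encode "à" and "è", chosen for illustration) and a PCRE build with Unicode property support, which is needed for \p{L} and \p{N} under the /u modifier.

```php
<?php
// Assumed sample input in UTF-8: "test àè test".
$subject = "test \xC3\xA0\xC3\xA8 test";

// Option 1: explicit ASCII ranges. Every byte outside A-Z, a-z, 0-9 and "_"
// is removed, including the bytes of the accented letters.
$asciiOnly = preg_replace('/[^A-Za-z0-9_]+/', '', $subject);

// Option 2: Unicode properties with /u. Letters and digits of any script are
// kept, and classification comes from Unicode data, not the server locale.
$unicodeAware = preg_replace('/[^\p{L}\p{N}_]+/u', '', $subject);

var_dump($asciiOnly);
var_dump($unicodeAware);
```

Option 1 suits identifiers that must stay plain ASCII. Option 2 keeps accented letters, so it is the closer fit when the text is meant for human readers.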
 