Bug #30435 Preg_Match ignoring word boundaries
Submitted: 2004-10-14 16:17 UTC Modified: 2005-02-03 18:56 UTC
From: wberg at doce dot ufl dot edu Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 4.3.9 OS: Linux 2.4.21-alpha-r4
Private report: No CVE-ID: None
 [2004-10-14 16:17 UTC] wberg at doce dot ufl dot edu
Description:
------------
I run PHP 4.3.4 locally on Win XP Pro / IIS 5. The supplied code works flawlessly there.

On the Linux server, however, which currently runs PHP 4.3.9, it seems to ignore or mess up the \b word boundaries. The same thing happened when the server ran PHP 4.3.3RC1. Upgrading to 4.3.9 didn't help.

You can see the code in action, where it produces the wrong results, on the Linux server at:

http://www.kuruvinda.com/test.php

You can get the full phpinfo() output for this server at:

http://www.kuruvinda.com/c.php


Reproduce code:
---------------
<?php
$Pattern = "\b[a?????]rm[e????]\b";

$Blazon = "D'argent, au lion d'azur, armé et lampassé de gueules, (le lion est quelquefois chargé d'une fleur-de-lis d'or, ou d'un écusson d'or à l'aigle éployée de sable).";

print "Pattern: $Pattern<p>Blazon: $Blazon<p>";

# Should return "Found"
if (preg_match("/".$Pattern."/i",$Blazon)) {
    print "Found";
} else {
    print "Not Found";
}

$Blazon = "D'argent, au lion d'azur, armée et lampassé de gueules, (le lion est quelquefois chargé d'une fleur-de-lis d'or, ou d'un écusson d'or à l'aigle éployée de sable).";

print "<p>Pattern: $Pattern<p>Blazon 2: $Blazon<p>";

# Should return "Not Found"
if (preg_match("/".$Pattern."/i",$Blazon)) {
    print "Found";
} else {
    print "Not Found";
}
?>


Expected result:
----------------
The match should find "armé" as a whole word only in $Blazon and not in $Blazon2.

Actual result:
--------------
The match does not find "armé" as a whole word in $Blazon, but does find it in $Blazon2, although it isn't a whole word there.
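
One plausible reading of this result, offered as a sketch rather than a confirmed diagnosis: if the script is saved as ISO-8859-1 and PCRE falls back to its default "C" locale tables, the byte for "é" (0xE9) does not count as a word character, so \b sees the word as ending before the accent. The pattern below is a simplified stand-in for the reported one, and the Latin-1 encoding is an assumption.

```php
<?php
// Assumption: the subject is ISO-8859-1, so "é" is the single byte 0xE9.
// Force the "C" locale so PCRE uses its built-in character tables.
setlocale(LC_CTYPE, 'C');

// Simplified stand-in for the reported pattern; \xE9 is resolved by PCRE.
$pattern = '/\barm[e\xE9]\b/';

$blazon1 = "au lion d'azur, arm\xE9 et lampass\xE9 de gueules";  // "armé"
$blazon2 = "au lion d'azur, arm\xE9e et lampass\xE9 de gueules"; // "armée"

// In "armé " both 0xE9 and the space are non-word bytes: no boundary between them.
$found1 = preg_match($pattern, $blazon1);

// In "armée" 0xE9 is non-word and "e" is a word byte: \b falsely matches here.
$found2 = preg_match($pattern, $blazon2);

var_dump($found1, $found2);
```

Under these assumptions the first call returns 0 and the second returns 1, which is the inverted outcome described above.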

 [2004-10-15 15:07 UTC] wberg at doce dot ufl dot edu
"The match does not find "armé" as a whole word in $Blazon, but does find it in $Blazon2, although it isn't a whole word there."

Should be:

The match does not find "armé" as a whole word in Blazon, but does find it in Blazon 2, although it isn't a whole word there.
 [2004-10-15 15:27 UTC] wberg at doce dot ufl dot edu
Removing the case insensitivity (i) from the pattern match makes no difference in the outcome. See:

http://www.kuruvinda.com/test2.php

Using POSIX ereg instead does work. See:

http://www.kuruvinda.com/test3.php

<?php
$Pattern = "[[:<:]][a?????]rm[e????][[:>:]]";

$Blazon = "D'argent, au lion d'azur, armé et lampassé de gueules, (le lion est quelquefois chargé d'une fleur-de-lis d'or, ou d'un écusson d'or à l'aigle éployée de sable).";

print "Pattern: $Pattern<p>Blazon: $Blazon<p>";

# Should print Found
if (eregi($Pattern,$Blazon,$regs)) {
    print "Found";
} else {
    print "Not Found";
}

$Blazon = "D'argent, au lion d'azur, armée et lampassé de gueules, (le lion est quelquefois chargé d'une fleur-de-lis d'or, ou d'un écusson d'or à l'aigle éployée de sable).";

print "<p>Pattern: $Pattern<p>Blazon 2: $Blazon<p>";

# Should print Not Found
if (eregi($Pattern,$Blazon,$regs)) {
    print "Found";
} else {
    print "Not Found";
}
?>
 [2004-10-15 15:29 UTC] wberg at doce dot ufl dot edu
Actually, I was wrong. Using POSIX does *not* work. It won't find either...
 [2005-02-03 15:38 UTC] wberg at doce dot ufl dot edu
I can't. The server is not mine and the administrator only upgrades major releases that are verified stable.

However, in the meantime I got it to run properly on the Linux server by adding the following line to the top of my code:

    setlocale(LC_CTYPE, 'fr_FR');

Since the accents involved are all contained in the French language, this did the job.
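
Where changing the locale is not an option, a locale-independent variant can be sketched as follows. It assumes the pattern and subject are UTF-8 and that PCRE was built with Unicode property support. matchesWholeWord() is a hypothetical helper, and the character classes are illustrative, not the ones from the original report. Instead of \b it uses \p{L} lookarounds under the u modifier, so the result does not depend on LC_CTYPE.

```php
<?php
// Hypothetical helper (not from the report): whole-word match that does not
// rely on the process locale. Assumes $wordPattern and $subject are UTF-8.
function matchesWholeWord(string $wordPattern, string $subject): bool
{
    // (?<!\p{L}) and (?!\p{L}) replace \b: no letter may precede or follow.
    $regex = '/(?<!\p{L})(?:' . $wordPattern . ')(?!\p{L})/iu';
    return preg_match($regex, $subject) === 1;
}

// Illustrative character classes only.
$pattern = '[aà]rm[eé]';

// "armé" is followed by a space, so it counts as a whole word.
$first = matchesWholeWord($pattern, "au lion d'azur, armé et lampassé de gueules");

// "armée" has a letter after "armé", so the lookahead rejects it.
$second = matchesWholeWord($pattern, "au lion d'azur, armée et lampassé de gueules");

var_dump($first, $second);
```

With these inputs the first call yields true and the second false. The setlocale() call above stays the simpler fix when the data is single-byte Latin-1 and the fr_FR locale is installed on the server.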

The question remains, of course, why the Windows version doesn't need this.
 [2005-02-03 18:56 UTC] sniper@php.net
Who knows, but not really a bug then.

 