Bug #30382 preg_match and utf-8
Submitted: 2004-10-10 15:49 UTC Modified: 2005-02-12 22:00 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: TiloLutz at gmx dot de
Assigned: derick
Status: Not a bug
Package: PCRE related
PHP Version: 4.3.9
OS: Suse Linux 9.1
Private report: No
CVE-ID: None

 [2004-10-10 15:49 UTC] TiloLutz at gmx dot de
Description:
------------
preg_match doesn't work correctly when UTF-8 is used.

preg_match('/^([[:alpha:]])*$/u', "ä")
should return true, because [[:alpha:]] also contains
localized special characters such as ä.
Unfortunately, it returns false.

It works with ISO-8859-15 but not with UTF-8.


Reproduce code:
---------------
// ISO-8859-15 locale
putenv("LANG=de_DE");
setlocale(LC_ALL, "de_DE");
if (preg_match('/^([[:alpha:]])*$/u', "ä")) echo "true";

// UTF-8 locale
putenv("LANG=de_DE.utf8");
setlocale(LC_ALL, "de_DE.utf8");
if (preg_match('/^([[:alpha:]])*$/u', "ä")) echo "true";


Expected result:
----------------
true
true



History

 [2004-10-10 16:00 UTC] aidan@php.net
Reassigning to proper category.
 [2004-10-10 16:20 UTC] tony2001@php.net
"PCRE related" is the right category for this report.

 [2004-10-11 08:01 UTC] derick@php.net
This depends on how the ä is encoded in your script. If it's just iso-8859-1 then it won't work. No bug here unless you can come up with an example that works. (Post a link to a zip file containing your scripts).
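The encoding question raised here can be checked directly. The following is a minimal sketch, not taken from the report: the helper name `is_valid_utf8` is hypothetical, and the code is written for a current PHP release rather than 4.3.9. It relies on `preg_match` with the `/u` modifier not returning 1 for a subject that is not valid UTF-8.

```php
<?php
// Hypothetical helper (not part of the report): returns true when $s is
// valid UTF-8. With the /u modifier, preg_match() fails on subjects that
// are not valid UTF-8, so an empty pattern acts as a validator.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$utf8   = "\xC3\xA4"; // "ä" encoded as UTF-8 (two bytes)
$latin9 = "\xE4";     // "ä" encoded as ISO-8859-1/15 (one byte)

var_dump(is_valid_utf8($utf8));   // bool(true)
var_dump(is_valid_utf8($latin9)); // bool(false)
```

If the second form is what ends up in the script file, a pattern using `/u` cannot match it at all, regardless of the character class used.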
 [2004-10-11 11:41 UTC] TiloLutz at gmx dot de
You can find an example at 
http://www.stud.uni-karlsruhe.de/~usjp/preg_match.zip 
 
The file is 100% encoded as utf8.
 [2004-10-11 12:40 UTC] chregu@php.net
I can reproduce it on Debian, but not on Mac OS X
 [2004-10-11 13:00 UTC] derick@php.net
I can reproduce it too, but I need to think real hard about it first before I can say whether it is correct or not :)
 [2004-12-06 22:08 UTC] pmichaud at pobox dot com
It might help to know that PCRE doesn't support the [:alpha:], [:digit:], etc. classes in UTF-8 mode.  From http://www.pcre.org/pcre.txt, under "POSIX CHARACTER CLASSES":

   In UTF-8 mode, characters with values greater than 128 do not match any of the POSIX character classes.

So, the fact that [:alpha:] doesn't work on UTF-8 strings appears to be a limitation of PCRE itself.  (And I do so strongly wish it were otherwise.)

Pm
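To see why "ä" falls under the quoted rule, it helps to look at its code point. The sketch below is illustrative only: `two_byte_utf8_codepoint` is a hypothetical helper that handles just two-byte sequences and does not reject overlong encodings. It shows that the UTF-8 bytes for "ä" decode to a value greater than 128.

```php
<?php
// Hypothetical helper: decodes a two-byte UTF-8 sequence into its code
// point. Other sequence lengths are out of scope for this sketch.
function two_byte_utf8_codepoint(string $bytes): int
{
    if (strlen($bytes) !== 2) {
        throw new InvalidArgumentException('expected exactly two bytes');
    }
    $lead  = ord($bytes[0]);
    $trail = ord($bytes[1]);
    // Lead byte must look like 110xxxxx, trail byte like 10xxxxxx.
    if (($lead & 0xE0) !== 0xC0 || ($trail & 0xC0) !== 0x80) {
        throw new InvalidArgumentException('not a two-byte UTF-8 sequence');
    }
    // Combine the 5 payload bits of the lead with the 6 of the trail.
    return (($lead & 0x1F) << 6) | ($trail & 0x3F);
}

$cp = two_byte_utf8_codepoint("\xC3\xA4"); // "ä" as UTF-8
echo $cp, "\n";                            // 228
var_dump($cp > 128);                       // bool(true)
```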
 [2005-02-12 22:00 UTC] tony2001@php.net
PCRE has limited UTF-8 support.
"A class is matched against a UTF-8 character instead of just a single byte, but it can match only characters whose values are less than 256. Characters with greater values always fail to match a class." (c) man pcre

No bug in PHP -> bogus.
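One possible workaround, which nobody in this thread proposed, is to match letters with the Unicode property `\p{L}` instead of the POSIX class. This is a sketch under assumptions: the helper name `is_all_letters` is hypothetical, it targets a current PHP release, and it requires a PCRE build with Unicode property support. Whether that was available for PHP 4.3.9 is not established here.

```php
<?php
// Hypothetical helper: true when $s is valid UTF-8 and consists only of
// letters (Unicode general category L). An empty string also passes,
// because of the * quantifier.
function is_all_letters(string $s): bool
{
    return preg_match('/^\p{L}*$/u', $s) === 1;
}

var_dump(is_all_letters("\xC3\xA4"));             // "ä"     -> bool(true)
var_dump(is_all_letters("Gr\xC3\xBC\xC3\x9Fe"));  // "Grüße" -> bool(true)
var_dump(is_all_letters("abc1"));                 // digit   -> bool(false)
```

Unlike `[[:alpha:]]`, `\p{L}` does not depend on the locale set with `setlocale`, so the result should be the same under `de_DE` and `de_DE.utf8` as long as the subject string is UTF-8.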
 