Bug #39744 UTF8 support
Submitted: 2006-12-05 15:48 UTC Modified: 2006-12-05 17:33 UTC
From: sdamir at gmail dot com Assigned:
Status: Not a bug Package: *Regular Expressions
PHP Version: 5.2.0 OS: Linux 2.6.18
Private report: No CVE-ID: None


 [2006-12-05 15:48 UTC] sdamir at gmail dot com
I am trying to match all alphabetic UTF-8 characters. I know (I tested it) that in Perl, if $string is UTF-8 encoded and I use a regex like =~ /\w/, it matches all alphabetic UTF-8 characters (Cyrillic, Chinese, English, etc.). However, this is not the case in PHP. I read that I need to use special patterns like \pL, but that doesn't work for me either: it matches some characters, but Cyrillic letters aren't matched. I don't know if this is a bug or if I am doing something wrong, but I really searched the hell out of everything and visited tons of IRC support channels, and no one has an answer to this.

Reproduce code:

<?php
// setlocale(LC_ALL, 'en_US.utf8');
// With the locale set to en_US, some non-ASCII characters are matched, but not
// Cyrillic ones; with en_US.utf8, nothing is matched at all.

$str = " Срећа ";

preg_match('/[\w\pL]/u', $str, $r);

var_dump($str);
var_dump($r);


Expected result:
string(3) " s "
array(1) {
  string(1) "С"
}

Actual result:
string(12) " Срећа "
array(0) {
}
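When a /u pattern "matches nothing", it helps to tell a real non-match apart from a rejected subject: with the u modifier, preg_match() returns false (not 0) if the subject is not valid UTF-8, and preg_last_error() says why. A minimal sketch, assuming modern PHP syntax (7.1+, not the 5.2 of this report) and a PCRE build with Unicode property support; firstLetter() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: return the first Unicode letter (\p{L}) in a UTF-8
// string, null if there is none, and throw if PCRE rejects the subject.
function firstLetter(string $subject): ?string
{
    $result = preg_match('/\p{L}/u', $subject, $m);
    if ($result === false) {
        // e.g. PREG_BAD_UTF8_ERROR when the bytes are not valid UTF-8
        throw new RuntimeException('preg_match failed, error code ' . preg_last_error());
    }
    return $result === 1 ? $m[0] : null;
}

$str = " Срећа ";
var_dump(firstLetter($str)); // string(2) "С"
```

Note that var_dump() counts bytes, so a single Cyrillic letter shows up as string(2).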


 [2006-12-05 15:51 UTC] sdamir at gmail dot com
I don't know why, but your bug system converted the letters in my PHP code into &#...; entities.
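The &#...; sequences are HTML numeric character references (for example, &#1057; is U+0421, CYRILLIC CAPITAL LETTER ES), so the original text can be recovered with PHP's html_entity_decode(). A minimal sketch, assuming the references stand for UTF-8 text:

```php
<?php
// Text as it appears in the report after the bug tracker mangled it.
$mangled = ' &#1057;&#1088;&#1077;&#1115;&#1072; ';

// Decode numeric character references back into UTF-8 characters.
$decoded = html_entity_decode($mangled, ENT_QUOTES, 'UTF-8');

var_dump($decoded); // string(12) " Срећа "
```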
 [2006-12-05 16:09 UTC]

utf8_encode -- Encodes an **ISO-8859-1** string to UTF-8
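The point of that reply: utf8_encode() assumes its input is ISO-8859-1, so text that is already UTF-8 gets encoded a second time, byte by byte. The sketch below uses a hypothetical latin1ToUtf8() that mimics that conversion (utf8_encode() itself is deprecated as of PHP 8.2) to show what happens to the first Cyrillic letter:

```php
<?php
// Hypothetical stand-in for utf8_encode(): treat every byte as an
// ISO-8859-1 code point (0x00-0xFF) and emit its UTF-8 encoding.
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                      // ASCII is unchanged
        } else {
            $out .= chr(0xC0 | ($b >> 6))         // lead byte (0xC2 or 0xC3)
                  . chr(0x80 | ($b & 0x3F));      // continuation byte
        }
    }
    return $out;
}

$utf8   = "С";                    // already UTF-8: bytes D0 A1
$double = latin1ToUtf8($utf8);    // D0 -> C3 90, A1 -> C2 A1

var_dump(strlen($utf8));          // int(2)
var_dump(bin2hex($double));       // string(8) "c390c2a1"
var_dump($double);                // string(4) "Ð¡"
```

After the double encoding, the regex no longer sees the Cyrillic letter at all, only the two Latin-1 characters Ð and ¡.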
 [2006-12-05 17:32 UTC] sdamir at gmail dot com
That's... not what I want. I don't see how that is relevant, because the string is already UTF-8 encoded. My question is simple: how do you match all UTF-8 alphabetic characters?
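One way to match all alphabetic characters in a valid UTF-8 string is the Unicode property \p{L} (any letter: Latin, Cyrillic, CJK, ...) together with the u modifier. A minimal sketch, assuming modern PHP syntax and a PCRE build with Unicode property support; letterRuns() is a hypothetical helper name, and it sticks to \p{L} rather than relying on \w:

```php
<?php
// Hypothetical helper: return every run of letters in a UTF-8 string.
// \p{L} = any Unicode letter; \p{M} = combining marks, so a letter written
// as base + combining accent stays in one run. /u treats input as UTF-8.
function letterRuns(string $utf8): array
{
    if (preg_match_all('/[\p{L}\p{M}]+/u', $utf8, $m) === false) {
        throw new RuntimeException('preg_match_all failed, error code ' . preg_last_error());
    }
    return $m[0];
}

print_r(letterRuns("Срећа English 中文"));
```

Including \p{M} is a design choice for decomposed text; drop it if only precomposed letters matter.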
 [2006-12-05 17:33 UTC] sdamir at gmail dot com
Damn, I just realised I used "utf8_encode($str);" in the source code I submitted; please ignore that line!
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Feb 22 18:01:37 2024 UTC