Bug #60603 PCRE functions and encoding UTF-8
Submitted: 2011-12-23 23:13 UTC Modified: 2011-12-25 16:28 UTC
From: post at kira-bianca dot de Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 5.3.8 OS: Windows XP prof. V2002 with SP3
Private report: No CVE-ID: None
 [2011-12-23 23:13 UTC] post at kira-bianca dot de
Description:
------------
Hi!

I write my source files in UTF-8, and I think I have found a bug (it cost me almost two hours of searching for a flaw in my own logic).

I want to use preg_match() to check a string for valid characters and, at the same time, for a minimum length. The German special characters (Ä, Ö, Ü, ß) should also be allowed in the string. As I noticed, all PCRE functions treat each of these characters as two letters when the encoding is UTF-8. See the examples below.
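
The two-letter effect can be reproduced in isolation. The following is a minimal sketch, assuming the source file is saved as UTF-8: without the `u` pattern modifier PCRE matches byte by byte, and each of ä, ö, ü and ß occupies two bytes in UTF-8. The variable names are illustrative only.

```php
<?php
// Assumes this file is saved as UTF-8 (without BOM).
$animal = 'Bär';

// strlen() counts bytes: B (1) + ä (2) + r (1).
$byteLength = strlen($animal);

// Without the u modifier, "." matches one byte at a time.
$byteMatches = preg_match_all('~.~', $animal, $m);

// With the u modifier, "." matches one UTF-8 encoded character at a time.
$charMatches = preg_match_all('~.~u', $animal, $m);

echo "$byteLength $byteMatches $charMatches\n"; // prints "4 4 3"
```

So a quantifier such as {4,26} counts bytes, not letters, unless the pattern carries the `u` modifier.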

I would be grateful for tips on how to work around this problem.

Regards,
Ki-Bi

Test script:
---------------
/*
   remark:
   The source is saved with encoding UTF-8 (without BOM)
*/
$animal = 'Bär';
if (preg_match('~^[a-zäöüß\.\- ]{4,26}$~i', $animal)) {
    echo "$animal has a word length from 4 to 26 letters.\n";
} else {
    echo "$animal is too short or too long!\n";
}
$string = 'Dörte hütet die Schäfchen und trinkt dabei ein Maß Bier. Ärger gibt es dabei nicht.';
echo preg_replace('~[äöüß]~i', '*', $string);

Actual result:
--------------
Bär has a word length from 4 to 26 letters.
D**rte h**tet die Sch**fchen und trinkt dabei ein Ma** Bier. *�rger gibt es dabei nicht.
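
The output above is consistent with byte-wise matching: every umlaut is replaced twice, and for Ä only the first of its two bytes is in the character class, which leaves a stray byte behind. One possible workaround, sketched here under the assumption that both the pattern and the subject are valid UTF-8, is to add the `u` pattern modifier. The uppercase umlauts are listed explicitly so that the sketch does not depend on PCRE's Unicode case handling; `$isValid` and `$masked` are illustrative names, not part of the original script.

```php
<?php
// Assumes this file is saved as UTF-8 (without BOM).
$animal = 'Bär';

// The trailing "u" makes PCRE treat pattern and subject as UTF-8,
// so {4,26} counts characters instead of bytes.
$isValid = preg_match('~^[a-zäöüßÄÖÜ\.\- ]{4,26}$~iu', $animal) === 1;
echo $isValid
    ? "$animal has a word length from 4 to 26 letters.\n"
    : "$animal is too short or too long!\n";

$string = 'Dörte hütet die Schäfchen und trinkt dabei ein Maß Bier. Ärger gibt es dabei nicht.';

// Each umlaut is now one character, so it is replaced by exactly one "*".
$masked = preg_replace('~[äöüßÄÖÜ]~u', '*', $string);
echo $masked, "\n";
```

With this variant, 'Bär' is reported as too short (three characters), and the second line prints "D*rte h*tet die Sch*fchen und trinkt dabei ein Ma* Bier. *rger gibt es dabei nicht."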


 [2011-12-24 13:34 UTC] cataphract@php.net
-Status: Open
+Status: Bogus
 [2011-12-24 13:34 UTC] cataphract@php.net
Sorry, but your problem does not imply a bug in PHP itself.  For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions.  Due to the volume
of reports we can not explain in detail here why your report is not
a bug.  The support channels will be able to provide an explanation
for you.

Thank you for your interest in PHP.


 [2011-12-25 16:28 UTC] post at kira-bianca dot de
I don't understand why this should not be a bug. If ONE letter is counted as TWO letters, that is quite an obvious bug to me. Maybe my final sentence made the report ambiguous.

OK, I will not discuss it further. This is not really a discussion forum.
 