[2010-08-12 14:36 UTC] pj at ezgr dot net
Description:
------------
PHP: 5.2.14, Apache 2.2.15, mod_php
While \s is supposed to match only whitespace, the Greek capital letter Pi (Π), which is encoded in UTF-8 as the bytes 0xCE 0xA0, is matched too. When the match is replaced, the letter is stripped of its second byte (0xA0).
Test script:
---------------
<?php
mb_internal_encoding('UTF-8');
$testStr = 'Π Π Π!';
$newStr = mb_ereg_replace('\s+','_',$testStr);
echo $testStr, "\n";
echo $newStr, "\n";
echo urlencode($testStr), "\n";
echo urlencode($newStr), "\n";
?>
Expected result:
----------------
Π Π Π!
Π__Π__Π!
%CE%A0++%CE%A0++%CE%A0%21
%CE%A0__%CE%A0__%CE%A0%21
Actual result:
--------------
Π Π Π!
[non printable character]_[non printable character]_[non printable character]!
%CE%A0++%CE%A0++%CE%A0%21
%CE_%CE_%CE_%21
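You also need to call mb_regex_encoding('UTF-8') before using a UTF-8 regular expression. mb_internal_encoding('UTF-8') is a separate setting and does not by itself set the encoding used by the mb_ereg_* functions.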
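For illustration, here is a minimal sketch of the test script with that call added. It assumes the file is saved as UTF-8 and that the mbstring extension is loaded. It uses single spaces between the letters, so the exact output may differ from the report if the original string contained more whitespace.

```php
<?php
// Set both encodings: the internal encoding and the regex encoding
// are independent mbstring settings.
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

// Same kind of input as in the report (assumed: single spaces).
$testStr = 'Π Π Π!';

// With the regex encoding set to UTF-8, \s+ should match only the
// spaces and leave both bytes of each Π (0xCE 0xA0) intact.
$newStr = mb_ereg_replace('\s+', '_', $testStr);

echo $newStr, "\n";
echo urlencode($newStr), "\n";
```

If the fix applies, each Π in the urlencode() output should still appear as %CE%A0 rather than a lone %CE.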