Bug #52592 mb_ereg_replace and the Greek capital Pi
Submitted: 2010-08-12 14:36 UTC Modified: 2010-08-13 06:50 UTC
From: pj at ezgr dot net Assigned:
Status: Not a bug Package: mbstring related
PHP Version: 5.2.14 OS: CentOS 5.5 x64
Private report: No CVE-ID: None

 [2010-08-12 14:36 UTC] pj at ezgr dot net
Description:
------------
PHP: 5.2.14, Apache 2.2.15, mod_php

\s is supposed to match only whitespace, but the Greek capital letter Pi (Π, U+03A0), whose UTF-8 encoding is the byte pair 0xCE 0xA0, is partially matched as well: when the match is replaced, the letter is stripped of its second byte (0xA0).
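One plausible reading of the symptom, not confirmed anywhere in this report: if the regex engine does not know the subject is UTF-8, it may walk it byte by byte, and a lone 0xA0 byte is the no-break space (U+00A0) in ISO-8859-1, which regex engines commonly class as whitespace. The sketch below uses only strlen(), bin2hex(), and mbstring conversion functions (it does not call mb_ereg at all) and just shows the byte-level facts behind that reading, assuming the mbstring extension is loaded.

```php
<?php
// GREEK CAPITAL LETTER PI (U+03A0), written as its raw UTF-8 bytes
// so the result does not depend on how this file is saved.
$pi = "\xCE\xA0";

echo strlen($pi), "\n";               // byte length: 2
echo bin2hex($pi), "\n";              // cea0
echo mb_strlen($pi, 'UTF-8'), "\n";   // character length: 1

// The trailing byte on its own, interpreted as ISO-8859-1, is
// U+00A0 NO-BREAK SPACE, re-encoded here as UTF-8 (C2 A0).
$nbsp = mb_convert_encoding("\xA0", 'UTF-8', 'ISO-8859-1');
echo bin2hex($nbsp), "\n";            // c2a0
```

If a byte-oriented engine sees 0xA0 as whitespace, \s+ can start a match in the middle of Π, which would explain why only the 0xCE byte survives in the actual result.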

Test script:
---------------
<?php
mb_internal_encoding('UTF-8');

$testStr = 'Π  Π  Π!';
$newStr = mb_ereg_replace('\s+','_',$testStr);
echo $testStr, "\n";
echo $newStr, "\n";
echo urlencode($testStr), "\n";
echo urlencode($newStr), "\n";
?>

Expected result:
----------------
Π  Π  Π!
Π_Π_Π!
%CE%A0++%CE%A0++%CE%A0%21
%CE%A0_%CE%A0_%CE%A0%21

Actual result:
--------------
Π  Π  Π!
[non printable character]_[non printable character]_[non printable character]!
%CE%A0++%CE%A0++%CE%A0%21
%CE_%CE_%CE_%21

History

 [2010-08-13 06:50 UTC] aharvey@php.net
-Status: Open +Status: Bogus
 [2010-08-13 06:50 UTC] aharvey@php.net
You also need to call mb_regex_encoding('UTF-8'); before using a UTF-8 regular expression. mb_internal_encoding() alone does not set the encoding used by mb_ereg_replace().
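A minimal sketch of the reporter's script with the suggested fix applied, assuming mbstring was built with its regex (Oniguruma) support so that mb_ereg_replace() is available. The subject string is written with explicit UTF-8 bytes so the example does not depend on the file's own encoding.

```php
<?php
// mb_internal_encoding() sets the default for functions such as mb_strlen();
// mb_regex_encoding() separately sets the encoding used by the mb_ereg*() family.
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

// 'Π  Π  Π!' : three Greek capital Pi letters separated by two spaces each.
$testStr = "\xCE\xA0  \xCE\xA0  \xCE\xA0!";

// Each run of whitespace collapses to a single underscore; Π stays intact.
$newStr = mb_ereg_replace('\s+', '_', $testStr);

echo $newStr, "\n";             // Π_Π_Π!
echo urlencode($newStr), "\n";  // %CE%A0_%CE%A0_%CE%A0%21
```

Because the pattern is \s+, each two-space gap becomes one underscore, not two.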
 
PHP Copyright © 2001-2022 The PHP Group
All rights reserved.
Last updated: Sun Jan 23 18:03:34 2022 UTC