PHP :: Bug #66507 :: string replace does not work on certain points on the string

Bug #66507	string replace does not work on certain points on the string
Submitted:	2014-01-17 08:59 UTC	Modified:	2014-01-17 10:31 UTC
From:	hugo at domibay dot es	Assigned:
Status:	Not a bug	Package:	mbstring related
PHP Version:	5.4.24	OS:	Centos 6.4
Private report:	No	CVE-ID:	None

[2014-01-17 08:59 UTC] hugo at domibay dot es

Description:
------------
I want to insert Chinese text into a database. The SQL query requires every apostrophe to be escaped,
i.e. "'" -> "\\'" .
Because I need to process Chinese and other languages that don't use Latin characters, I chose "mb_ereg_replace()" for the replacement.
I found that "mb_ereg_replace()" can replace Latin words within the text, and also apostrophes between Latin passages, but it fails when a Chinese character comes directly before the apostrophe.
For example: "位于't Goor Park" -> "位于\\'t Goor Park"
I used the following command. It worked on many texts, but on this text it failed.
$srs = mb_ereg_replace("[\']", "\\'", $srs);

Curiously, "mb_strpos()" can detect the apostrophe, but I can't replace it.
$iapops = mb_strpos($srs, "'");

Test script:
---------------
This script shows that some passages can be replaced, but the apostrophe in question is left untouched even though it is detected.

$iapops = mb_strpos($srs, "'");

if($iapops !== false)
{
  echo "apostrophe found on '$iapops'\n";

  echo "test 0: '$srs'\n";

  $srs = mb_ereg_replace("(goor)", "[GREEN]", $srs, "i");

  echo "test 1: '$srs'\n";

  $srs = mb_ereg_replace("[\']", "[apostrophe]", $srs);

  echo "test 2: '$srs'\n";

  if(mb_strpos($srs, "'") !== false)
    echo "replace failed!";

}  //if($iapops !== false)

Expected result:
----------------
<p><b>酒店位置</b> <br />Hotel - Restaurant Het Ros van Twente位于de Lutte，位于[apostrophe]t [GREEN] Park和Huize Keizer Museum附近。 该 4 星级酒店位于Sandstone Museum of Bad Bentheim和本特海姆城堡地区。</p><p><b>客房</b> <br />
酒店有 30 间客房，提供平板电视。客房设有私人阳台。所提供的卫星电视可满足您的娱乐需求。便利设施包括直拨电话，以及保险箱和书桌。</p><p><b>休闲、SPA、高端服务设施</b> <br />享受桑拿等度假设施，或者到花园欣赏美景。</p><p><b>餐饮</b> <br />您可以到餐厅享用一顿美餐；也可以选择酒店的限时客房服务。欢迎光临酒吧/酒廊，点一杯喜欢的饮品，畅饮一番。</p><p><b>商务及其他服务设施</b> <br />特色服务/设施包括会讲多种语言的服务员、公共区域空调和图书馆。这家酒店的活动设施包括会议室、小会议室和宴会设施。酒店提供免费停车设施。</p>

Actual result:
--------------
<p><b>酒店位置</b> <br />Hotel - Restaurant Het Ros van Twente位于de Lutte，位于't [GREEN] Park和Huize Keizer Museum附近。 该 4 星级酒店位于Sandstone Museum of Bad Bentheim和本特海姆城堡地区。</p><p><b>客房</b> <br />
酒店有 30 间客房，提供平板电视。客房设有私人阳台。所提供的卫星电视可满足您的娱乐需求。便利设施包括直拨电话，以及保险箱和书桌。</p><p><b>休闲、SPA、高端服务设施</b> <br />享受桑拿等度假设施，或者到花园欣赏美景。</p><p><b>餐饮</b> <br />您可以到餐厅享用一顿美餐；也可以选择酒店的限时客房服务。欢迎光临酒吧/酒廊，点一杯喜欢的饮品，畅饮一番。</p><p><b>商务及其他服务设施</b> <br />特色服务/设施包括会讲多种语言的服务员、公共区域空调和图书馆。这家酒店的活动设施包括会议室、小会议室和宴会设施。酒店提供免费停车设施。</p>

[2014-01-17 09:37 UTC] requinix@php.net

-Status: Open +Status: Feedback

[2014-01-17 09:37 UTC] requinix@php.net

Notice how mb_ereg_replace() never gave you the chance to tell it what encoding the string was in? That would be a job for mb_regex_encoding().

<?php // I ran this from a file encoded in UTF-8

$string = "位于't Goor Park";

// 位于't Goor Park
echo $string, "\n";

// 位于't Goor Park
echo mb_ereg_replace("[\']", "[apostrophe]", $string), "\n";

// 位于[apostrophe]t Goor Park
mb_regex_encoding("UTF-8"); // previous value was EUC-JP
echo mb_ereg_replace("[\']", "[apostrophe]", $string), "\n";

?>
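
A likely explanation for why only this one apostrophe is missed, sketched under the assumption that the regex engine walks the subject byte by byte using EUC-JP rules: the UTF-8 encoding of 于 ends in the byte 0x8E, which EUC-JP treats as the first byte of a two-byte character, so the apostrophe that follows would be consumed as the second half of that character. The snippet below only prints the bytes involved; it does not inspect the regex engine itself.

```php
<?php
// Bytes are written explicitly so the result does not depend on the file encoding.
$yu = "\xE4\xBA\x8E";      // UTF-8 encoding of 于 (U+4E8E)
$subject = $yu . "'t";     // the critical spot from the report: 于't

// Hex dump of the subject: the apostrophe (0x27) directly follows 0x8E.
$hex = bin2hex($subject);
echo $hex, "\n";           // e4ba8e2774

// A plain byte-level search is not affected by any mbstring encoding setting.
$pos = strpos($subject, "'");
var_dump($pos);            // int(3)
```

This would also be consistent with apostrophes between Latin passages being replaced: there the preceding byte is plain ASCII, which means the same thing in both encodings.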

[2014-01-17 10:12 UTC] hugo at domibay dot es

I was able to reproduce this:

$string = "位于't Goor Park";

// 位于't Goor Park
echo $string, "\n";

// expected: 位于[apostrophe]t Goor Park
echo "encoding: '" . mb_regex_encoding() . "'\n";
echo mb_ereg_replace("[\']", "[apostrophe]", $string), "\n";

// expected 位于\'t Goor Park
mb_regex_encoding("UTF-8"); // previous value was EUC-JP
echo "encoding: '" . mb_regex_encoding() . "'\n";
echo mb_ereg_replace("[\']", "\\'", $string), "\n";

This produced the output:
位于't Goor Park
encoding: 'EUC-JP'
位于't Goor Park
encoding: 'UTF-8'
位于\'t Goor Park

So I went to my php.ini and changed the options in the "mbstring" section to:
mbstring.language = UTF-8
mbstring.internal_encoding = UTF-8
mbstring.http_output = UTF-8

Running the script again then gave me this new output:
位于't Goor Park
encoding: 'UTF-8'
位于[apostrophe]t Goor Park
encoding: 'UTF-8'
位于\'t Goor Park

Thank you for your help.
I couldn't find any help on this, and I had always assumed that my internal encoding was "UTF-8", which it actually wasn't.
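
Where php.ini cannot be edited, the same effect can probably be had by setting the encodings at the top of the script. This is a minimal sketch, assuming the script file and the input are both UTF-8. The variable names are made up for illustration. The str_replace() line is an alternative not mentioned in this thread: it works on bytes, and since "'" is a single ASCII byte that UTF-8 never uses inside a multibyte sequence, it cannot split a character.

```php
<?php
// Set the encodings explicitly instead of relying on the php.ini defaults.
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

$string = "位于't Goor Park"; // assumes this file is saved as UTF-8

// Same pattern and replacement as in the report.
$viaRegex = mb_ereg_replace("[\']", "\\'", $string);

// Byte-wise alternative that does not depend on mb_regex_encoding().
$viaStrReplace = str_replace("'", "\\'", $string);

echo $viaRegex, "\n";
echo $viaStrReplace, "\n";
```

Both lines should print the string with the apostrophe escaped, matching the last line of the output shown above.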

[2014-01-17 10:17 UTC] hugo at domibay dot es

-Status: Feedback +Status: Closed

[2014-01-17 10:17 UTC] hugo at domibay dot es

The solution for this unexpected output is to check the "mbstring" section of php.ini.

Change the default configuration:
[mbstring]
;mbstring.language = Japanese
;mbstring.internal_encoding = EUC-JP
;mbstring.http_input = auto
;mbstring.http_output = SJIS

to this configuration, which works:
[mbstring]
mbstring.language = UTF-8
mbstring.internal_encoding = UTF-8
;mbstring.http_input = auto
mbstring.http_output = UTF-8

[2014-01-17 10:18 UTC] requinix@php.net

-Status: Closed +Status: Not a bug

[2014-01-17 10:18 UTC] requinix@php.net

Good to hear it's fixed.

[2014-01-17 10:31 UTC] hugo at domibay dot es

I might have been a bit too quick with that configuration.

Although the previous one worked for me, this one may actually be more correct:
[mbstring]
mbstring.language = neutral
mbstring.internal_encoding = UTF-8
;mbstring.http_input = auto
mbstring.http_output = auto
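
To confirm which values are actually in effect after editing php.ini, something like the following can be run. It is only a sketch: mbstring_report() is a hypothetical helper, not part of PHP, and the exact values returned depend on the PHP version and build.

```php
<?php
// Hypothetical helper, not part of PHP: collects the mbstring settings in effect.
function mbstring_report()
{
    return array(
        'ini.language'          => ini_get('mbstring.language'),
        'ini.internal_encoding' => ini_get('mbstring.internal_encoding'),
        'ini.http_output'       => ini_get('mbstring.http_output'),
        'internal_encoding'     => mb_internal_encoding(),
        'regex_encoding'        => mb_regex_encoding(),
    );
}

// Print one setting per line, e.g. after restarting the web server.
foreach (mbstring_report() as $name => $value) {
    echo str_pad($name, 24), var_export($value, true), "\n";
}
```

The 'regex_encoding' entry is the one that matters for mb_ereg_replace(); in the report above it was 'EUC-JP' before the change.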
