[2014-01-17 08:59 UTC] hugo at domibay dot es
Description:
------------
I want to insert certain Chinese text into a database. The SQL query requires that all apostrophes are escaped,
like "'" -> "\\'".
Because I need to process Chinese and other languages that don't use Latin characters, I opted for a string replace with "mb_ereg_replace()".
I found that "mb_ereg_replace()" could replace Latin words within the text, and also apostrophes between Latin passages, but it was unable to replace an apostrophe when a Chinese character came directly before it,
like "位于't Goor Park" -> "位于\\'t Goor Park".
I used the following command to achieve this. It worked on many texts, but on this text it failed.
$srs = mb_ereg_replace("[\']", "\\'", $srs);
Curiously, I can detect the apostrophe with "mb_strpos()", but I can't replace it.
$iapops = mb_strpos($srs, "'");
Test script:
---------------
This script shows that some passages can be replaced, but the important apostrophe can't be touched, even though it can be detected.
$iapops = mb_strpos($srs, "'");
if($iapops !== false)
{
    echo "apostrophe found on '$iapops'\n";
    echo "test 0: '$srs'\n";
    $srs = mb_ereg_replace("(goor)", "[GREEN]", $srs, "i");
    echo "test 1: '$srs'\n";
    $srs = mb_ereg_replace("[\']", "[apostrophe]", $srs);
    echo "test 2: '$srs'\n";
    if(mb_strpos($srs, "'") !== false)
        echo "replace failed!";
} //if($iapops !== false)
Expected result:
----------------
<p><b>酒店位置</b> <br />Hotel - Restaurant Het Ros van Twente位于de Lutte,位于[apostrophe]t [GREEN] Park和Huize Keizer Museum附近。 该 4 星级酒店位于Sandstone Museum of Bad Bentheim和本特海姆城堡地区。</p><p><b>客房</b> <br />
酒店有 30 间客房,提供平板电视。客房设有私人阳台。所提供的卫星电视可满足您的娱乐需求。便利设施包括直拨电话,以及保险>
箱和书桌。</p><p><b>休闲、SPA、高端服务设施</b> <br />享受桑拿等度假设施,或者到花园欣赏美景。</p><p><b>餐饮</b> <br />您可以到餐厅享用一顿美餐;也可以选择酒店的限时客房服务。欢迎光临酒吧/酒廊,点一杯喜欢的饮品,畅饮一番。</p><p><b>商
务及其他服务设施</b> <br />特色服务/设施包括会讲多种语言的服务员、公共区域空调和图书馆。这家酒店的活动设施包括会议室>
、小会议室和宴会设施。酒店提供免费停车设施。</p>
Actual result:
--------------
<p><b>酒店位置</b> <br />Hotel - Restaurant Het Ros van Twente位于de Lutte,位于't [GREEN] Park和Huize Keizer Museum附近。 该 4 星级酒店位于Sandstone Museum of Bad Bentheim和本特海姆城堡地区。</p><p><b>客房</b> <br />
酒店有 30 间客房,提供平板电视。客房设有私人阳台。所提供的卫星电视可满足您的娱乐需求。便利设施包括直拨电话,以及保险
箱和书桌。</p><p><b>休闲、SPA、高端服务设施</b> <br />享受桑拿等度假设施,或者到花园欣赏美景。</p><p><b>餐饮</b> <br />您可以到餐厅享用一顿美餐;也可以选择酒店的限时客房服务。欢迎光临酒吧/酒廊,点一杯喜欢的饮品,畅饮一番。</p><p><b>商
务及其他服务设施</b> <br />特色服务/设施包括会讲多种语言的服务员、公共区域空调和图书馆。这家酒店的活动设施包括会议室>
、小会议室和宴会设施。酒店提供免费停车设施。</p>
Copyright © 2001-2025 The PHP GroupAll rights reserved. |
Last updated: Thu Oct 30 04:00:02 2025 UTC |
Notice how mb_ereg_replace() never gave you the chance to tell it what encoding the string was in? That would be a job for mb_regex_encoding().
<?php
// I ran this from a file encoded in UTF-8
$string = "位于't Goor Park";
// 位于't Goor Park
echo $string, "\n";
// 位于't Goor Park
echo mb_ereg_replace("[\']", "[apostrophe]", $string), "\n";
// 位于[apostrophe]t Goor Park
mb_regex_encoding("UTF-8"); // previous value was EUC-JP
echo mb_ereg_replace("[\']", "[apostrophe]", $string), "\n";
?>

I was able to reproduce this:
$string = "位于't Goor Park";
// 位于't Goor Park
echo $string, "\n";
// expected: 位于[apostrophe]t Goor Park
echo "encoding: '" . mb_regex_encoding() . "'\n";
echo mb_ereg_replace("[\']", "[apostrophe]", $string), "\n";
// expected: 位于\'t Goor Park
mb_regex_encoding("UTF-8"); // previous value was EUC-JP
echo "encoding: '" . mb_regex_encoding() . "'\n";
echo mb_ereg_replace("[\']", "\\'", $string), "\n";
This produced the output:
位于't Goor Park
encoding: 'EUC-JP'
位于't Goor Park
encoding: 'UTF-8'
位于\'t Goor Park
So I went to my php.ini and changed the options in the "mbstring" section to:
mbstring.language = UTF-8
mbstring.internal_encoding = UTF-8
mbstring.http_output = UTF-8
Running the script again then gave me this new output:
位于't Goor Park
encoding: 'UTF-8'
位于[apostrophe]t Goor Park
encoding: 'UTF-8'
位于\'t Goor Park
Thank you for your help. I couldn't find help on this, and I had always assumed that "UTF-8" was my internal encoding, which it actually wasn't.
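
For anyone landing here with the same symptom, the fix discussed above can be sketched as a small wrapper that sets the regex encoding explicitly instead of relying on php.ini. This is only a sketch under assumptions: the function name escape_apostrophes_utf8() is made up for illustration and is not part of mbstring, and the input is assumed to be valid UTF-8. It uses only mb_regex_encoding() and mb_ereg_replace() as they appear in the comments above.

```php
<?php
// Hypothetical helper (the name is illustrative, not part of mbstring).
// Assumes $text is valid UTF-8.
function escape_apostrophes_utf8(string $text): string
{
    // Remember whatever regex encoding is currently configured (e.g. "EUC-JP").
    $previous = mb_regex_encoding();

    // Tell the mb_ereg_* functions to treat the subject as UTF-8.
    mb_regex_encoding('UTF-8');

    // Same pattern idea and replacement as in the report: ' becomes \'
    $result = mb_ereg_replace("[']", "\\'", $text);

    // Restore the previous setting so other code is not affected.
    mb_regex_encoding($previous);

    // mb_ereg_replace() can return a non-string on failure; fall back to the input.
    return is_string($result) ? $result : $text;
}

// Diagnostic: the two settings are independent and may differ.
echo "internal: ", mb_internal_encoding(), "\n";
echo "regex:    ", mb_regex_encoding(), "\n";

echo escape_apostrophes_utf8("位于't Goor Park"), "\n";
```

Printing both mb_internal_encoding() and mb_regex_encoding() first is a quick way to see whether the two settings disagree, which is what happened in this report. Restoring the previous regex encoding is a design choice to keep the helper free of side effects; setting it once at application start would work just as well.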