Bug #38860 preg_match returns wrong positions of matched substrings when UTF-8 is used
Submitted: 2006-09-17 14:29 UTC
Modified: 2006-09-18 15:39 UTC
From: me at andr dot biz
Assigned:
Status: Not a bug
Package: PCRE related
PHP Version: 4.4.4
OS: freebsd 6.1
Private report: No
CVE-ID: None
 [2006-09-17 14:29 UTC] me at andr dot biz
Description:
------------
/* Guys! Your bug-tracking software has problems with Cyrillic symbols. I encoded all the words in my example as HTML entities. Please decode them to UTF-8 before reproducing. */

preg_match processes UTF-8 strings incorrectly: with PREG_OFFSET_CAPTURE it returns the wrong position for the matched substring.

First, please see the article at http://www.phpwact.org/php/i18n/utf-8 which describes many PHP bugs related to UTF-8. I think that ALL of these bugs must be fixed ASAP.

Reproduce code:
---------------
<pre>
<?php

if (!setlocale(LC_ALL,"ru_RU.UTF-8")) die('error');

$textUtf8 = "&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; &#1072;&#1085;&#1076;&#1088;&#1077;&#1081;";
preg_match('/&#1072;&#1085;&#1076;&#1088;&#1077;&#1081;/u', $textUtf8, $matches, PREG_OFFSET_CAPTURE);

print_r($matches);

?>
</pre>

Expected result:
----------------
Array
(
    [0] => Array
        (
            [0] => &#1072;&#1085;&#1076;&#1088;&#1077;&#1081;
            [1] => 7
        )

)

Actual result:
--------------
Array
(
    [0] => Array
        (
            [0] => &#1072;&#1085;&#1076;&#1088;&#1077;&#1081;
            [1] => 13
        )

)
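
Note on the two numbers: preg_match with PREG_OFFSET_CAPTURE reports offsets in bytes, not characters, even when the /u modifier is used. In UTF-8 each Cyrillic letter in the example takes 2 bytes, so the six letters plus the space before the match occupy 6 * 2 + 1 = 13 bytes, while the same prefix is 7 characters long. The sketch below shows one way to turn the byte offset into a character offset. It assumes the mbstring extension is loaded, that the source file is saved as UTF-8, and that the entities above decode to the Cyrillic words used here; byteOffsetToCharOffset is a hypothetical helper name, not a PHP function.

```php
<?php
// Hypothetical helper (not part of PHP): convert a byte offset reported by
// PREG_OFFSET_CAPTURE into a character offset. Assumes $subject is valid UTF-8
// and that the mbstring extension is available.
function byteOffsetToCharOffset(string $subject, int $byteOffset): int
{
    // Cut out the bytes that precede the match, then count their characters.
    return mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8');
}

// The report's strings with the HTML entities decoded to UTF-8.
$textUtf8 = "привет андрей";
preg_match('/андрей/u', $textUtf8, $matches, PREG_OFFSET_CAPTURE);

$byteOffset = $matches[0][1];                                 // offset in bytes
$charOffset = byteOffsetToCharOffset($textUtf8, $byteOffset); // offset in characters

echo "byte offset: $byteOffset\n";
echo "char offset: $charOffset\n";
```

The byte offset is the value to pass to substr() or to the $offset argument of preg_match; the character offset is the one to use with mb_substr() and the other mb_* functions.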

 [2006-09-18 15:13 UTC] me at andr dot biz
I also tried the Windows snapshot:
http://snaps.php.net/win32/php4-win32-STABLE-latest.zip

It returns the same (wrong) result.
 [2006-09-18 15:22 UTC] tony2001@php.net
Please upload the reproduce code somewhere on the net and paste the URL here.
 [2006-09-18 15:29 UTC] me at andr dot biz
Please download 

http://flybb.ru/pcre_bug.phps
 [2006-09-18 15:39 UTC] tony2001@php.net
Not a PHP problem.
Please report it to PCRE lib developers.