Bug #38860 preg_match returns wrong positions of matched substrings when UTF-8 is used
Submitted: 2006-09-17 14:29 UTC Modified: 2006-09-18 15:39 UTC
From: me at andr dot biz Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 4.4.4 OS: freebsd 6.1
Private report: No CVE-ID: None

 [2006-09-17 14:29 UTC] me at andr dot biz
Description:
------------
/* Guys! Your bug-tracking software has problems with Cyrillic symbols. I encoded all the words in my example as HTML entities. Please decode them to UTF-8 before reproducing. */

preg_match() with PREG_OFFSET_CAPTURE reports incorrect offsets for UTF-8 strings.

First, please see this article: http://www.phpwact.org/php/i18n/utf-8 It describes many PHP bugs related to UTF-8. I think that ALL of these bugs must be fixed ASAP.

Reproduce code:
---------------
<pre>
<?php

if (!setlocale(LC_ALL,"ru_RU.UTF-8")) die('error');

$textUtf8 = "&#1087;&#1088;&#1080;&#1074;&#1077;&#1090; &#1072;&#1085;&#1076;&#1088;&#1077;&#1081;";
preg_match('/&#1072;&#1085;&#1076;&#1088;&#1077;&#1081;/u', $textUtf8, $matches, PREG_OFFSET_CAPTURE);

print_r($matches);

?>
</pre>

Expected result:
----------------
Array
(
    [0] => Array
        (
            [0] => &#1072;&#1085;&#1076;&#1088;&#1077;&#1081;
            [1] => 7
        )

)

Actual result:
--------------
Array
(
    [0] => Array
        (
            [0] => &#1072;&#1085;&#1076;&#1088;&#1077;&#1081;
            [1] => 13
        )

)

 [2006-09-18 15:13 UTC] me at andr dot biz
I tried the latest Windows snapshot:
http://snaps.php.net/win32/php4-win32-STABLE-latest.zip

It returns the same (wrong) result.
 [2006-09-18 15:22 UTC] tony2001@php.net
Please upload the reproduce code somewhere in the net and paste the URL here.
 [2006-09-18 15:29 UTC] me at andr dot biz
Please download 

http://flybb.ru/pcre_bug.phps
 [2006-09-18 15:39 UTC] tony2001@php.net
Not a PHP problem.
Please report it to the PCRE library developers.
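
Note for anyone reproducing this: the offset returned with PREG_OFFSET_CAPTURE is counted in bytes, not characters, even when the /u modifier is used. In the example string the six Cyrillic letters before the space take two bytes each in UTF-8, so the match starts at byte 13, which is character 7. The sketch below shows one way to convert between the two. It assumes a modern PHP (7.0 or later, because of the type declarations) and a script file saved as UTF-8; byteOffsetToCharOffset() is a hypothetical helper written for this illustration, not a PHP built-in.

```php
<?php
// Hypothetical helper (not part of PHP): converts a byte offset, as reported
// by PREG_OFFSET_CAPTURE, into a character offset within a UTF-8 string.
function byteOffsetToCharOffset(string $subject, int $byteOffset): int
{
    // Everything before the match, measured in bytes.
    $prefix = substr($subject, 0, $byteOffset);

    // Prefer mbstring when the extension is loaded.
    if (function_exists('mb_strlen')) {
        return mb_strlen($prefix, 'UTF-8');
    }

    // Fallback without mbstring: count UTF-8 characters with PCRE itself.
    return (int) preg_match_all('/./su', $prefix);
}

// Same text as in the report, with the HTML entities decoded to UTF-8.
$textUtf8 = "привет андрей";
preg_match('/андрей/u', $textUtf8, $matches, PREG_OFFSET_CAPTURE);

$byteOffset = $matches[0][1];
$charOffset = byteOffsetToCharOffset($textUtf8, $byteOffset);

echo "byte offset: $byteOffset, character offset: $charOffset\n";
```

The byte offset is what substr() and the $offset argument of preg_match() expect, so it is usually best to keep it as is and convert only when a character position is needed, for example for display.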
 