Bug #43567 PCRE: compilation failure when using UTF-8
Submitted: 2007-12-11 17:02 UTC
Modified: 2008-01-13 15:11 UTC
Votes: 2
Avg. Score: 3.0 ± 2.0
Reproduced: 1 of 1 (100.0%)
Same Version: 1 (100.0%)
Same OS: 0 (0.0%)
From: brito_victor at yahoo dot fr
Assigned:
Status: Not a bug
Package: PCRE related
PHP Version: 5.2.5
OS: Windows XP SP2
Private report: No
CVE-ID: None
 [2007-12-11 17:02 UTC] brito_victor at yahoo dot fr
Description:
------------
In a test script, I call preg_match() with the u modifier to test a UTF-8 regular expression written with hexadecimal escape sequences. The mbstring extension is loaded and active.

Reproduce code:
---------------
$mb = function_exists('mb_detect_encoding');
$pregutf8 = preg_match("/\xf8\xa1\xa1\xa1\xa1/u", "\xf8\xa1\xa1\xa1\xa1");

Expected result:
----------------
Returns true for both variables.

Actual result:
--------------
Returns true for $mb.
Returns the following warning message for $pregutf8: "Warning:  preg_match() [function.preg-match]: Compilation failed: invalid UTF-8 string at offset 0"

 [2007-12-11 17:08 UTC] brito_victor at yahoo dot fr
Bug seen on Windows XP SP2.
 [2007-12-13 12:48 UTC] confins_de_l_univers at yahoo dot fr
I get the same error on Linux (Debian Etch).

But your pattern (and your subject) is not a valid UTF-8 string.
I checked this with mb_check_encoding("\xf8\xa1\xa1\xa1\xa1", 'UTF-8'), which returns false.

So, I think it's not a bug.
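
The check described above can be sketched as a guard around the match. This is only an illustration, not code from the report: the helper name utf8_safe_match() is hypothetical, the nullable return type assumes PHP 7.1 or later, and mbstring is assumed to be loaded. The byte 0xF8 would start a 5-byte sequence, which RFC 3629 does not permit, so the string fails validation before PCRE ever sees it.

```php
<?php
// Bytes from the report. 0xF8 is the lead byte of a 5-byte sequence,
// which RFC 3629 excludes from UTF-8.
$bytes = "\xf8\xa1\xa1\xa1\xa1";

// Hypothetical helper (not part of PHP): run a /u match only when both
// the pattern and the subject are valid UTF-8. Returns null otherwise.
function utf8_safe_match(string $pattern, string $subject): ?int
{
    if (!mb_check_encoding($pattern, 'UTF-8') || !mb_check_encoding($subject, 'UTF-8')) {
        return null; // invalid input, preg_match() is never called
    }
    $result = preg_match($pattern, $subject);
    return $result === false ? null : $result;
}

$invalid = utf8_safe_match("/\xf8\xa1\xa1\xa1\xa1/u", $bytes);
$valid   = utf8_safe_match("/\xc3\xa9/u", "caf\xc3\xa9"); // "é" in "café"

var_dump(mb_check_encoding($bytes, 'UTF-8')); // bool(false)
var_dump($invalid);                           // NULL
var_dump($valid);                             // int(1)
```

Returning null keeps "input was not UTF-8" distinct from "no match" (0), so the caller can decide how to handle malformed data.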
 [2007-12-13 12:54 UTC] brito_victor at yahoo dot fr
I think it is a bug because the same script returns true when tested with version 5.2.4.
 [2008-01-13 15:11 UTC] nlopess@php.net
In PHP 5.2.5 we upgraded PCRE to version 7.3. This version has stricter UTF-8 string validation, and that's why the string is now rejected.
For the record this bug has nothing to do with the mbstring extension.
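
For readers who need to work with such bytes, here is a minimal sketch that uses only the PCRE functions, with no mbstring involved. It assumes a current PHP version, where preg_match() returns false for a subject that is not valid UTF-8; the variable names are illustrative. Dropping the u modifier makes PCRE treat the pattern and subject as raw bytes, and an empty /u pattern can serve as a validity probe for the subject.

```php
<?php
$bytes = "\xf8\xa1\xa1\xa1\xa1";

// Without the u modifier, pattern and subject are compared byte by byte,
// so no UTF-8 validation takes place.
$byteMatch = preg_match("/\xf8\xa1\xa1\xa1\xa1/", $bytes);
var_dump($byteMatch); // int(1)

// The empty pattern itself is valid, so with /u only the subject is checked.
$probe = preg_match('//u', $bytes);
$badUtf8 = (preg_last_error() === PREG_BAD_UTF8_ERROR);

var_dump($probe);   // bool(false)
var_dump($badUtf8); // bool(true)
```

Checking preg_last_error() after a false result tells malformed UTF-8 apart from other PCRE failures.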