Bug #48464 No regexp match in utf8 mode
Submitted: 2009-06-03 17:50 UTC Modified: 2009-06-03 22:45 UTC
From: daniel at poradnik-webmastera dot com Assigned: nlopess (profile)
Status: Not a bug Package: PCRE related
PHP Version: 5.2.9 OS: windows xp
Private report: No CVE-ID: None
 [2009-06-03 17:50 UTC] daniel at poradnik-webmastera dot com
Description:
------------
preg_match() doesn't match the string when UTF-8 mode is enabled and the 0xAB char ("«") is present in the input. Everything works correctly when UTF-8 mode is disabled.

Reproduce code:
---------------
<?php

$str = "test \xab test";

if (preg_match('/test/u', $str))
	echo 'Match';
else
	echo 'No match';

?>

Expected result:
----------------
'Match' printed

Actual result:
--------------
'No match' printed

 [2009-06-03 20:48 UTC] nlopess@php.net
$str is not a valid UTF-8 string, and thus the PCRE engine rejects it.
No bug here.
 [2009-06-03 20:49 UTC] jani@php.net
Works in HEAD.
 [2009-06-03 20:49 UTC] scottmac@php.net
Sorry, but your problem does not imply a bug in PHP itself.  For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions.  Due to the volume
of reports we can not explain in detail here why your report is not
a bug.  The support channels will be able to provide an explanation
for you.

Thank you for your interest in PHP.

Check preg_last_error(); it will return PREG_BAD_UTF8_ERROR.

\xab on its own isn't valid UTF-8; however, \xc2\xab is. The character "«" (U+00AB) should be encoded as 2 bytes.
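
A minimal sketch of the check described above, assuming only the standard PCRE functions preg_match() and preg_last_error(); the variable names are illustrative. It runs the same pattern against the lone byte and against the two-byte sequence and records the error code after each call:

```php
<?php

// The lone byte 0xAB is not a valid UTF-8 sequence.
$invalid = "test \xab test";
// U+00AB encoded as UTF-8 takes two bytes: 0xC2 0xAB.
$valid = "test \xc2\xab test";

// With the /u modifier the subject is validated as UTF-8 before matching.
// On invalid input preg_match() returns false rather than 0.
$invalidResult = preg_match('/test/u', $invalid);
$invalidError  = preg_last_error();

$validResult = preg_match('/test/u', $valid);
$validError  = preg_last_error();

var_dump($invalidResult);                         // false, not int(0)
var_dump($invalidError === PREG_BAD_UTF8_ERROR);  // error set by the first call
var_dump($validResult);                           // match result for the valid string
var_dump($validError === PREG_NO_ERROR);          // error state after the second call
```

Note that the reproduce script uses a plain truthiness test, so a false return (an error) and a 0 return (no match) both print 'No match'; comparing against false with === tells the two cases apart.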
 [2009-06-03 21:00 UTC] jani@php.net
Nuno, how come the script says "match" with HEAD?
 [2009-06-03 21:14 UTC] nlopess@php.net
I don't have PHP 6 compiled at hand, but I assume that PHP 6 is replacing the bad char before sending the string to pcre.
Can you check if it's the case by printing $str?
 [2009-06-03 22:45 UTC] scottmac@php.net
In PHP 6, \xab is a code point; in anything below, it's a raw byte.
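
To illustrate the byte versus code point distinction, here is a rough sketch that assumes the input was meant as ISO-8859-1, where byte 0xAB stands for U+00AB. The report does not state the source encoding, so that is an assumption, and latin1ToUtf8() is a hypothetical helper written for this example, not a PHP built-in:

```php
<?php

// Hypothetical helper: treats every input byte as an ISO-8859-1 code point
// (U+0000..U+00FF) and emits its UTF-8 encoding.
function latin1ToUtf8($bytes)
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $cp = ord($bytes[$i]);
        if ($cp < 0x80) {
            // ASCII bytes are identical in UTF-8.
            $out .= $bytes[$i];
        } else {
            // U+0080..U+00FF become two bytes: 110xxxxx 10xxxxxx.
            $out .= chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
        }
    }
    return $out;
}

$str = latin1ToUtf8("test \xab test");

// Show the resulting bytes in hex, then retry the match from the report.
echo bin2hex($str), "\n";
echo preg_match('/test/u', $str) ? 'Match' : 'No match', "\n";
```

After the conversion, byte 0xAB has become the sequence 0xC2 0xAB, so the subject passes PCRE's UTF-8 validation and the script prints 'Match'.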
 
Last updated: Tue Apr 16 13:01:30 2024 UTC