Doc Bug #41588 preg_match offset is in bytes even in unicode mode
Submitted: 2007-06-04 13:04 UTC Modified: 2007-08-17 03:00 UTC
From: spam02 at pornel dot net Assigned:
Status: Closed Package: Documentation problem
PHP Version: 6.0.0-dev (20070509) OS: *
Private report: No CVE-ID: None

 [2007-06-04 13:04 UTC] spam02 at pornel dot net
Description:
------------
preg_match() with the 'u' modifier is supposed to use UTF-8, but this switch doesn't affect the offset parameter, which is always in bytes.

This gotcha at least deserves to be documented, although consistent Unicode support would be even better.


Reproduce code:
---------------
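<?php
// %C2%AE is the two-byte UTF-8 encoding of U+00AE, so the subject is 4 bytes but 3 characters.
// Flags are passed as 0 rather than NULL, which newer PHP versions deprecate for int parameters.
preg_match('/./u', urldecode('%C2%AE') . 'NY', $m, 0, 2);
echo $m[0];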


Expected result:
----------------
Y

Actual result:
--------------
N
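
To make the byte counting visible, here is a small sketch that inspects the same subject string. It only uses strlen(), bin2hex() and the PREG_OFFSET_CAPTURE flag; the variable names are illustrative and not part of the original reproduce code.

```php
<?php
// Same subject as in the reproduce code: U+00AE (2 bytes in UTF-8) followed by "NY".
$subject = urldecode('%C2%AE') . 'NY';

echo strlen($subject), "\n";   // 4 (bytes, not characters)
echo bin2hex($subject), "\n";  // c2ae4e59
echo $subject[2], "\n";        // N (byte index 2 is the third character)

// Offsets reported by PREG_OFFSET_CAPTURE are byte offsets as well, even with /u.
preg_match('/N/u', $subject, $found, PREG_OFFSET_CAPTURE);
echo $found[0][1], "\n";       // 2
```

So an offset of 2 skips the two bytes of the first character and the match starts at "N".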

History

 [2007-06-04 13:08 UTC] spam02 at pornel dot net
(fixed php version)
 [2007-06-04 13:18 UTC] tony2001@php.net
>preg_match() with 'u' modifier is supposed to use UTF-8, but this
>switch doesn't affect offset parameter, which is always in bytes.

Right, PHP is not supposed to parse the regexp to detect which modifiers were used.
The byte/codepoint behaviour changes only in Unicode mode.
 [2007-08-17 03:00 UTC] vrana@php.net
This bug has been fixed in the documentation's XML sources. Since the
online and downloadable versions of the documentation need some time
to get updated, we would like to ask you to be a bit patient.

Thank you for the report, and for helping us make our documentation better.

The description of $offset now says "(in bytes)".
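
Since the offset is counted in bytes, a caller who has a character offset needs to convert it first. The following is only a sketch of one possible workaround, not something proposed in this report: char_offset_to_byte_offset() is a hypothetical helper name, and it assumes a PHP version with scalar type declarations and a subject that is valid UTF-8.

```php
<?php
// Hypothetical helper (not part of PHP): turn a character offset into the
// byte offset that preg_match() expects.
function char_offset_to_byte_offset(string $subject, int $charOffset): int
{
    if ($charOffset < 0) {
        throw new OutOfRangeException('Negative offsets are not handled in this sketch');
    }
    // Match exactly $charOffset code points from the start; /s lets "." match newlines.
    if (!preg_match('/^.{' . $charOffset . '}/us', $subject, $prefix)) {
        throw new OutOfRangeException('Offset is past the end, or subject is not valid UTF-8');
    }
    // strlen() counts bytes, so this is the byte length of the skipped prefix.
    return strlen($prefix[0]);
}

$subject = urldecode('%C2%AE') . 'NY';
$byteOffset = char_offset_to_byte_offset($subject, 2);

preg_match('/./u', $subject, $m, 0, $byteOffset);
echo $m[0]; // Y
```

With the character offset 2 converted to the byte offset 3, the match gives the "Y" that the report expected.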
 