Doc Bug #41588 preg_match offset is in bytes even in unicode mode
Submitted: 2007-06-04 13:04 UTC
Modified: 2007-08-17 03:00 UTC
From: spam02 at pornel dot net
Assigned:
Status: Closed
Package: Documentation problem
PHP Version: 6.0.0-dev (20070509)
OS: *
Private report: No
CVE-ID: None
 [2007-06-04 13:04 UTC] spam02 at pornel dot net
Description:
------------
preg_match() with the 'u' modifier is supposed to treat the pattern and subject as UTF-8, but the modifier doesn't affect the offset parameter, which is always counted in bytes.

This gotcha at least deserves to be documented, although consistent Unicode support would be even better.


Reproduce code:
---------------
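<?php
// urldecode('%C2%AE') is U+00AE (®), which takes 2 bytes in UTF-8, so the subject is "®NY".
// The flags argument is 0 (no flags); the offset argument is 2.
preg_match('/./u', urldecode('%C2%AE') . 'NY', $m, 0, 2);
echo $m[0];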


Expected result:
----------------
Y

Actual result:
--------------
N
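
The difference between the two results comes down to how the subject is laid out in memory. As a minimal sketch (the variable name `$subject` is introduced here for illustration and is not part of the original report), the following shows that the string is four bytes long and that byte index 2 is the "N", which is where the search starts:

```php
<?php
// Same subject as in the reproduce code: U+00AE (2 bytes in UTF-8) followed by "NY".
$subject = urldecode('%C2%AE') . 'NY';

echo strlen($subject), "\n";   // byte length: 4
echo bin2hex($subject), "\n";  // c2ae4e59
echo $subject[2], "\n";        // byte at index 2: N
```

Counted in characters, index 2 would be "Y"; counted in bytes, it is "N".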

 [2007-06-04 13:08 UTC] spam02 at pornel dot net
(fixed php version)
 [2007-06-04 13:18 UTC] tony2001@php.net
>preg_match() with 'u' modifier is supposed to use UTF-8, but this
>switch doesn't affect offset parameter, which is always in bytes.

Right, PHP is not supposed to parse the regexp to detect which modifiers were used.
The byte/codepoint behaviour changes only in Unicode mode.
 [2007-08-17 03:00 UTC] vrana@php.net
This bug has been fixed in the documentation's XML sources. Since the
online and downloadable versions of the documentation need some time
to get updated, we would like to ask you to be a bit patient.

Thank you for the report, and for helping us make our documentation better.

The description of $offset now reads: "(in bytes)"
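
Since the offset is in bytes, a caller who has a character position needs to convert it first. One possible way to do that, sketched here under assumptions, is to match the leading characters with a 'u' pattern and take the byte length of that match. The helper name `char_offset_to_byte_offset` is hypothetical and not part of PHP, and the sketch assumes the subject is valid UTF-8 and the character offset is small enough for a PCRE repeat count.

```php
<?php
// Hypothetical helper (not a PHP built-in): converts a character offset
// in a UTF-8 string to the byte offset that preg_match() expects.
function char_offset_to_byte_offset(string $subject, int $charOffset): int
{
    if ($charOffset < 0) {
        throw new OutOfRangeException('Character offset must not be negative.');
    }
    // Match exactly $charOffset code points from the start of the string;
    // the 's' modifier lets '.' match newlines as well.
    if (preg_match('/^.{' . $charOffset . '}/us', $subject, $prefix) !== 1) {
        throw new OutOfRangeException('Offset is past the end of the string or the string is not valid UTF-8.');
    }
    // strlen() counts bytes, so this is the byte length of the prefix.
    return strlen($prefix[0]);
}

$subject = urldecode('%C2%AE') . 'NY';
$byteOffset = char_offset_to_byte_offset($subject, 2);

preg_match('/./u', $subject, $m, 0, $byteOffset);
echo $m[0];
```

With the subject from the report, a character offset of 2 becomes a byte offset of 3, and the match is the "Y" the reporter expected.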
 