Doc Bug #41588 preg_match offset is in bytes even in unicode mode
Submitted: 2007-06-04 13:04 UTC
Modified: 2007-08-17 03:00 UTC
From: spam02 at pornel dot net
Assigned:
Status: Closed
Package: Documentation problem
PHP Version: 6.0.0-dev (20070509)
OS: *
Private report: No
CVE-ID: None

 [2007-06-04 13:04 UTC] spam02 at pornel dot net
Description:
------------
preg_match() with the 'u' modifier is supposed to treat the pattern and subject as UTF-8, but the modifier does not affect the offset parameter, which is always counted in bytes.

This gotcha at least deserves to be documented, although consistent Unicode support would be even better.


Reproduce code:
---------------
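<?php
// urldecode('%C2%AE') is U+00AE (registered sign): one character, two bytes in UTF-8.
// Arguments: pattern, subject, matches, flags (0 instead of NULL), offset.
preg_match('/./u', urldecode('%C2%AE') . 'NY', $m, 0, 2);
echo $m[0];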


Expected result:
----------------
Y

Actual result:
--------------
N
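
The result above follows from the byte layout of the subject. A minimal sketch, using only core string and PCRE functions, that inspects the same string; the variable names are illustrative and not part of the original report:

```php
<?php
// Same bytes as urldecode('%C2%AE') . 'NY' in the reproduce code.
$subject = "\xC2\xAE" . 'NY';

// strlen() counts bytes: 2 for the registered sign, 1 each for N and Y.
$byteLength = strlen($subject);

// Counting matches of /./us counts UTF-8 characters (code points).
$charLength = preg_match_all('/./us', $subject);

// Byte offset 2 is the first byte after the two-byte character, which is "N".
$byteAtOffset2 = $subject[2];

printf("%s: %d bytes, %d characters, byte 2 is %s\n",
    bin2hex($subject), $byteLength, $charLength, $byteAtOffset2);
```

Because the offset is a byte count, an offset of 2 skips only the first character, so the match starts at "N" rather than "Y".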

 [2007-06-04 13:08 UTC] spam02 at pornel dot net
(fixed php version)
 [2007-06-04 13:18 UTC] tony2001@php.net
>preg_match() with 'u' modifier is supposed to use UTF-8, but this
>switch doesn't affect offset parameter, which is always in bytes.

Right, PHP is not supposed to parse the regexp to detect which modifiers were used.
The byte/codepoint behaviour changes only in Unicode mode.
 [2007-08-17 03:00 UTC] vrana@php.net
This bug has been fixed in the documentation's XML sources. Since the
online and downloadable versions of the documentation need some time
to get updated, we would like to ask you to be a bit patient.

Thank you for the report, and for helping us make our documentation better.

$offset: "(in bytes)"
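
Since the offset is documented as a byte count, a caller who thinks in characters has to convert first. The following is a rough sketch of one way to do that; `utf8_char_to_byte_offset` is a hypothetical helper, not a PHP built-in, and it assumes the subject is valid UTF-8:

```php
<?php
/**
 * Hypothetical helper: convert a character (code point) offset into a byte
 * offset for a string that is assumed to be valid UTF-8.
 */
function utf8_char_to_byte_offset(string $subject, int $chars): int
{
    $length = strlen($subject);
    $byte = 0;
    while ($chars > 0 && $byte < $length) {
        // Step over the lead byte of the current character.
        $byte++;
        // Step over its continuation bytes, which have the form 10xxxxxx.
        while ($byte < $length && (ord($subject[$byte]) & 0xC0) === 0x80) {
            $byte++;
        }
        $chars--;
    }
    // Offsets past the end of the string clamp to its byte length.
    return $byte;
}

$subject = "\xC2\xAE" . 'NY';

// Skip two characters: the registered sign (2 bytes) and "N" (1 byte).
$byteOffset = utf8_char_to_byte_offset($subject, 2);

preg_match('/./u', $subject, $m, 0, $byteOffset);
echo $m[0]; // prints "Y"
```

Where the mbstring extension is available, `strlen(mb_substr($subject, 0, $chars, 'UTF-8'))` yields the same byte offset for valid UTF-8 input.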
 