php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #43840 mb_strpos bounds check is byte count rather than a character count
Submitted: 2008-01-14 16:36 UTC Modified: 2008-02-27 14:26 UTC
From: jmessa@php.net Assigned: hirokawa (profile)
Status: Closed Package: mbstring related
PHP Version: 5.2CVS-2008-01-14 (snap) OS: Windows XP
Private report: No CVE-ID: None
Welcome back! If you're the original bug submitter, here's where you can edit the bug or add additional notes.
If this is not your bug, you can add a comment by following this link.
If this is your bug, but you forgot your password, you can retrieve your password here.
Password:
Status:
Package:
Bug Type:
Summary:
From: jmessa@php.net
New email:
PHP Version: OS:

 

 [2008-01-14 16:36 UTC] jmessa@php.net
Description:
------------
The bounds check for the offest argument in mb_strpos appears to be a byte count rather than a character count.
In the example below, $string_ascii is 21 characters long and $string_mb is 21 characters (53 bytes) long. In both cases the needle appears twice, first at position 9 and secondly at position 20. 
With the multibyte string example, when the offset is past the character count of the string it would be expected to return a warning but instead a warning is returned when offest is past the byte count.

Reproduce code:
---------------
<?php
$offsets = array(20, 21, 22, 53, 54);
$string_mb = base64_decode('5pel5pys6Kqe44OG44Kt44K544OI44Gn44GZ44CCMDEyMzTvvJXvvJbvvJfvvJjvvJnjgII=');
$needle = base64_decode('44CC');

foreach($offsets as $i) {
	echo "\n-- Offset is $i --\n";
	echo "--Multibyte String:--\n";
	var_dump( mb_strpos($string_mb, $needle, $i, 'UTF-8') );
	echo"--ASCII String:--\n";
	var_dump(mb_strpos('This is na English ta', 'a', $i));
}
?>

Expected result:
----------------
-- Offset is 20 --
--Multibyte String:--
int(20)
--ASCII String:--
int(20)

-- Offset is 21 --
--Multibyte String:--
bool(false)
--ASCII String:--
bool(false)

-- Offset is 22 --
--Multibyte String:--

Warning: mb_strpos(): Offset not contained in string. in ...\mb_strpos.php on line 9
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in ...\mb_strpos.php on line 11
bool(false)

-- Offset is 53 --
--Multibyte String:--

Warning: mb_strpos(): Offset not contained in string. in ...\mb_strpos.php on line 9
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in ...\mb_strpos.php on line 11
bool(false)

-- Offset is 54 --
--Multibyte String:--

Warning: mb_strpos(): Offset not contained in string. in ...\mb_strpos.php on line 9
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in ...\mb_strpos.php on line 11
bool(false)


Actual result:
--------------
-- Offset is 20 --
--Multibyte String:--
int(20)
--ASCII String:--
int(20)

-- Offset is 21 --
--Multibyte String:--
bool(false)
--ASCII String:--
bool(false)

-- Offset is 22 --
--Multibyte String:--
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in ...\mb_strpos.php on line 11
bool(false)

-- Offset is 53 --
--Multibyte String:--
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in ...\mb_strpos.php on line 11
bool(false)

-- Offset is 54 --
--Multibyte String:--

Warning: mb_strpos(): Offset not contained in string. in ...\mb_strpos.php on line 9
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in ...\mb_strpos.php on line 11
bool(false)


Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2008-01-30 15:57 UTC] nicholsr@php.net
assigning to maintainer
 [2008-02-10 00:30 UTC] hirokawa@php.net
Could you show me the mbstring related setting (mbstring.*)
in your php.ini ?

 [2008-02-12 09:50 UTC] jmessa@php.net
Here is the entire mbstring section of my php.ini file, I haven't changed it from the default that comes when you download PHP.
[mbstring]
; language for internal character representation.
;mbstring.language = Japanese

; internal/script encoding.
; Some encoding cannot work as internal encoding.
; (e.g. SJIS, BIG5, ISO-2022-*)
;mbstring.internal_encoding = EUC-JP

; http input encoding.
;mbstring.http_input = auto

; http output encoding. mb_output_handler must be
; registered as output buffer to function
;mbstring.http_output = SJIS

; enable automatic encoding translation according to
; mbstring.internal_encoding setting. Input chars are
; converted to internal encoding by setting this to On.
; Note: Do _not_ use automatic encoding translation for
;       portable libs/applications.
;mbstring.encoding_translation = Off

; automatic encoding detection order.
; auto means
;mbstring.detect_order = auto

; substitute_character used when character cannot be converted
; one from another
;mbstring.substitute_character = none;

; overload(replace) single byte functions by mbstring functions.
; mail(), ereg(), etc are overloaded by mb_send_mail(), mb_ereg(),
; etc. Possible values are 0,1,2,4 or combination of them.
; For example, 7 for overload everything.
; 0: No overload
; 1: Overload mail() function
; 2: Overload str*() functions
; 4: Overload ereg*() functions
;mbstring.func_overload = 0

Thanks
 [2008-02-18 14:15 UTC] jmessa@php.net
I've run the above example on the latest 5.2 and 5.3 snapshots and it's behaving as I expected now. Thanks for making the change!
 [2008-02-26 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Fri Apr 19 06:01:29 2024 UTC