Bug #43840 mb_strpos bounds check is byte count rather than a character count
Submitted: 2008-01-14 16:36 UTC
Modified: 2008-02-27 14:26 UTC
From: jmessa@php.net
Assigned: hirokawa
Status: Closed
Package: mbstring related
PHP Version: 5.2CVS-2008-01-14 (snap)
OS: Windows XP
Private report: No
CVE-ID: None

 [2008-01-14 16:36 UTC] jmessa@php.net
Description:
------------
The bounds check for the offset argument in mb_strpos appears to use a byte count rather than a character count.
In the example below, $string_ascii is 21 characters long and $string_mb is 21 characters (53 bytes) long. In both cases the needle appears twice, first at position 9 and then at position 20.
With the multibyte string, an offset past the character count of the string would be expected to raise a warning, but instead the warning is raised only when the offset is past the byte count.
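To make the byte/character distinction concrete, here is a minimal sketch that uses the same test data as the reproduce code. strlen() counts bytes, while mb_strlen() with an explicit encoding counts characters. It assumes the mbstring extension is loaded and mbstring.func_overload is left at 0; the variable names $byte_count, $char_count, $first and $second are illustrative only.

```php
<?php
// Same haystack and needle as in the reproduce code of this report.
$string_mb = base64_decode('5pel5pys6Kqe44OG44Kt44K544OI44Gn44GZ44CCMDEyMzTvvJXvvJbvvJfvvJjvvJnjgII=');
$needle = base64_decode('44CC');

// strlen() counts bytes; mb_strlen() counts characters in the given encoding.
$byte_count = strlen($string_mb);
$char_count = mb_strlen($string_mb, 'UTF-8');

// Character positions of the two needle occurrences.
$first = mb_strpos($string_mb, $needle, 0, 'UTF-8');
$second = mb_strpos($string_mb, $needle, $first + 1, 'UTF-8');

echo "bytes=$byte_count chars=$char_count first=$first second=$second\n";
// prints "bytes=53 chars=21 first=9 second=20"
```

An offset of 22 through 53 is therefore past the last character but still within the byte length, which is the range where the missing warning shows up.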

Reproduce code:
---------------
<?php
$offsets = array(20, 21, 22, 53, 54);
$string_mb = base64_decode('5pel5pys6Kqe44OG44Kt44K544OI44Gn44GZ44CCMDEyMzTvvJXvvJbvvJfvvJjvvJnjgII=');
$needle = base64_decode('44CC');

foreach($offsets as $i) {
	echo "\n-- Offset is $i --\n";
	echo "--Multibyte String:--\n";
	var_dump( mb_strpos($string_mb, $needle, $i, 'UTF-8') );
	echo "--ASCII String:--\n";
	var_dump(mb_strpos('This is na English ta', 'a', $i));
}
?>
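Until a fixed build is available, the offset can be validated against the character count in userland. The sketch below is one possible workaround, not the change made in PHP itself: mb_strpos_checked() is a hypothetical helper name, and its warning text simply mirrors the message shown in this report. It treats an offset equal to the character count as valid (returning false without a warning), matching the expected result below, and it rejects negative offsets as an assumption.

```php
<?php
// Hypothetical helper, not part of mbstring: validates $offset against the
// character count before delegating to mb_strpos().
function mb_strpos_checked($haystack, $needle, $offset, $encoding)
{
    $length = mb_strlen($haystack, $encoding); // characters, not bytes
    if ($offset < 0 || $offset > $length) {
        trigger_error('Offset not contained in string.', E_USER_WARNING);
        return false;
    }
    return mb_strpos($haystack, $needle, $offset, $encoding);
}

// Same test data as the reproduce code above.
$string_mb = base64_decode('5pel5pys6Kqe44OG44Kt44K544OI44Gn44GZ44CCMDEyMzTvvJXvvJbvvJfvvJjvvJnjgII=');
$needle = base64_decode('44CC');

var_dump(mb_strpos_checked($string_mb, $needle, 20, 'UTF-8')); // int(20)
var_dump(mb_strpos_checked($string_mb, $needle, 21, 'UTF-8')); // bool(false)
```

Calling the helper with an offset of 22 or more raises the user-level warning and returns false, whether the offset falls below or above the 53-byte length.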

Expected result:
----------------
-- Offset is 20 --
--Multibyte String:--
int(20)
--ASCII String:--
int(20)

-- Offset is 21 --
--Multibyte String:--
bool(false)
--ASCII String:--
bool(false)

-- Offset is 22 --
--Multibyte String:--

Warning: mb_strpos(): Offset not contained in string. in ...\mb_strpos.php on line 9
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in ...\mb_strpos.php on line 11
bool(false)

-- Offset is 53 --
--Multibyte String:--

Warning: mb_strpos(): Offset not contained in string. in ...\mb_strpos.php on line 9
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in ...\mb_strpos.php on line 11
bool(false)

-- Offset is 54 --
--Multibyte String:--

Warning: mb_strpos(): Offset not contained in string. in ...\mb_strpos.php on line 9
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in ...\mb_strpos.php on line 11
bool(false)


Actual result:
--------------
-- Offset is 20 --
--Multibyte String:--
int(20)
--ASCII String:--
int(20)

-- Offset is 21 --
--Multibyte String:--
bool(false)
--ASCII String:--
bool(false)

-- Offset is 22 --
--Multibyte String:--
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in ...\mb_strpos.php on line 11
bool(false)

-- Offset is 53 --
--Multibyte String:--
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in ...\mb_strpos.php on line 11
bool(false)

-- Offset is 54 --
--Multibyte String:--

Warning: mb_strpos(): Offset not contained in string. in ...\mb_strpos.php on line 9
bool(false)
--ASCII String:--

Warning: mb_strpos(): Offset not contained in string. in ...\mb_strpos.php on line 11
bool(false)


 [2008-01-30 15:57 UTC] nicholsr@php.net
assigning to maintainer
 [2008-02-10 00:30 UTC] hirokawa@php.net
Could you show me the mbstring-related settings (mbstring.*)
in your php.ini?

 [2008-02-12 09:50 UTC] jmessa@php.net
Here is the entire mbstring section of my php.ini file. I haven't changed it from the default that ships with the PHP download.
[mbstring]
; language for internal character representation.
;mbstring.language = Japanese

; internal/script encoding.
; Some encoding cannot work as internal encoding.
; (e.g. SJIS, BIG5, ISO-2022-*)
;mbstring.internal_encoding = EUC-JP

; http input encoding.
;mbstring.http_input = auto

; http output encoding. mb_output_handler must be
; registered as output buffer to function
;mbstring.http_output = SJIS

; enable automatic encoding translation according to
; mbstring.internal_encoding setting. Input chars are
; converted to internal encoding by setting this to On.
; Note: Do _not_ use automatic encoding translation for
;       portable libs/applications.
;mbstring.encoding_translation = Off

; automatic encoding detection order.
; auto means
;mbstring.detect_order = auto

; substitute_character used when character cannot be converted
; one from another
;mbstring.substitute_character = none;

; overload(replace) single byte functions by mbstring functions.
; mail(), ereg(), etc are overloaded by mb_send_mail(), mb_ereg(),
; etc. Possible values are 0,1,2,4 or combination of them.
; For example, 7 for overload everything.
; 0: No overload
; 1: Overload mail() function
; 2: Overload str*() functions
; 4: Overload ereg*() functions
;mbstring.func_overload = 0

Thanks
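Since every directive in the section above is commented out, the values actually in effect are the built-in defaults. As a rough sketch, the effective mbstring.* settings can also be read at runtime with ini_get_all(); this assumes the mbstring extension is loaded, and the set of directives returned varies between PHP versions.

```php
<?php
// Read every mbstring.* ini setting as name => current value.
// Passing false as the second argument skips the per-setting detail arrays.
$settings = ini_get_all('mbstring', false);

foreach ($settings as $name => $value) {
    // var_export() makes empty strings and NULL values visible in the output.
    echo $name, ' = ', var_export($value, true), "\n";
}
```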
 [2008-02-18 14:15 UTC] jmessa@php.net
I've run the above example on the latest 5.2 and 5.3 snapshots and it's behaving as I expected now. Thanks for making the change!
 [2008-02-26 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 