Bug #43841 mb_strrpos offset is byte count for negative values
Submitted: 2008-01-14 16:38 UTC
Modified: 2009-02-15 07:14 UTC
From: jmessa@php.net
Assigned: hirokawa
Status: Closed
Package: mbstring related
PHP Version: 5.2CVS-2008-01-14 (snap)
OS: Windows XP
Private report: No
CVE-ID: None
 [2008-01-14 16:38 UTC] jmessa@php.net
Description:
------------
The offset argument of mb_strrpos() appears to be counted in bytes rather than characters when it is negative.
In the example below, $string_ascii is 21 characters long and $string_mb is 21 characters (53 bytes) long. In both cases the needle appears twice, first at position 9 and then at position 20.
With the multibyte string, an offset of -24, which is beyond the character length of the string, finds $needle at position 9, whereas it should first be found there at an offset of -12 (i.e. the function should behave the same as in the ASCII example).
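
To make the character and byte counts above easy to check, here is a minimal sketch that only uses strlen(), mb_strlen(), mb_strpos() and mb_strrpos() without a negative offset, so it does not depend on the behaviour under discussion. The variable names $chars and $bytes are just for illustration.

```php
<?php
// Same haystack and needle as in the reproduce code.
$string_mb = base64_decode('5pel5pys6Kqe44OG44Kt44K544OI44Gn44GZ44CCMDEyMzTvvJXvvJbvvJfvvJjvvJnjgII=');
$needle = base64_decode('44CC');

$chars = mb_strlen($string_mb, 'UTF-8'); // length in characters
$bytes = strlen($string_mb);             // length in bytes

var_dump($chars); // int(21)
var_dump($bytes); // int(53)

// Character positions of the two occurrences of the needle.
var_dump(mb_strpos($string_mb, $needle, 0, 'UTF-8'));  // int(9)
var_dump(mb_strrpos($string_mb, $needle, 0, 'UTF-8')); // int(20)

// An offset of -24 lies outside the string when counted in characters,
// but inside it when counted in bytes.
var_dump(24 > $chars); // bool(true)
var_dump(24 > $bytes); // bool(false)
```

An offset of -24 is therefore out of range only if offsets are counted in characters, which is what the expected result assumes.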

It's also worth noting that strrpos raises a notice when the offset is outside the bounds of the string, whereas mb_strrpos does not.

This may be linked to this bug: http://bugs.php.net/43840.

Reproduce code:
---------------
<?php
$offsets = array(-25, -24, -13, -12);
$string_mb = base64_decode('5pel5pys6Kqe44OG44Kt44K544OI44Gn44GZ44CCMDEyMzTvvJXvvJbvvJfvvJjvvJnjgII=');
$needle = base64_decode('44CC');

foreach ($offsets as $i) {
	echo "\n-- Offset is $i --\n";
	echo "Multibyte String:\t";
	var_dump( mb_strrpos($string_mb, $needle, $i, 'UTF-8') );
	echo "ASCII String:\n";
	echo "mb_strrpos:\t\t";
	var_dump(mb_strrpos('This is na English ta', 'a', $i));
	echo "strrpos:\t\t";
	var_dump(strrpos('This is na English ta', 'a', $i));
}
?>

Expected result:
----------------
-- Offset is -25 --
Multibyte String:	
Notice: mb_strrpos(): Offset is greater than the length of haystack string in ...\mb_strrpos.php on line 9
bool(false)
ASCII String:
mb_strrpos:
Notice: mb_strrpos(): Offset is greater than the length of haystack string in ...\mb_strrpos.php on line 14
bool(false)
strrpos:		
Notice: strrpos(): Offset is greater than the length of haystack string in ...\mb_strrpos.php on line 14
bool(false)

-- Offset is -24 --
Multibyte String:	
Notice: mb_strrpos(): Offset is greater than the length of haystack string in ...\mb_strrpos.php on line 9
bool(false)
ASCII String:
mb_strrpos:
Notice: mb_strrpos(): Offset is greater than the length of haystack string in ...\mb_strrpos.php on line 14
bool(false)
strrpos:		
Notice: strrpos(): Offset is greater than the length of haystack string in ...\mb_strrpos.php on line 14
bool(false)

-- Offset is -13 --
Multibyte String:	bool(false)
ASCII String:
mb_strrpos:		bool(false)
strrpos:		bool(false)

-- Offset is -12 --
Multibyte String:	int(9)
ASCII String:
mb_strrpos:		int(9)
strrpos:		int(9)


Actual result:
--------------
-- Offset is -25 --
Multibyte String:	bool(false)
ASCII String:
mb_strrpos:		bool(false)
strrpos:		
Notice: strrpos(): Offset is greater than the length of haystack string in ...\mb_strrpos.php on line 14
bool(false)

-- Offset is -24 --
Multibyte String:	int(9)
ASCII String:
mb_strrpos:		bool(false)
strrpos:		
Notice: strrpos(): Offset is greater than the length of haystack string in ...\mb_strrpos.php on line 14
bool(false)

-- Offset is -13 --
Multibyte String:	int(9)
ASCII String:
mb_strrpos:		bool(false)
strrpos:		bool(false)

-- Offset is -12 --
Multibyte String:	int(9)
ASCII String:
mb_strrpos:		int(9)
strrpos:		int(9)

 [2008-01-30 15:58 UTC] nicholsr@php.net
assigning to maintainer
 [2008-02-10 00:31 UTC] hirokawa@php.net
Could you show me the mbstring related setting (mbstring.*)
in your php.ini ?

 [2008-02-12 09:51 UTC] jmessa@php.net
Here is the entire mbstring section of my php.ini file, I haven't
changed it from the default that comes when you download PHP.
[mbstring]
; language for internal character representation.
;mbstring.language = Japanese

; internal/script encoding.
; Some encoding cannot work as internal encoding.
; (e.g. SJIS, BIG5, ISO-2022-*)
;mbstring.internal_encoding = EUC-JP

; http input encoding.
;mbstring.http_input = auto

; http output encoding. mb_output_handler must be
; registered as output buffer to function
;mbstring.http_output = SJIS

; enable automatic encoding translation according to
; mbstring.internal_encoding setting. Input chars are
; converted to internal encoding by setting this to On.
; Note: Do _not_ use automatic encoding translation for
;       portable libs/applications.
;mbstring.encoding_translation = Off

; automatic encoding detection order.
; auto means
;mbstring.detect_order = auto

; substitute_character used when character cannot be converted
; one from another
;mbstring.substitute_character = none;

; overload(replace) single byte functions by mbstring functions.
; mail(), ereg(), etc are overloaded by mb_send_mail(), mb_ereg(),
; etc. Possible values are 0,1,2,4 or combination of them.
; For example, 7 for overload everything.
; 0: No overload
; 1: Overload mail() function
; 2: Overload str*() functions
; 4: Overload ereg*() functions
;mbstring.func_overload = 0

Thanks
 [2008-02-12 09:53 UTC] jmessa@php.net
I also thought I'd say now that I've committed a load of mbstring tests to CVS if you haven't seen them already. Let me know if you'd like anything changing in them.
Thanks!
 [2008-02-18 14:07 UTC] jmessa@php.net
I've run this test against the latest 5.2 and 5.3 snapshots and can see that there's now bounds checking for this function, and I'm getting the same error messages as with strrpos (thanks!). When $offset = -13, however, the multibyte string still returns int(9), so it looks like there is still a bug here.
Thanks for what you've done so far
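
For reference, the character-based results expected for in-range negative offsets can be reproduced without passing a negative offset to mb_strrpos at all. The following is a minimal sketch, assuming the same rule strrpos applies to the ASCII string in the report (a match may start no later than character length + offset). char_strrpos is a hypothetical helper written for this illustration; it is not part of mbstring and not the fix applied in CVS.

```php
<?php
// Hypothetical helper (not part of mbstring): emulates a character-based
// negative offset by truncating the haystack before searching.
function char_strrpos($haystack, $needle, $offset, $encoding = 'UTF-8')
{
    $len  = mb_strlen($haystack, $encoding);
    $nlen = mb_strlen($needle, $encoding);
    if ($offset >= 0 || -$offset > $len) {
        return false; // this sketch only handles in-range negative offsets
    }
    // Assumed strrpos-style rule: a match may start no later than
    // character ($len + $offset), so keep everything up to the end of
    // a needle that starts there.
    $keep = min($len, $len + $offset + $nlen);
    return mb_strrpos(mb_substr($haystack, 0, $keep, $encoding), $needle, 0, $encoding);
}

// Same haystack and needle as in the reproduce code.
$string_mb = base64_decode('5pel5pys6Kqe44OG44Kt44K544OI44Gn44GZ44CCMDEyMzTvvJXvvJbvvJfvvJjvvJnjgII=');
$needle = base64_decode('44CC');

var_dump(char_strrpos($string_mb, $needle, -12));          // int(9)
var_dump(char_strrpos($string_mb, $needle, -13));          // bool(false)
var_dump(char_strrpos('This is na English ta', 'a', -12)); // int(9)
var_dump(char_strrpos('This is na English ta', 'a', -13)); // bool(false)
```

Under that assumption, -13 gives bool(false) for both strings, which matches the expected result in the original report.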
 [2008-02-26 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 [2008-12-05 13:44 UTC] ant@php.net
I re-tested on the latest 5.2 snap and the output still differs from the expected result. I now get the following:

-- Offset is -25 --
Multibyte String:
Notice: mb_strrpos(): Offset is greater than the length of haystack string in D:\Testing\test.php on line 11
bool(false)
ASCII String:
mb_strrpos:
Notice: mb_strrpos(): Offset is greater than the length of haystack string in D:\Testing\test.php on line 14
bool(false)
strrpos:
Notice: strrpos(): Offset is greater than the length of haystack string in D:\Testing\test.php on line 16
bool(false)

-- Offset is -24 --
Multibyte String:
Notice: mb_strrpos(): Offset is greater than the length of haystack string in D:\Testing\test.php on line 11
bool(false)
ASCII String:
mb_strrpos:
Notice: mb_strrpos(): Offset is greater than the length of haystack string in D:\Testing\test.php on line 14
bool(false)
strrpos:
Notice: strrpos(): Offset is greater than the length of haystack string in D:\Testing\test.php on line 16
bool(false)

-- Offset is -13 --
Multibyte String:       int(9)
ASCII String:
mb_strrpos:             bool(false)
strrpos:                bool(false)

-- Offset is -12 --
Multibyte String:       int(9)
ASCII String:
mb_strrpos:             int(9)
strrpos:                int(9)
 [2009-02-15 07:14 UTC] moriyoshi@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 