Bug #77025 mb_strpos throws Unknown encoding or conversion error
Submitted: 2018-10-16 22:01 UTC
Modified: 2018-10-17 13:29 UTC
From: vs-php-bugs at schukai dot com
Assigned: nikic
Status: Closed
Package: mbstring related
PHP Version: 7.3.0RC3
OS: debian/stretch
Private report: No
CVE-ID: None

 [2018-10-16 22:01 UTC] vs-php-bugs at schukai dot com
Description:
------------
The following code works as expected in 7.2:

php7.2 -r "echo \mb_stripos('Hello', 'e', 0, '8bit');"
> 1

In 7.3, however, an error is raised:

php7.3 -r "echo \mb_strpos('Hello', 'e', 0, '8bit');"
> php error

mb_stripos, by contrast, produces no output at all:

php7.3 -r "echo \mb_stripos('Hello', 'e', 0, '8bit');"
> empty result

The same call works with iso-8859-1:

php7.3 -r "echo \mb_stripos('Hello', 'e', 0, 'iso-8859-1');"


The output of php7.3 -r "print_r(mb_list_encodings());" contains both 8bit and iso-8859-1 (excerpt):

Array
(
    [0] => pass
    [1] => auto
    [2] => wchar
    [3] => byte2be
    [4] => byte2le
    [5] => byte4be
    [6] => byte4le
    [7] => BASE64
    [8] => UUENCODE
    [9] => HTML-ENTITIES
    [10] => Quoted-Printable
    [11] => 7bit
    [12] => 8bit
    [13] => UCS-4
    [14] => UCS-4BE

The 7.2 and 7.3 php.ini files are identical.


PHP-Version

PHP 7.3.0RC3 (cli) (built: Oct 15 2018 12:05:08) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.3.0-dev, Copyright (c) 1998-2018 Zend Technologies
    with Zend OPcache v7.3.0RC3, Copyright (c) 1999-2018, by Zend Technologies
    with Xdebug v2.7.0beta1, Copyright (c) 2002-2018, by Derick Rethans

Test script:
---------------
echo \mb_strpos('Hello', 'e', 0, '8bit');

Expected result:
----------------
1

Actual result:
--------------
PHP Warning:  mb_strpos(): Unknown encoding or conversion error in Command line code on line 1
PHP Stack trace:
PHP   1. {main}() Command line code:0
PHP   2. mb_strpos() Command line code:1

Warning: mb_strpos(): Unknown encoding or conversion error in Command line code on line 1

Call Stack:
    0.0008     406848   1. {main}() Command line code:0
    0.0008     406848   2. mb_strpos() Command line code:1
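
For anyone checking another build, the report above can be condensed into a small script. This is only a sketch: check_8bit_strpos() is a hypothetical helper name, not part of mbstring, and the @ operator is used solely to keep the warning out of the output so the return values are easier to compare.

```php
<?php
// Hypothetical reproduction helper (not part of mbstring): gathers the three
// facts from the report so they can be compared side by side.
function check_8bit_strpos(string $haystack, string $needle): array
{
    return [
        // Is '8bit' advertised by mb_list_encodings()?
        'listed'    => in_array('8bit', mb_list_encodings(), true),
        // Result of the call from the test script; @ hides the warning only.
        'mb_strpos' => @mb_strpos($haystack, $needle, 0, '8bit'),
        // Plain byte-wise search as a reference value.
        'strpos'    => strpos($haystack, $needle),
    ];
}

var_dump(check_8bit_strpos('Hello', 'e'));
```

Going by the report, an affected 7.3.0RC3 build would be expected to show 'listed' as true while 'mb_strpos' differs from 'strpos'; on a build where the encoding is handled, the two positions should agree.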


 [2018-10-17 07:47 UTC] cmb@php.net
-Status: Open
+Status: Verified
 [2018-10-17 07:47 UTC] cmb@php.net
Confirmed: <https://3v4l.org/ak195>
 [2018-10-17 09:13 UTC] nikic@php.net
-Status: Verified
+Status: Assigned
-Assigned To:
+Assigned To: nikic
 [2018-10-17 10:00 UTC] nikic@php.net
I'm wondering what the 8bit encoding is actually supposed to mean. Right now, it essentially seems to behave the same as ISO-8859-1: it maps code units \x00-\xFF to Unicode code points U+0000-U+00FF, which are then also subject to case conversion, like so:

$A = "\xc4"; // Ä in ISO-8859-1
$a = "\xe4"; // ä in ISO-8859-1
echo mb_stripos($a, $A, 0, '8bit'); // 0

Is it supposed to work that way?
 [2018-10-17 10:41 UTC] nikic@php.net
Automatic comment on behalf of nikita.ppv@gmail.com
Revision: http://git.php.net/?p=php-src.git;a=commit;h=56665a1b17b8877164473872b90b68d5ac311306
Log: Fixed bug #77025
 [2018-10-17 10:41 UTC] nikic@php.net
-Status: Assigned
+Status: Closed
 [2018-10-17 13:29 UTC] vs-php-bugs at schukai dot com
Yes, that seems to be the case in 7.2:

$A = "\xc4"; // Ä in ISO-8859-1
$a = "\xe4"; // ä in ISO-8859-1
echo mb_stripos($a, $A, 0, '8bit'); // 0


I think that with 8bit no conversion should take place (as with strpos):

echo mb_strpos($a, $A, 0, '8bit'); // false