Bug #77025 mb_strpos throws Unknown encoding or conversion error
Submitted: 2018-10-16 22:01 UTC Modified: 2018-10-17 13:29 UTC
From: vs-php-bugs at schukai dot com Assigned: nikic (profile)
Status: Closed Package: mbstring related
PHP Version: 7.3.0RC3 OS: debian/stretch
Private report: No CVE-ID: None
 [2018-10-16 22:01 UTC] vs-php-bugs at schukai dot com
Description:
------------
The following code works as expected:

php7.2 -r "echo \mb_stripos('Hello', 'e', 0, '8bit');"
> 1

In 7.3, however, an error is thrown:

php7.3 -r "echo \mb_strpos('Hello', 'e', 0, '8bit');"
> php error

Alternatively, mb_stripos does not write any output:

php7.3 -r "echo \mb_stripos('Hello', 'e', 0, '8bit');"
> empty result

The following code works with iso-8859-1:

php7.3 -r "echo \mb_stripos('Hello', 'e', 0, 'iso-8859-1');"


The output of php7.3 -r "print_r(mb_list_encodings());" contains 8bit and iso-8859-1:

Array
(
    [0] => pass
    [1] => auto
    [2] => wchar
    [3] => byte2be
    [4] => byte2le
    [5] => byte4be
    [6] => byte4le
    [7] => BASE64
    [8] => UUENCODE
    [9] => HTML-ENTITIES
    [10] => Quoted-Printable
    [11] => 7bit
    [12] => 8bit
    [13] => UCS-4
    [14] => UCS-4BE
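
The same check can be done programmatically instead of reading the printed array. This is only a minimal sketch: it assumes the mbstring extension is loaded, and the variable names are illustrative. Names are lower-cased before comparison because the list may spell them differently than the caller does.

```php
<?php
// Requires the mbstring extension.
// Lower-case every name so the lookup does not depend on how a build spells it.
$encodings = array_map('strtolower', mb_list_encodings());

foreach (['8bit', 'iso-8859-1'] as $name) {
    $listed = in_array($name, $encodings, true);
    echo $name, ': ', $listed ? 'listed' : 'not listed', "\n";
}
```

Being listed only shows that the name is known to mbstring. It does not by itself show that every mb_* function accepts it, which is the point of this report.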

7.2/php.ini and 7.3/php.ini are identical


PHP-Version

PHP 7.3.0RC3 (cli) (built: Oct 15 2018 12:05:08) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.3.0-dev, Copyright (c) 1998-2018 Zend Technologies
    with Zend OPcache v7.3.0RC3, Copyright (c) 1999-2018, by Zend Technologies
    with Xdebug v2.7.0beta1, Copyright (c) 2002-2018, by Derick Rethans

Test script:
---------------
echo \mb_strpos('Hello', 'e', 0, '8bit');

Expected result:
----------------
1

Actual result:
--------------
PHP Warning:  mb_strpos(): Unknown encoding or conversion error in Command line code on line 1
PHP Stack trace:
PHP   1. {main}() Command line code:0
PHP   2. mb_strpos() Command line code:1

Warning: mb_strpos(): Unknown encoding or conversion error in Command line code on line 1

Call Stack:
    0.0008     406848   1. {main}() Command line code:0
    0.0008     406848   2. mb_strpos() Command line code:1
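
For comparison, a possible workaround sketch, on the assumption that a caller passing '8bit' only wants a byte-wise, case-sensitive search: the plain strpos() function compares raw bytes and does not involve mbstring at all. It is not a replacement for mb_stripos, since it does no case folding.

```php
<?php
// Byte-wise search without mbstring: strpos() compares raw bytes.
$haystack = 'Hello';
$needle   = 'e';

$pos = strpos($haystack, $needle);

// strpos() returns false when nothing is found, so compare with ===
// to keep a match at offset 0 apart from "not found".
if ($pos === false) {
    echo "not found\n";
} else {
    echo $pos, "\n"; // prints 1
}
```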


History

 [2018-10-17 07:47 UTC] cmb@php.net
-Status: Open +Status: Verified
 [2018-10-17 07:47 UTC] cmb@php.net
Confirmed: <https://3v4l.org/ak195>
 [2018-10-17 09:13 UTC] nikic@php.net
-Status: Verified +Status: Assigned -Assigned To: +Assigned To: nikic
 [2018-10-17 10:00 UTC] nikic@php.net
I'm wondering what the meaning of the 8bit encoding is actually supposed to be. Right now, it essentially seems to behave the same as ISO-8859-1. It maps code units \x00-\xFF to Unicode code points U+0000 to U+00FF, which are then also subject to case conversion, like this:

$A = "\xc4"; // Ä in ISO-8859-1
$a = "\xe4"; // ä in ISO-8859-1
echo mb_stripos($a, $A, 0, '8bit'); // 0

Is it supposed to work that way?
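
To see the difference on a given build, the case-insensitive and case-sensitive searches can be run side by side. This is a sketch only, assuming the mbstring extension is loaded; the variable names are illustrative. The '8bit' result is dumped rather than stated, because whether it matches is exactly the open question and may differ between PHP versions.

```php
<?php
// Requires the mbstring extension.
$upper = "\xC4"; // Ä in ISO-8859-1
$lower = "\xE4"; // ä in ISO-8859-1

// Case-insensitive search under '8bit': the outcome is version-dependent,
// so it is dumped here instead of being assumed.
var_dump(mb_stripos($lower, $upper, 0, '8bit'));

// Under ISO-8859-1 the bytes are treated as Ä and ä, which case-fold
// to the same character, so this finds a match at offset 0.
$insensitive = mb_stripos($lower, $upper, 0, 'ISO-8859-1');
var_dump($insensitive);

// Case-sensitive search compares the characters as-is; ä is not Ä.
$sensitive = mb_strpos($lower, $upper, 0, 'ISO-8859-1');
var_dump($sensitive);
```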
 [2018-10-17 10:41 UTC] nikic@php.net
Automatic comment on behalf of nikita.ppv@gmail.com
Revision: http://git.php.net/?p=php-src.git;a=commit;h=56665a1b17b8877164473872b90b68d5ac311306
Log: Fixed bug #77025
 [2018-10-17 10:41 UTC] nikic@php.net
-Status: Assigned +Status: Closed
 [2018-10-17 13:29 UTC] vs-php-bugs at schukai dot com
Yes, that seems to be the case in 7.2:

$A = "\xc4"; // Ä in ISO-8859-1
$a = "\xe4"; // ä in ISO-8859-1
echo mb_stripos($a, $A, 0, '8bit'); // 0


I think that with 8bit (strpos), no conversion should take place:

echo mb_strpos($a, $A, 0, '8bit'); // false