Bug #79213 mb_check_encoding doesn't return true with valid base64 strings
Submitted: 2020-02-02 21:54 UTC Modified: 2020-02-03 06:05 UTC
From: info at ioweb dot gr Assigned:
Status: Verified Package: mbstring related
PHP Version: 7.2.27 OS: Debian 9, Ubuntu 18.04
Private report: No CVE-ID: None

 [2020-02-02 21:54 UTC] info at ioweb dot gr
Description:
------------
mb_check_encoding() does not recognize base64-encoded strings as valid; it returns false.

For example, the string "test" encodes to "dGVzdA==",

but mb_check_encoding() fails to recognize this as a valid base64-encoded string.



Test script:
---------------
<?php
$base64encoded = base64_encode("test"); // "dGVzdA=="
$valid = mb_check_encoding($base64encoded, "BASE64");
var_dump($valid);


Expected result:
----------------
$valid is true

Actual result:
--------------
$valid is false

 [2020-02-02 22:05 UTC] bugreports at gmail dot com
Why do you expect base64 strings to be detected as some multibyte encoding?

It's a plain string, no more and no less, and "BASE64" is not a multibyte encoding at all.
 [2020-02-02 22:13 UTC] info at ioweb dot gr
I reached this assumption from reading the documentation.

The docs for mb_check_encoding say:

Checks if the specified byte stream is valid for the specified encoding.

I can get a list of supported encodings from

mb_list_encodings, which also includes base64.

So I would assume mb_check_encoding can handle every encoding that mb_list_encodings shows.
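
The comparison described above can be checked directly. This is a minimal sketch, assuming the mbstring extension is loaded. The list of names to look for is an assumption chosen for illustration, not something taken from the report, and the output depends on the PHP build:

```php
<?php
// Names of transfer/escape-style encodings to look for (illustrative list,
// an assumption for this sketch).
$nonText = ['BASE64', 'UUENCODE', 'QUOTED-PRINTABLE', 'HTML-ENTITIES', '7BIT', '8BIT'];

// Normalize case so the lookup does not depend on how each name is spelled.
$listed = array_map('strtoupper', mb_list_encodings());

// Which of those names does this build advertise as supported?
$advertised = array_values(array_intersect($nonText, $listed));

print_r($advertised);
```

Any name printed here is one that mb_list_encodings reports, which is the basis for expecting mb_check_encoding to accept it.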
 [2020-02-02 22:31 UTC] requinix@php.net
-Summary: mb_check_encoding doesn't work with base64
+Summary: mb_check_encoding doesn't return true with valid base64 padded string
-Status: Open
+Status: Verified
-Package: Unknown/Other Function
+Package: mbstring related
 [2020-02-02 22:31 UTC] requinix@php.net
I believe the issue is that mb_check_encoding() works on the assumption the input comes from a stream, meaning there may be more input to read after what was passed to it. "test" encodes to "dGVzdA==" which is a valid base-64 string, however =s are strictly used as padding at the end of the output and so cannot appear in the middle of a stream.

As such, if you rtrim the =s off then mb_check_encoding will report that "dGVzdA" is valid. But that puts the responsibility of validating the number of =s on the user.

However, I don't think that makes this behavior correct. If it were reading a UTF-8 stream and the string ended in the middle of a multibyte sequence then it should report as valid: there may be more to read that would complete the sequence. Valid "so far". The case of a padded base-64 string is different as the string ended in a valid state - it would be invalid if more bytes followed, but failing because there *might* be more isn't good.


Would be nice if ext/mbstring provided a true stateful validator. Like a class that you feed stream input as it arrives and can query for the current validation state, or a procedural function that returns some state identifier that you pass back to it with subsequent calls.
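
To make the idea of a stateful validator concrete, here is a userland sketch for UTF-8 only. The class name Utf8StreamValidator and its methods are hypothetical; nothing like this exists in ext/mbstring. It assumes PHP 7.4 or later with mbstring loaded, and it simply holds back a possibly incomplete trailing sequence before delegating to mb_check_encoding:

```php
<?php
// Hypothetical helper, not part of ext/mbstring.
final class Utf8StreamValidator
{
    private string $pending = '';   // trailing bytes that may still be completed
    private bool $valid = true;     // false once any complete part was invalid

    public function feed(string $chunk): void
    {
        if (!$this->valid) {
            return; // an earlier chunk already failed
        }
        $data = $this->pending . $chunk;
        $cut = self::incompleteTailLength($data);
        $complete = $cut > 0 ? substr($data, 0, -$cut) : $data;
        $this->pending = $cut > 0 ? substr($data, -$cut) : '';
        if ($complete !== '' && !mb_check_encoding($complete, 'UTF-8')) {
            $this->valid = false;
        }
    }

    // True while nothing fed so far has been shown to be invalid.
    public function isValidSoFar(): bool
    {
        return $this->valid;
    }

    // True only if the input is valid and does not end mid-sequence.
    public function isComplete(): bool
    {
        return $this->valid && $this->pending === '';
    }

    // Number of trailing bytes that start a sequence needing more bytes.
    private static function incompleteTailLength(string $data): int
    {
        $len = strlen($data);
        for ($back = 1; $back <= min(3, $len); $back++) {
            $byte = ord($data[$len - $back]);
            if (($byte & 0xC0) === 0x80) {
                continue; // continuation byte, keep looking for its lead byte
            }
            if ($byte >= 0xF0) {
                $need = 4;
            } elseif ($byte >= 0xE0) {
                $need = 3;
            } elseif ($byte >= 0xC0) {
                $need = 2;
            } else {
                $need = 1; // ASCII
            }
            return $need > $back ? $back : 0;
        }
        return 0;
    }
}

$validator = new Utf8StreamValidator();
$validator->feed("\xCE");                 // first byte of a two-byte sequence
var_dump($validator->isValidSoFar());     // bool(true)
var_dump($validator->isComplete());       // bool(false)
$validator->feed("\xB4");                 // completes the sequence
var_dump($validator->isComplete());       // bool(true)
```

Keeping "valid so far" and "complete" as separate queries is the design choice here: it lets a caller tell a stream that merely paused mid-character apart from one that is already broken.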
 [2020-02-02 22:43 UTC] nikic@php.net
> However, I don't think that makes this behavior correct. If it were reading a UTF-8 stream and the string ended in the middle of a multibyte sequence then it should report as valid.

No, this is not how mb_check_encoding() is supposed to work. The string as a whole must be valid, not just a valid prefix. (Unfortunately many flush implementations in mbfl are somewhat broken, so theory and practice may not align very well.)
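
A short illustration of the whole-string rule, using UTF-8 rather than base64. This is a sketch assuming the mbstring extension is loaded; the variable names are only for illustration:

```php
<?php
$delta = "\xCE\xB4";                 // the two UTF-8 bytes of "δ" (U+03B4)
$truncated = substr($delta, 0, 1);   // only the lead byte: a valid prefix, not a valid string

var_dump(mb_check_encoding($delta, 'UTF-8'));      // bool(true)
var_dump(mb_check_encoding($truncated, 'UTF-8'));  // bool(false)
```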
 [2020-02-02 22:50 UTC] requinix@php.net
Ah shoot, you're right. It's mb_detect_encoding that works on partial strings, not mb_check_encoding.
 [2020-02-03 05:15 UTC] info at ioweb dot gr
I'd like to mention that for the word "δοκιμή" the encoded string is zrTOv866zrnOvM6u and the result is false as well. It's not only the padded = signs.
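
Until this is resolved, one possible way to check the same two strings without mbstring is a strict decode followed by a round trip. This is a sketch of a workaround, not the fix for this bug; the function name looksLikeCanonicalBase64 is hypothetical:

```php
<?php
// Hypothetical helper: true if $candidate decodes in strict mode and
// re-encodes to exactly the same text (i.e. canonical, padded base64).
function looksLikeCanonicalBase64(string $candidate): bool
{
    $decoded = base64_decode($candidate, true); // strict: false on characters outside the alphabet
    return $decoded !== false && base64_encode($decoded) === $candidate;
}

var_dump(looksLikeCanonicalBase64('dGVzdA=='));          // bool(true)
var_dump(looksLikeCanonicalBase64('zrTOv866zrnOvM6u'));  // bool(true)
var_dump(looksLikeCanonicalBase64('not base64!'));       // bool(false)
```

The round-trip comparison also rejects input with stray whitespace or missing padding, which may or may not be what a given caller wants.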
 [2020-02-03 05:18 UTC] requinix@php.net
@info: Yes, I was wrong about the nature of the bug, and the code I checked my response with also happened to be wrong (and in a way that made it look like mb_check_encoding was working). You can disregard most of what I said :(
 [2020-02-03 05:27 UTC] info at ioweb dot gr
No problem. I just saw that the title of the bug changed and it's now about padded strings, while it's failing for all base64 strings. A bit misleading.
 [2020-02-03 06:05 UTC] requinix@php.net
-Summary: mb_check_encoding doesn't return true with valid base64 padded string
+Summary: mb_check_encoding doesn't return true with valid base64 strings
 [2020-02-03 06:05 UTC] requinix@php.net
I suppose it could use another small update.
 