Bug #79213 mb_check_encoding doesn't return true with valid base64 strings
Submitted: 2020-02-02 21:54 UTC
Modified: 2020-02-03 06:05 UTC
Votes: 1
Avg. Score: 5.0 ± 0.0
Reproduced: 1 of 1 (100.0%)
Same Version: 1 (100.0%)
Same OS: 1 (100.0%)
From: info at ioweb dot gr
Assigned:
Status: Verified
Package: mbstring related
PHP Version: 7.2.27
OS: Debian 9, Ubuntu 18.04
Private report: No
CVE-ID: None

 [2020-02-02 21:54 UTC] info at ioweb dot gr
Description:
------------
mb_check_encoding is unable to validate strings that are encoded with base64; it returns false.

For example, base64_encode("test") yields "dGVzdA==",

but mb_check_encoding does not recognize this as a valid base64-encoded string.



Test script:
---------------
<?php
$base64encoded = base64_encode("test"); // "dGVzdA=="
$valid = mb_check_encoding($base64encoded, "BASE64");
var_dump($valid); // reported as bool(false) on PHP 7.2.27


Expected result:
----------------
$valid is true

Actual result:
--------------
$valid is false

History

 [2020-02-02 22:05 UTC] bugreports at gmail dot com
Why do you expect base64 strings to be detected as some multibyte encoding?

It's a plain string, no more and no less, and "BASE64" is not a multibyte encoding at all.
 [2020-02-02 22:13 UTC] info at ioweb dot gr
I reached this assumption from reading the documentation.

For mb_check_encoding, the docs say:

Checks if the specified byte stream is valid for the specified encoding.

I can get a list of supported encodings from mb_list_encodings, which also includes BASE64.

So I would assume mb_check_encoding can handle every encoding that mb_list_encodings shows.
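
The premise of this comment can be checked directly. The snippet below is a minimal sketch that only inspects what mb_list_encodings() advertises; whether "BASE64" appears may depend on the PHP version, so no particular output is claimed here.

```php
<?php
// Collect the encoding names mbstring advertises, normalized to upper case
// so the comparison does not depend on how the name is capitalized.
$encodings = array_map('strtoupper', mb_list_encodings());

// True if mbstring lists a BASE64 "encoding" on this PHP build.
$hasBase64 = in_array('BASE64', $encodings, true);

var_dump($hasBase64);
```

If this prints bool(true), the reporter's reasoning applies to that build: the name is accepted as an encoding, so it is reasonable to expect mb_check_encoding to handle it.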
 [2020-02-02 22:31 UTC] requinix@php.net
-Summary: mb_check_encoding doesn't work with base64
+Summary: mb_check_encoding doesn't return true with valid base64 padded string
-Status: Open
+Status: Verified
-Package: Unknown/Other Function
+Package: mbstring related
 [2020-02-02 22:31 UTC] requinix@php.net
I believe the issue is that mb_check_encoding() works on the assumption the input comes from a stream, meaning there may be more input to read after what was passed to it. "test" encodes to "dGVzdA==" which is a valid base-64 string, however =s are strictly used as padding at the end of the output and so cannot appear in the middle of a stream.

As such, if you rtrim the =s off then mb_check_encoding will report that "dGVzdA" is valid. But then that puts the responsibility of validating the number of =s on the user.

However, I don't think that makes this behavior correct. If it were reading a UTF-8 stream and the string ended in the middle of a multibyte sequence then it should report as valid: there may be more to read that would complete the sequence. Valid "so far". The case of a padded base-64 string is different as the string ended in a valid state - it would be invalid if more bytes followed, but failing because there *might* be more isn't good.


It would be nice if ext/mbstring provided a true stateful validator: a class that you feed stream input as it arrives and can query for the current validation state, or a procedural function that returns some state identifier that you pass back to it on subsequent calls.
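
To make the idea of a stateful validator concrete, here is a rough userland sketch for base64 only. Everything in it is hypothetical: the class name Base64StreamValidator and its methods feed(), isValidSoFar() and isComplete() are made up for illustration and are not part of ext/mbstring. It checks only the alphabet and the padding position, not whether the unused trailing bits are zero.

```php
<?php
// Hypothetical userland sketch; nothing like this exists in ext/mbstring.
final class Base64StreamValidator
{
    const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

    private $dataChars = 0;   // alphabet characters seen so far
    private $padChars = 0;    // '=' characters seen so far
    private $invalid = false; // set once the stream can no longer be valid

    // Feed the next chunk; returns false once the stream is known to be invalid.
    public function feed(string $chunk): bool
    {
        $len = strlen($chunk);
        for ($i = 0; $i < $len && !$this->invalid; $i++) {
            $c = $chunk[$i];
            if ($c === '=') {
                // Padding is only allowed in the 3rd or 4th slot of a 4-character group.
                $slot = ($this->dataChars + $this->padChars) % 4;
                if ($slot < 2) {
                    $this->invalid = true;
                } else {
                    $this->padChars++;
                }
            } elseif (strpos(self::ALPHABET, $c) !== false) {
                // Data after padding is never valid.
                if ($this->padChars > 0) {
                    $this->invalid = true;
                } else {
                    $this->dataChars++;
                }
            } else {
                $this->invalid = true; // character outside the base64 alphabet
            }
        }
        return !$this->invalid;
    }

    // Valid "so far": more input could still complete the stream.
    public function isValidSoFar(): bool
    {
        return !$this->invalid;
    }

    // Valid as a whole: every 4-character group is complete.
    public function isComplete(): bool
    {
        return !$this->invalid && ($this->dataChars + $this->padChars) % 4 === 0;
    }
}

// Usage: the string from the report, arriving in two chunks.
$validator = new Base64StreamValidator();
$validator->feed('dGVz');
$validator->feed('dA==');
var_dump($validator->isComplete());
```

Keeping "valid so far" and "complete" as separate queries is the point of the design: a caller that knows the stream has ended asks isComplete(), while a caller still reading asks isValidSoFar().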
 [2020-02-02 22:43 UTC] nikic@php.net
> However, I don't think that makes this behavior correct. If it were reading a UTF-8 stream and the string ended in the middle of a multibyte sequence then it should report as valid.

No, this is not how mb_check_encoding() is supposed to work. The string as a whole must be valid, not just a valid prefix. (Unfortunately many flush implementations in mbfl are somewhat broken, so theory and practice may not align very well.)
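
The whole-string rule can be illustrated with UTF-8. This is a small sketch, not a statement about every encoding: as noted above, some mbfl flush implementations may behave differently. Under the rule described here, the complete two-byte sequence should validate and the truncated one should not.

```php
<?php
$complete  = "\xCE\xB4"; // UTF-8 bytes for "δ"
$truncated = "\xCE";     // lead byte only: a valid prefix, not a valid string

// The string as a whole must be valid, so a dangling lead byte is rejected.
var_dump(mb_check_encoding($complete, 'UTF-8'));
var_dump(mb_check_encoding($truncated, 'UTF-8'));
```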
 [2020-02-02 22:50 UTC] requinix@php.net
Ah shoot, you're right. It's mb_detect_encoding that works on partial strings, not mb_check_encoding.
 [2020-02-03 05:15 UTC] info at ioweb dot gr
I'd like to mention that for the word "δοκιμή" the encoded string is zrTOv866zrnOvM6u, and the result is false as well. It's not only the padding = signs.
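
Until the mbstring behavior is settled, base64 input can be validated without mbstring. The sketch below is one possible workaround, not something proposed in this report: looks_like_base64() is a hypothetical helper name. It relies on the strict mode of base64_decode(), which returns false for characters outside the base64 alphabet, plus a round trip that rejects non-canonical input such as embedded whitespace or missing padding.

```php
<?php
// Hypothetical helper; not an mbstring function.
function looks_like_base64(string $input): bool
{
    // Strict mode: returns false if $input has characters outside the alphabet.
    $decoded = base64_decode($input, true);

    // Re-encoding must reproduce the input exactly (padding included).
    return $decoded !== false && base64_encode($decoded) === $input;
}

var_dump(looks_like_base64('dGVzdA=='));         // bool(true)
var_dump(looks_like_base64('zrTOv866zrnOvM6u')); // bool(true)
var_dump(looks_like_base64('not base64!'));      // bool(false)
```

Both strings from this report pass, with and without padding, which matches the expectation that either one is valid base64.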
 [2020-02-03 05:18 UTC] requinix@php.net
@info: Yes, I was wrong about the nature of the bug, and the code I checked my response with also happened to be wrong (and in a way that made it look like mb_check_encoding was working). You can disregard most of what I said :(
 [2020-02-03 05:27 UTC] info at ioweb dot gr
No problem. I just saw that the title of the bug was changed to be about padded strings, while the check fails for all base64 strings. That's a bit misleading.
 [2020-02-03 06:05 UTC] requinix@php.net
-Summary: mb_check_encoding doesn't return true with valid base64 padded string
+Summary: mb_check_encoding doesn't return true with valid base64 strings
 [2020-02-03 06:05 UTC] requinix@php.net
I suppose it could use another small update.
 [2023-08-22 10:09 UTC] ady7788 at yahoo dot mail dot com
The following pull request has been associated:

Patch Name: Ignore externally managed and generated files
On GitHub:  https://github.com/php/web-windows/pull/21
Patch:      https://github.com/php/web-windows/pull/21.patch
 
Last updated: Sat Dec 21 16:01:28 2024 UTC