Request #80870 base64_decode: Support a "stream" of multiple encoded strings
Submitted: 2021-03-15 20:28 UTC Modified: 2021-06-05 19:07 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: bugs at jth dot net Assigned:
Status: Open Package: Strings related
PHP Version: 7.4.16 OS: Linux
Private report: No CVE-ID: None

 [2021-03-15 20:28 UTC] bugs at jth dot net
Description:
------------
When base64_decode() is given a string containing two or more base64 blocks separated by line feeds, every block after the first is corrupted.

This may occur as input from an email generated by some mail programs.

Test script:
---------------
$s1 = base64_encode("Line nr 1 bcdefg\n");
$s2 = base64_encode("Line nr 2 \n");
$str = $s1."\n".$s2;
echo $str;
echo base64_decode($str);



Expected result:
----------------
TGluZSBuciAxIGJjZGVmZwo=
TGluZSBuciAyIAo=

Line nr 1 bcdefg
Line nr 2 




Actual result:
--------------
Result:
TGluZSBuciAxIGJjZGVmZwo=
TGluZSBuciAyIAo=

Line nr 1 bcdefg
(corrupted text)

Patches

base64_encode_testscript (last revision 2021-04-13 15:03 UTC by bugs at jth dot net)
base64_patch (last revision 2021-04-13 14:55 UTC by bugs at jth dot net)


History

 [2021-03-15 22:20 UTC] requinix@php.net
-Status: Open +Status: Feedback -Package: Unknown/Other Function +Package: Strings related
 [2021-03-15 22:20 UTC] requinix@php.net
= is not valid for Base64 when located in the middle of an input string. As such it will be ignored (since $strict is not being used) and PHP will attempt to decode "TGluZSBuciAxIGJjZGVmZwoTGluZSBuciAyIAo=".
The line feed is irrelevant.

What is this "may occur as input from an email"?
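
The explanation above can be checked with a short script. This is a minimal sketch that reuses the strings from the test script in this report; the variable names `$loose` and `$merged` are only illustrative.

```php
<?php
// Rebuild the input from the test script in this report.
$s1 = base64_encode("Line nr 1 bcdefg\n"); // 17 bytes, so the encoding ends in one '='
$s2 = base64_encode("Line nr 2 \n");
$str = $s1 . "\n" . $s2;

// Non-strict mode skips the inner '=' and the line feed,
// so both calls decode the same sequence of data characters.
$loose  = base64_decode($str);
$merged = base64_decode("TGluZSBuciAxIGJjZGVmZwoTGluZSBuciAyIAo=");
var_dump($loose === $merged);                               // bool(true)

// The first block survives; the bytes after it are out of alignment.
var_dump(substr($loose, 0, 17) === "Line nr 1 bcdefg\n");   // bool(true)
var_dump($loose === "Line nr 1 bcdefg\nLine nr 2 \n");      // bool(false)

// Strict mode rejects any data that follows padding.
var_dump(base64_decode($str, true));                        // bool(false)
```

Passing true as the second argument at least turns the silent corruption into a detectable failure.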
 [2021-03-16 16:48 UTC] bugs at jth dot net
-Status: Feedback +Status: Open
 [2021-03-16 16:48 UTC] bugs at jth dot net
The case is:

The email program will produce the email body as a valid base64 block and concatenate the selected signature as a separate valid base64 block. Thus the email body will consist of two valid base64 blocks.

Using the linux base64 program this body is decoded correctly.

As we are using a PHP script to interpret emails from different sources, this base64_decode behaviour is highly inconvenient, and I don't see why it cannot decode consecutive base64 blocks correctly instead of producing partly corrupted output.
 [2021-03-16 17:32 UTC] dgdgeg dot evrgrh at ggg dot ag
your mail client properly deals with mime-mail which is *highly* complex especially when it comes to nesting
 [2021-03-16 18:59 UTC] bugs at jth dot net
Please, understand, that we are handling emails sent to a server from the outside world composed by a variety of email clients out of our control. The emails are handled by a PHP program handling incoming emails as input to a special service.

Please, relate to the fact that the linux base64 program is handling this case correctly, whereas base64_decode is corrupting data. There is absolutely no reason for this difference. Correcting it does not cause any backwards compatibility problems, as I assume nobody is using base64_decode for generating partly corrupt data.
 [2021-03-16 19:05 UTC] bugs at jth dot net
> Please, understand, that we are handling emails 
> sent to a server from the outside world composed 
> by a variety of email clients out of our control

please understand that everybody dealing with email has that issue and solving that properly is far outside of a naive usage of base64 / string functions

it's far outside of the scope of a programming language 

you are wrong here - look how mail clients like Roundcube, written in PHP, deal with emails! again: you are wrong here, base64_decode alone can't handle emails, period
 [2021-03-16 19:33 UTC] bugs at jth dot net
Please, forget the source, and relate to the problem.

Linux base64 is handling decoding correctly.

base64_decode is not handling it correctly, but corrupting data.
 [2021-03-16 20:13 UTC] iguv dot jdzx at ghjj dot sddr
again: read the source of roundcubemail which is written in php and solved your issues
 [2021-03-17 16:27 UTC] cmb@php.net
-Status: Open +Status: Not a bug -Assigned To: +Assigned To: cmb
 [2021-03-17 16:27 UTC] cmb@php.net
base64_decode() is supposed to decode a base64 encoded string. Two
concatenated base64 encoded strings are not a base64 encoded
string.
 [2021-03-17 20:32 UTC] bugs at jth dot net
The base64_decode is handling the padding character '=' incorrectly.
It can decode concatenated blocks if the statement 
i = 0;
is inserted into the code in a proper place.
However, adding a few more lines will handle it properly and I am currently testing this.
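
The idea of resetting the group counter at a padding character can be sketched in userland PHP. This is only an illustration, not the uploaded base64_patch (the real decoder is C code and the patch is not reproduced in this thread). The function name `base64_decode_reset_on_pad` and its bit-buffer approach are assumptions made for this example.

```php
<?php
// Hypothetical userland decoder: '=' ends the current group and resets state,
// which is the effect the comment above attributes to inserting "i = 0;".
function base64_decode_reset_on_pad(string $in): string
{
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
    $out = '';
    $buf = 0;    // bits collected but not yet emitted
    $nbits = 0;  // number of valid bits in $buf

    $len = strlen($in);
    for ($p = 0; $p < $len; $p++) {
        $c = $in[$p];
        if ($c === '=') {
            // Padding closes the block: discard leftover bits and realign.
            $buf = 0;
            $nbits = 0;
            continue;
        }
        $v = strpos($alphabet, $c);
        if ($v === false) {
            continue; // like non-strict mode: skip whitespace and unknown characters
        }
        $buf = ($buf << 6) | $v;
        $nbits += 6;
        if ($nbits >= 8) {
            // A full byte is available: emit it and keep the remainder.
            $nbits -= 8;
            $out .= chr(($buf >> $nbits) & 0xFF);
            $buf &= (1 << $nbits) - 1;
        }
    }
    return $out;
}

$str = base64_encode("Line nr 1 bcdefg\n") . "\n" . base64_encode("Line nr 2 \n");
var_dump(base64_decode_reset_on_pad($str) === "Line nr 1 bcdefg\nLine nr 2 \n"); // bool(true)
```

A block whose padding was omitted even though its length is not a multiple of three bytes would still misalign the following block in this sketch, since there is no '=' to trigger the reset.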
 [2021-03-17 20:34 UTC] rtrtrtrtrt at dfdfdfdf dot dfd
argue whatever you want: "Concatenated base64 blocks" are idiotic
 [2021-03-17 20:50 UTC] bugs at jth dot net
I may agree, but I am relating to a world outside my control, not an ideal world.
 [2021-03-17 23:18 UTC] requinix@php.net
-Summary: base64_decode: Concatenated base64 blocks are corrupted +Summary: base64_decode: Support a "stream" of multiple encoded strings -Status: Not a bug +Status: Open -Type: Bug +Type: Feature/Change Request -Assigned To: cmb +Assigned To:
 [2021-04-13 14:55 UTC] bugs at jth dot net
The following patch has been added/updated:

Patch Name: base64_patch
Revision:   1618325738
URL:        https://bugs.php.net/patch-display.php?bug=80870&patch=base64_patch&revision=1618325738
 [2021-04-13 15:03 UTC] bugs at jth dot net
The following patch has been added/updated:

Patch Name: base64_encode_testscript
Revision:   1618326224
URL:        https://bugs.php.net/patch-display.php?bug=80870&patch=base64_encode_testscript&revision=1618326224
 [2021-04-13 15:08 UTC] bugs at jth dot net
I just uploaded a patch as well as a test script (for 8.0.3)
There should be no change in the old behaviour for single base64 blocks.
 [2021-06-05 12:23 UTC] a at b dot c dot de
What if the first block isn't padded with '=' because the data it's encoding is a multiple of three bytes in length?

$s1 = base64_encode("Line nr 1 bcdefgh\n");
$s2 = base64_encode("Line nr 2 \n");
$str = $s1."\n".$s2;
echo $str;
echo base64_decode($str);
 [2021-06-05 19:07 UTC] bugs at jth dot net
No problem. One might add these test cases to the test script uploaded to check for this case.

'Line nr 3ab Line nr 2a Line nr 1 :TGluZSBuciAzYWIg TGluZSBuciAyYSA= TGluZSBuciAxIA==',
'Line nr 3ab Line nr 3ab Line nr 1 :TGluZSBuciAzYWIg TGluZSBuciAzYWIg TGluZSBuciAxIA==',

CODED=TGluZSBuciAzYWIg TGluZSBuciAyYSA= TGluZSBuciAxIA==
 UNCODED=Line nr 3ab Line nr 2a Line nr 1
 UNCODEDSTRICT=Line nr 3ab Line nr 2a Line nr 1
CODED=TGluZSBuciAzYWIg TGluZSBuciAzYWIg TGluZSBuciAxIA==
 UNCODED=Line nr 3ab Line nr 3ab Line nr 1
 UNCODEDSTRICT=Line nr 3ab Line nr 3ab Line nr 1
 [2021-06-06 23:42 UTC] a at b dot c dot de
So the job could be done with a few lines of PHP code; if you want to play code golf you could get it in under 100 characters, but for real-world usage a few lines would be better.
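
One possible reading of "a few lines of PHP code", as a sketch rather than what the commenter necessarily had in mind: split the input on whitespace and decode each piece on its own. The helper name `base64_decode_blocks` is hypothetical, and the approach assumes every whitespace-separated piece is a complete base64 unit (which holds for the examples in this thread, and for line-wrapped base64 whose line length is a multiple of four characters).

```php
<?php
// Hypothetical helper, not part of PHP: decodes whitespace-separated
// base64 blocks one at a time and concatenates the results.
function base64_decode_blocks(string $input): ?string
{
    $out = '';
    // PREG_SPLIT_NO_EMPTY drops empty pieces caused by leading or trailing whitespace.
    foreach (preg_split('/\s+/', $input, -1, PREG_SPLIT_NO_EMPTY) as $block) {
        // Strict mode, so a malformed block is reported instead of silently mangled.
        $decoded = base64_decode($block, true);
        if ($decoded === false) {
            return null;
        }
        $out .= $decoded;
    }
    return $out;
}

// Padded first block (the original test script).
$a = base64_encode("Line nr 1 bcdefg\n") . "\n" . base64_encode("Line nr 2 \n");
var_dump(base64_decode_blocks($a) === "Line nr 1 bcdefg\nLine nr 2 \n");  // bool(true)

// Unpadded first block (18 bytes, a multiple of three).
$b = base64_encode("Line nr 1 bcdefgh\n") . "\n" . base64_encode("Line nr 2 \n");
var_dump(base64_decode_blocks($b) === "Line nr 1 bcdefgh\nLine nr 2 \n"); // bool(true)
```

Blocks concatenated with no separator at all are not handled by this sketch; those would need to be split after each run of padding instead.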
 
Last updated: Wed Aug 04 12:01:24 2021 UTC