Bug #47155 PHP 6.0 decodes base64 into incorrect UTF-8 string
Submitted: 2009-01-19 22:46 UTC Modified: 2009-09-08 01:00 UTC
Votes:4
Avg. Score:3.8 ± 0.8
Reproduced:1 of 2 (50.0%)
Same Version:1 (100.0%)
Same OS:0 (0.0%)
From: lunter at interia dot pl Assigned:
Status: No Feedback Package: Unicode Engine related
PHP Version: 6CVS-2009-01-19 (snap) OS: *
Private report: No CVE-ID: None
 [2009-01-19 22:46 UTC] lunter at interia dot pl
Description:
------------
Problem:
--------

Source files: http://pc44.one.pl/goorol/bugs/base64.zip

PHP 6.0 decodes base64 into an incorrect UTF-8 string.

If you think this report is bogus, show me how to decode 'zrEgKyDOsiA9IM6z' into the (unicode)string 'α + β = γ' using PHP 6.
Other programming languages and applications that use UTF-8 do not have this problem.
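The PHP 6 Unicode functions never shipped in a released PHP version, so the byte-level check below is a Python sketch, used only for illustration. Python's `bytes`/`str` split is roughly analogous to PHP 6's `(binary)`/`(unicode)` strings. It assumes, as the report states, that the payload is UTF-8.

```python
import base64

payload = "zrEgKyDOsiA9IM6z"  # base64 string from the report

raw = base64.b64decode(payload)  # bytes, analogous to a PHP (binary) string
print(raw.hex(" "))  # ce b1 20 2b 20 ce b2 20 3d 20 ce b3

# Assumption: the bytes are UTF-8, as stated in the report
text = raw.decode("utf-8")
print(text)  # α + β = γ
```

Each Greek letter is a two-byte UTF-8 sequence (α = CE B1, β = CE B2, γ = CE B3), so the 12 decoded bytes become 9 characters.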

--------------------------------------------------------------------------------------------

PHP 6.0 example (example_php6.php):
-----------------------------------

<?
 if(substr(phpversion(),0,1)!='6'){trigger_error('only PHP 6',E_USER_ERROR);}

// unicode.semantics = off
// unicode.runtime_encoding = iso-8859-1
// unicode.script_encoding = utf-8
// unicode.output_encoding = utf-8
// unicode.from_error_mode = U_INVALID_SUBSTITUTE
// unicode.from_error_subst_char = 3f


 $base64='zrEgKyDOsiA9IM6z';
// base64 encoded utf-8 string

 $binary=base64_decode($base64);
// binary, utf-8 bytes

 $text=unicode_decode($binary,'iso-8859-1');
// utf-8 string
// why is only iso-8859-x supported? where is a raw binary option?

// or new function (binary)unicode_bytes_string->(unicode)string
// $text=bin2uni($binary);

 header('Content-Type: text/plain; charset=utf-8');
 print($text);

 print("\n\n");
 print('SHOULD BE: α + β = γ');
?>

--------------------------------------------------------------------------------------------

Solution:
---------

We cannot get a (unicode)string from a (binary)string consisting of UTF-8 bytes.
Ideally, converting between a (unicode)string and a (binary)unicode_bytes_string would never need charset information.

C#: System.Text.Encoding.UTF8.GetString()
Decodes a sequence of bytes from the specified byte array into a string.

PHP equivalent needed: unicode bin2uni( binary $b )
Decodes a sequence of bytes from the specified binary string into a unicode string.
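The proposed `bin2uni()` is hypothetical; no such function exists in PHP. As a sketch, assuming UTF-8 is the only byte format it would accept, it amounts to decoding with the charset fixed to UTF-8, which is what passing `'utf-8'` to `unicode_decode()` does. Shown here in Python for illustration only:

```python
def bin2uni(b: bytes) -> str:
    """Hypothetical bin2uni(): decode UTF-8 bytes into a text string.

    The charset is not stored in the bytes themselves, so the function
    has to assume one; here it is fixed to UTF-8 (an assumption).
    Invalid UTF-8 raises UnicodeDecodeError (strict error handling).
    """
    return b.decode("utf-8")


print(bin2uni(b"\xce\xb1 + \xce\xb2 = \xce\xb3"))  # α + β = γ
```

Because a byte string carries no charset information, any such function must assume one; naming the charset explicitly keeps that assumption visible at the call site.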

--------------------------------------------------------------------------------------------

C# working equivalent (example_ashx.ashx):
------------------------------------------

<%@ WebHandler Language="C#" Class="example_handler" %>

using System;
using System.Web;

public class example_handler : IHttpHandler {

    public void ProcessRequest (HttpContext context) {
        string base64 = "zrEgKyDOsiA9IM6z";
        // base64 encoded utf-8 string

        byte[] binary = Convert.FromBase64String(base64);
        // binary, utf-8 bytes

        string text = System.Text.Encoding.UTF8.GetString(binary);
        // decoded text (.NET strings are UTF-16 internally)

        context.Response.ContentType = "text/plain; charset=utf-8";
        context.Response.Write(text);
        // very good (utf-8): α + β = γ
    }

    public bool IsReusable {
        get {
            return false;
        }
    }

}

--------------------------------------------------------------------------------------------

PHP 5.x working equivalent (example_php5.php):
----------------------------------------------

<?
 if(substr(phpversion(),0,1)!='5'){trigger_error('only PHP 5',E_USER_ERROR);}

 $base64='zrEgKyDOsiA9IM6z';
// base64 encoded utf-8 string

 $binary=base64_decode($base64);
// binary, utf-8 bytes

 header('Content-Type: text/plain; charset=utf-8');
 print($binary);
// very good (utf-8): α + β = γ
?>

Reproduce code:
---------------
above

Expected result:
----------------
above

Actual result:
--------------
above

 [2009-01-20 10:46 UTC] lunter at interia dot pl
Problem solved: unicode_decode($binary,'utf-8');

--------------

The documentation is not clear.

http://php.net/manual/en/function.unicode-decode.php

Current wording: Convert a binary string encoded in encoding to a unicode string.

---

Proposed wording: Converts the binary bytes of a unicode string, encoded in the specified encoding, into a unicode string.

Example 1:

<?
// unicode.semantics = off
// unicode.runtime_encoding = iso-8859-1
// unicode.script_encoding = utf-8
// unicode.output_encoding = utf-8
// unicode.from_error_mode = U_INVALID_SUBSTITUTE
// unicode.from_error_subst_char = 3f

 $binary=(binary)chr(194).(binary)chr(160);
// bytes of Non-breaking space in utf-8

 $text=unicode_decode($binary,'utf-8');	// (binary)chr(194).(binary)chr(160) => chr(160)

 print(ord($text[0]));
?>
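The same conversion can be checked with a Python sketch (illustrative only, not PHP 6): the bytes 194 and 160 (C2 A0) are the UTF-8 encoding of U+00A0 NO-BREAK SPACE, so decoding them as UTF-8 yields one character with code point 160, while decoding them as ISO-8859-1 maps each byte to its own character.

```python
raw = bytes([194, 160])  # C2 A0: UTF-8 encoding of NO-BREAK SPACE

as_utf8 = raw.decode("utf-8")  # one two-byte sequence -> one character
print(len(as_utf8), ord(as_utf8[0]))  # 1 160

as_latin1 = raw.decode("iso-8859-1")  # each byte becomes one character
print(len(as_latin1), [ord(c) for c in as_latin1])  # 2 [194, 160]
```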
 [2009-01-21 11:06 UTC] johannes@php.net
With 
 $text=unicode_decode($binary,'iso-8859-1');
you are saying that you want to interpret $binary as iso-8859-1 but, as you said, it's utf-8, so you need
 $text=unicode_decode($binary,'utf-8');
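To illustrate Johannes' point, the Python sketch below (illustrative only) decodes the reporter's bytes both ways. The encoding argument describes how the input bytes should be read, so naming the wrong one produces mojibake rather than an error: every byte value is a valid ISO-8859-1 character.

```python
import base64

raw = base64.b64decode("zrEgKyDOsiA9IM6z")  # UTF-8 bytes from the report

wrong = raw.decode("iso-8859-1")  # read as Latin-1: one character per byte
right = raw.decode("utf-8")       # read as UTF-8: two-byte sequences joined

print(wrong)  # Î± + Î² = Î³
print(right)  # α + β = γ
```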
 [2009-04-09 20:04 UTC] bcontacter at aol dot com
[Long base64-encoded blob omitted.]
 [2009-08-31 16:38 UTC] sjoerd@php.net
Thank you for your bug report.

It is unclear to me what the status of this bug is. Have you solved your problem with Johannes' comment? What is the base64-encoded blob in your last comment?
 [2009-09-08 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sat Dec 21 11:01:30 2024 UTC