Bug #47156 PHP 6.0 cannot calculate (encode) correct base64 from a unicode string
Submitted: 2009-01-19 22:49 UTC Modified: 2009-01-24 12:29 UTC
From: lunter at interia dot pl Assigned:
Status: Closed Package: Unicode Engine related
PHP Version: 6CVS-2009-01-19 (snap) OS: all
Private report: No CVE-ID: None

 [2009-01-19 22:49 UTC] lunter at interia dot pl
Description:
------------
Problem:
--------

Source files: http://pc44.one.pl/goorol/bugs/base64.zip

PHP 6.0 cannot produce the correct base64 encoding of a unicode string.

If you think this report is bogus, show how to encode the (utf-8)string 'α + β = γ' into 'zrEgKyDOsiA9IM6z' using PHP 6.
Other programming languages and applications that use utf-8 do not have this problem.
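
The expected value can be checked by hand in any released PHP version, where strings are plain byte sequences. The sketch below is not PHP 6 code: it writes the UTF-8 bytes of 'α + β = γ' as hex escapes, so the result does not depend on the encoding of the script file, and then encodes them. The second half rests on an assumption: that forcing the text into iso-8859-1 replaces each Greek letter with '?' (one plausible reading of unicode.from_error_subst_char = 3f). The report itself does not show the actual output.

```php
<?php
// UTF-8 bytes of 'α + β = γ': each Greek letter takes two bytes.
$utf8 = "\xCE\xB1 + \xCE\xB2 = \xCE\xB3";

echo bin2hex($utf8), "\n";        // ceb1202b20ceb2203d20ceb3
echo base64_encode($utf8), "\n";  // zrEgKyDOsiA9IM6z

// ASSUMPTION: iso-8859-1 has no Greek letters, so each one is
// substituted with '?' (0x3f) before base64 encoding.
$substituted = "? + ? = ?";
echo base64_encode($substituted), "\n";  // PyArID8gPSA/
```

If the PHP 6 script prints something like the second value, the Greek letters were lost during the conversion to iso-8859-1, before base64_encode() ever ran.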

--------------------------------------------------------------------------------------------

PHP 6.0 example (example_php6.php):
-----------------------------------

<?
 if(substr(phpversion(),0,1)!='6'){trigger_error('only PHP 6',E_USER_ERROR);}

// unicode.semantics = off
// unicode.runtime_encoding = iso-8859-1
// unicode.script_encoding = utf-8
// unicode.output_encoding = utf-8
// unicode.from_error_mode = U_INVALID_SUBSTITUTE
// unicode.from_error_subst_char = 3f


 $text='α + β = γ';
// utf-8 string

 $binary=unicode_encode($text,'iso-8859-1');
// binary, utf-8 bytes
// why iso-8859-x is only supported, where is raw binary option?

// or new function (unicode)string->(binary)unicode_bytes_string
// $binary=uni2bin($text);

 $base64=base64_encode($binary);
// base64 encoded utf-8 string

 header('Content-Type: text/plain');
 print($base64);

 print("\n\n");
 print('SHOULD BE: zrEgKyDOsiA9IM6z');
?>

--------------------------------------------------------------------------------------------

Solution:
---------

We cannot get a (binary)string consisting of utf-8 bytes from a (unicode)string.
Consider: converting (unicode)string<->(binary)unicode_bytes_string should never need charset information.

C#: System.Text.Encoding.UTF8.GetBytes()
Encodes a set of characters into a sequence of bytes.

PHP equivalent needed: binary uni2bin( unicode $u )
Encodes a set of characters into a binary string (a sequence of bytes).
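
To make the requested conversion concrete, here is a minimal sketch of what such a function would have to do, written for released PHP (7 or later is assumed because of the type declarations). The name uni2bin_sketch() is hypothetical and only echoes the proposal above. Since released PHP has no unicode string type, the input is modelled as an array of code points. Surrogates and out-of-range values are not validated.

```php
<?php
// Hypothetical helper (not a real PHP function): encodes a list of
// Unicode code points as a binary string of UTF-8 bytes.
function uni2bin_sketch(array $codePoints): string
{
    $bytes = '';
    foreach ($codePoints as $cp) {
        if ($cp < 0x80) {            // 1 byte:  0xxxxxxx
            $bytes .= chr($cp);
        } elseif ($cp < 0x800) {     // 2 bytes: 110xxxxx 10xxxxxx
            $bytes .= chr(0xC0 | ($cp >> 6))
                    . chr(0x80 | ($cp & 0x3F));
        } elseif ($cp < 0x10000) {   // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            $bytes .= chr(0xE0 | ($cp >> 12))
                    . chr(0x80 | (($cp >> 6) & 0x3F))
                    . chr(0x80 | ($cp & 0x3F));
        } else {                     // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            $bytes .= chr(0xF0 | ($cp >> 18))
                    . chr(0x80 | (($cp >> 12) & 0x3F))
                    . chr(0x80 | (($cp >> 6) & 0x3F))
                    . chr(0x80 | ($cp & 0x3F));
        }
    }
    return $bytes;
}

// Code points of 'α + β = γ' (α = U+03B1, β = U+03B2, γ = U+03B3).
$text   = [0x3B1, 0x20, 0x2B, 0x20, 0x3B2, 0x20, 0x3D, 0x20, 0x3B3];
$binary = uni2bin_sketch($text);

echo base64_encode($binary), "\n";  // zrEgKyDOsiA9IM6z
```

The conversion takes no charset argument here because the target encoding is fixed to UTF-8, which is the same design as System.Text.Encoding.UTF8.GetBytes().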

--------------------------------------------------------------------------------------------

C# working equivalent (example_ashx.ashx):
------------------------------------------

<%@ WebHandler Language="C#" Class="example_handler" %>

using System;
using System.Data;
using System.Web;

public class example_handler : IHttpHandler {
    
    public void ProcessRequest (HttpContext context) {

        string text = "α + β = γ";
        // utf-8 string

        byte[] binary = System.Text.Encoding.UTF8.GetBytes(text);
        // binary, utf-8 bytes

        string base64 = Convert.ToBase64String(binary);
        // base64 encoded utf-8 string

        context.Response.ContentType = "text/plain";
        context.Response.Write(base64);
        // very good: zrEgKyDOsiA9IM6z

    }
 
    public bool IsReusable {
        get {
            return false;
        }
    }

}

--------------------------------------------------------------------------------------------

PHP 5.x working equivalent (example_php5.php):
----------------------------------------------

<?
 if(substr(phpversion(),0,1)!='5'){trigger_error('only PHP 5',E_USER_ERROR);}

 $binary='α + β = γ';
// binary, utf-8 bytes

 $base64=base64_encode($binary);
// base64 encoded utf-8 string

 header('Content-Type: text/plain');
 print($base64);
// very good: zrEgKyDOsiA9IM6z
?>

Reproduce code:
---------------
above

Expected result:
----------------
above

Actual result:
--------------
above

 [2009-01-20 10:36 UTC] lunter at interia dot pl
Problem solved: unicode_encode($text,'utf-8');

--------------

The specification is not clear.

http://php.net/manual/en/function.unicode-encode.php

Now: Takes a unicode string and converts it to a string in the specified encoding.

---

Better: Converts a unicode string into a binary string of bytes in the specified encoding.

Example 1:

<?
// unicode.semantics = off
// unicode.runtime_encoding = iso-8859-1
// unicode.script_encoding = utf-8
// unicode.output_encoding = utf-8
// unicode.from_error_mode = U_INVALID_SUBSTITUTE
// unicode.from_error_subst_char = 3f

 $text=chr(160);
// Non-breaking space

 $binary=unicode_encode($text,'utf-8'); // chr(160) => (binary)chr(194).(binary)chr(160)

 foreach(str_split($binary) as $c){
  print(ord($c));
  print('<br>');
 }
?>
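
For comparison, the same conversion can be sketched in released PHP, where unicode_encode() does not exist. This version assumes the mbstring extension is loaded and treats chr(160) as an iso-8859-1 byte, matching the unicode.runtime_encoding setting listed above. U+00A0 encodes in UTF-8 as the two bytes 194 and 160.

```php
<?php
// ASSUMPTION: mbstring is available; the input byte is read as iso-8859-1.
$text = chr(160);  // byte 0xA0: non-breaking space in iso-8859-1

// Transcode the single iso-8859-1 byte into its UTF-8 byte sequence.
$binary = mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');

// Collect the decimal value of every resulting byte.
$codes = array_map('ord', str_split($binary));
echo implode(' ', $codes), "\n";  // 194 160
```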
 