Bug #47156 PHP 6.0 cannot compute (encode) correct base64 from a Unicode string
Submitted: 2009-01-19 22:49 UTC Modified: 2009-01-24 12:29 UTC
From: lunter at interia dot pl Assigned:
Status: Closed Package: Unicode Engine related
PHP Version: 6CVS-2009-01-19 (snap) OS: all
Private report: No CVE-ID: None
 [2009-01-19 22:49 UTC] lunter at interia dot pl
Description:
------------
Problem:
--------

Source files: http://pc44.one.pl/goorol/bugs/base64.zip

PHP 6.0 cannot compute (encode) the correct base64 value from a Unicode string.

If you think this is bogus, show how to encode the (utf-8)string 'α + β = γ' into 'zrEgKyDOsiA9IM6z' using PHP 6.
Other programming languages and applications that use UTF-8 do not have this problem.

--------------------------------------------------------------------------------------------

PHP 6.0 example (example_php6.php):
-----------------------------------

<?
 if(substr(phpversion(),0,1)!='6'){trigger_error('only PHP 6',E_USER_ERROR);}

// unicode.semantics = off
// unicode.runtime_encoding = iso-8859-1
// unicode.script_encoding = utf-8
// unicode.output_encoding = utf-8
// unicode.from_error_mode = U_INVALID_SUBSTITUTE
// unicode.from_error_subst_char = 3f


 $text='α + β = γ';
// utf-8 string

 $binary=unicode_encode($text,'iso-8859-1');
// binary, utf-8 bytes
// why are only iso-8859-x encodings supported? where is a raw binary option?

// or a new function: (unicode)string -> (binary)unicode_bytes_string
// $binary=uni2bin($text);

 $base64=base64_encode($binary);
// base64 encoded utf-8 string

 header('Content-Type: text/plain');
 print($base64);

 print("\n\n");
 print('SHOULD BE: zrEgKyDOsiA9IM6z');
?>

--------------------------------------------------------------------------------------------

Solution:
---------

We cannot get a (binary)string consisting of UTF-8 bytes from a (unicode)string.
Imagine: converting (unicode)string <-> (binary)unicode_bytes_string should never need charset information.

C#: System.Text.Encoding.UTF8.GetBytes()
Encodes a set of characters into a sequence of bytes.

PHP equivalent needed: binary uni2bin( unicode $u )
Encodes a set of characters into a binary string of bytes.
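
A minimal sketch of what such a function could look like, written for a current PHP release (7.0 or later is assumed because of the type declarations). uni2bin() here is hypothetical and is not a PHP built-in. Since current PHP has no separate unicode string type, the sketch assumes the input is an array of integer code points, and it assumes UTF-8 as the target encoding:

```php
<?php
// Hypothetical helper modeled on the uni2bin() proposed above; NOT a PHP built-in.
// Input: array of Unicode code points (integers). Output: UTF-8 bytes as a binary string.
function uni2bin(array $codePoints): string
{
    $bytes = '';
    foreach ($codePoints as $cp) {
        // Reject values outside the Unicode range and UTF-16 surrogates.
        if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
            throw new InvalidArgumentException("Invalid code point: $cp");
        }
        if ($cp < 0x80) {               // 1 byte
            $bytes .= chr($cp);
        } elseif ($cp < 0x800) {        // 2 bytes
            $bytes .= chr(0xC0 | ($cp >> 6))
                    . chr(0x80 | ($cp & 0x3F));
        } elseif ($cp < 0x10000) {      // 3 bytes
            $bytes .= chr(0xE0 | ($cp >> 12))
                    . chr(0x80 | (($cp >> 6) & 0x3F))
                    . chr(0x80 | ($cp & 0x3F));
        } else {                        // 4 bytes
            $bytes .= chr(0xF0 | ($cp >> 18))
                    . chr(0x80 | (($cp >> 12) & 0x3F))
                    . chr(0x80 | (($cp >> 6) & 0x3F))
                    . chr(0x80 | ($cp & 0x3F));
        }
    }
    return $bytes;
}

// Code points of 'α + β = γ'
$text = [0x3B1, 0x20, 0x2B, 0x20, 0x3B2, 0x20, 0x3D, 0x20, 0x3B3];

print(base64_encode(uni2bin($text)) . "\n"); // zrEgKyDOsiA9IM6z
```

Note that the sketch still fixes one target encoding (UTF-8) internally, so the choice of encoding does not disappear; it is only hidden inside the function.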

--------------------------------------------------------------------------------------------

C# working equivalent (example_ashx.ashx):
------------------------------------------

<%@ WebHandler Language="C#" Class="example_handler" %>

using System;
using System.Data;
using System.Web;

public class example_handler : IHttpHandler {
    
    public void ProcessRequest (HttpContext context) {




        string text = "α + β = γ";
// Unicode text (a .NET string, not yet UTF-8 bytes)

        byte[] binary = System.Text.Encoding.UTF8.GetBytes(text);
// binary, utf-8 bytes

        string base64 = Convert.ToBase64String(binary);
// base64 encoded utf-8 string

        context.Response.ContentType = "text/plain";
        context.Response.Write(base64);
// very good: zrEgKyDOsiA9IM6z




    }
 
    public bool IsReusable {
        get {
            return false;
        }
    }

}

--------------------------------------------------------------------------------------------

PHP 5.x working equivalent (example_php5.php):
----------------------------------------------

<?
 if(substr(phpversion(),0,1)!='5'){trigger_error('only PHP 5',E_USER_ERROR);}

 $binary='α + β = γ';
// binary, utf-8 bytes

 $base64=base64_encode($binary);
// base64 encoded utf-8 string

 header('Content-Type: text/plain');
 print($base64);
// very good: zrEgKyDOsiA9IM6z
?>

Reproduce code:
---------------
above

Expected result:
----------------
above

Actual result:
--------------
above

 [2009-01-20 10:36 UTC] lunter at interia dot pl
Problem solved: unicode_encode($text,'utf-8');
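
A possible explanation of why 'iso-8859-1' could not work, sketched for a current PHP release (7.0 or later assumed). ISO-8859-1 covers only U+0000..U+00FF, so the Greek letters have no byte representation in it. The helper toLatin1() is hypothetical and only models substitution with the byte 3f from the ini comments above; it is not taken from the PHP 6 source, and the actual PHP 6 output is not recorded in this report:

```php
<?php
// Hypothetical model, NOT PHP 6 source: encode code points to ISO-8859-1,
// replacing anything outside U+0000..U+00FF with a substitution byte
// (0x3f, '?', as in unicode.from_error_subst_char).
function toLatin1(array $codePoints, string $subst = "\x3f"): string
{
    $bytes = '';
    foreach ($codePoints as $cp) {
        $bytes .= ($cp >= 0 && $cp <= 0xFF) ? chr($cp) : $subst;
    }
    return $bytes;
}

// Code points of 'α + β = γ'
$codePoints = [0x3B1, 0x20, 0x2B, 0x20, 0x3B2, 0x20, 0x3D, 0x20, 0x3B3];

$latin1 = toLatin1($codePoints);          // "? + ? = ?", the Greek letters are lost
$utf8   = "\u{3B1} + \u{3B2} = \u{3B3}";  // UTF-8 bytes (PHP 7+ escape syntax)

print(base64_encode($latin1) . "\n"); // PyArID8gPSA/
print(base64_encode($utf8) . "\n");   // zrEgKyDOsiA9IM6z
```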

--------------

The documentation is not clear.

http://php.net/manual/en/function.unicode-encode.php

Now: Takes a unicode string and converts it to a string in the specified encoding.

---

Better: Converts a unicode string into a binary string of bytes in the specified encoding.

Example 1:

<?
// unicode.semantics = off
// unicode.runtime_encoding = iso-8859-1
// unicode.script_encoding = utf-8
// unicode.output_encoding = utf-8
// unicode.from_error_mode = U_INVALID_SUBSTITUTE
// unicode.from_error_subst_char = 3f

 $text=chr(160);
// Non-breaking space

 $binary=unicode_encode($text,'utf-8');	// chr(160) => (binary)chr(194).(binary)chr(160)

 foreach(str_split($binary) as $c){
  print(ord($c));
  print('<br>');
 }
?>
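
The byte values can be checked on a current PHP release without unicode_encode(). This is a small sketch, assuming PHP 7.0 or later for the \u{...} escape, which yields the UTF-8 bytes of the code point directly:

```php
<?php
// U+00A0 (non-breaking space) written as its UTF-8 byte sequence.
$nbsp = "\u{00A0}";

// Print each byte as a decimal number, one per line.
foreach (str_split($nbsp) as $c) {
    print(ord($c) . "\n"); // 194, then 160
}
```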
 