Bug #47158 PHP 6.0 cannot calculate the correct sha1 of a unicode string
Submitted: 2009-01-19 23:32 UTC Modified: 2009-01-24 12:28 UTC
From: lunter at interia dot pl Assigned:
Status: Closed Package: Unicode Engine related
PHP Version: 6CVS-2009-01-19 (snap) OS: all
Private report: No CVE-ID: None
 [2009-01-19 23:32 UTC] lunter at interia dot pl
Description:
------------
Problem:
--------

Source files: http://pc44.one.pl/goorol/bugs/sha1.zip

PHP 6.0 cannot calculate the correct sha1 of a unicode string.

If you think this is bogus, show how to calculate the sha1 7fd9f992cabad8e65d77285911e1dbefbf77d07d of the (utf-8) string 'α + β = γ' using PHP 6.
Other programming languages and applications that use utf-8 have no such problem.

--------------------------------------------------------------------------------------------

PHP 6.0 example (example_php6.php):
-----------------------------------

<?php
 if(substr(phpversion(),0,1)!='6'){trigger_error('only PHP 6',E_USER_ERROR);}

// unicode.semantics = off
// unicode.runtime_encoding = iso-8859-1
// unicode.script_encoding = utf-8
// unicode.output_encoding = utf-8
// unicode.from_error_mode = U_INVALID_SUBSTITUTE
// unicode.from_error_subst_char = 3f


 $text='α + β = γ';
// utf-8 string

 $binary=unicode_encode($text,'iso-8859-1');
// binary, utf-8 bytes
// why is only iso-8859-x supported? Where is a raw binary option?

// or a new function converting (unicode)string -> (binary) string of its bytes:
// $binary=uni2bin($text);

 $sha1=sha1($binary);
// sha1 of utf-8 string

 header('Content-Type: text/plain');
 print($sha1);

 print("\n\n");
 print('SHOULD BE: 7fd9f992cabad8e65d77285911e1dbefbf77d07d');
?>

--------------------------------------------------------------------------------------------

Solution:
---------

There is no way to get a (binary)string consisting of the utf-8 bytes of a (unicode)string.
Ideally, converting between a (unicode)string and its (binary) byte string would never need charset information.

C#: System.Text.Encoding.UTF8.GetBytes()
Encodes a set of characters into a sequence of bytes.

PHP equivalent needed: binary uni2bin( unicode $u )
Encodes a set of characters into a binary string of bytes.
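For comparison, here is a minimal sketch of what the proposed uni2bin() would have to do, written for PHP 7+ (PHP 6 was never released). PHP 7+ has no separate unicode string type, so the input is modeled as an array of code points. That representation and this implementation of the hypothetical uni2bin() are assumptions for illustration, not a PHP API. It applies the standard UTF-8 bit layout, i.e. the same bytes Encoding.UTF8.GetBytes() produces in C# for valid input.

```php
<?php
// Hypothetical uni2bin(): the name comes from the report's proposal, but the
// "unicode string" is modeled here as an array of code points (an assumption),
// because PHP 7+ strings are plain byte strings.
function uni2bin(array $codePoints): string
{
    $bytes = '';
    foreach ($codePoints as $cp) {
        // Reject values outside Unicode and the UTF-16 surrogate range.
        if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
            throw new InvalidArgumentException(sprintf('Invalid code point %X', $cp));
        }
        if ($cp < 0x80) {
            // 1 byte: 0xxxxxxx
            $bytes .= chr($cp);
        } elseif ($cp < 0x800) {
            // 2 bytes: 110xxxxx 10xxxxxx
            $bytes .= chr(0xC0 | ($cp >> 6))
                    . chr(0x80 | ($cp & 0x3F));
        } elseif ($cp < 0x10000) {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            $bytes .= chr(0xE0 | ($cp >> 12))
                    . chr(0x80 | (($cp >> 6) & 0x3F))
                    . chr(0x80 | ($cp & 0x3F));
        } else {
            // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            $bytes .= chr(0xF0 | ($cp >> 18))
                    . chr(0x80 | (($cp >> 12) & 0x3F))
                    . chr(0x80 | (($cp >> 6) & 0x3F))
                    . chr(0x80 | ($cp & 0x3F));
        }
    }
    return $bytes;
}

// 'α + β = γ' as code points: U+03B1, space, '+', space, U+03B2, space, '=', space, U+03B3
$text = [0x3B1, 0x20, 0x2B, 0x20, 0x3B2, 0x20, 0x3D, 0x20, 0x3B3];
$binary = uni2bin($text);

echo bin2hex($binary), "\n"; // ceb1202b20ceb2203d20ceb3
echo sha1($binary), "\n";    // sha1 of the 12 utf-8 bytes
```

Each Greek letter becomes two bytes (CE B1, CE B2, CE B3) and the ASCII characters stay one byte each, giving the 12 bytes that the other languages hash.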

--------------------------------------------------------------------------------------------

C# working equivalent (example_ashx.ashx):
------------------------------------------

<%@ WebHandler Language="C#" Class="example_handler" %>

using System;
using System.Security.Cryptography;
using System.Text;
using System.Web;

public class example_handler : IHttpHandler {

    // Returns the SHA-1 digest of the given bytes as a lowercase hex string.
    private string calculate_sha1(byte[] message){
        using (SHA1 sha1 = SHA1.Create()) {
            byte[] hash = sha1.ComputeHash(message);
            StringBuilder digest = new StringBuilder();
            foreach (byte n in hash) digest.Append(n.ToString("x2"));
            return digest.ToString();
        }
    }

    public void ProcessRequest (HttpContext context) {
        string text = "α + β = γ";
// unicode string (UTF-16 in memory)

        byte[] binary = Encoding.UTF8.GetBytes(text);
// binary, utf-8 bytes

        string sha1 = calculate_sha1(binary);
// sha1 of utf-8 string

        context.Response.ContentType = "text/plain";
        context.Response.Write(sha1);
// very good: 7fd9f992cabad8e65d77285911e1dbefbf77d07d
    }

    public bool IsReusable {
        get {
            return false;
        }
    }

}

--------------------------------------------------------------------------------------------

PHP 5.x working equivalent (example_php5.php):
----------------------------------------------

<?php
 if(substr(phpversion(),0,1)!='5'){trigger_error('only PHP 5',E_USER_ERROR);}

 $binary='α + β = γ';
// binary, utf-8 bytes

 $sha1=sha1($binary);
// sha1 of utf-8 string

 header('Content-Type: text/plain');
 print($sha1);
// very good: 7fd9f992cabad8e65d77285911e1dbefbf77d07d
?>
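The PHP 5 example works only if the script file itself is saved as utf-8. A minimal sketch for PHP 7.0+ (an assumption; it is not part of the original report) avoids that dependency by writing the characters as \u{...} escapes, which PHP stores as utf-8 bytes, and compares the digest with the value quoted in the report:

```php
<?php
// PHP 7.0+ sketch (not from the report): "\u{...}" escapes are stored as
// utf-8 bytes, so this matches a utf-8-saved 'α + β = γ' literal byte for byte.
$binary = "\u{3B1} + \u{3B2} = \u{3B3}";

// sha1() hashes the raw bytes and returns a 40-character lowercase hex digest.
$sha1 = sha1($binary);

// Digest quoted in the report for the utf-8 bytes of 'α + β = γ'.
$reported = '7fd9f992cabad8e65d77285911e1dbefbf77d07d';

print($sha1 . "\n");
print($sha1 === $reported ? "matches the reported value\n" : "differs from the reported value\n");
```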

Reproduce code:
---------------
above

Expected result:
----------------
above

Actual result:
--------------
above

 [2009-01-19 23:52 UTC] lunter at interia dot pl
Another working equivalent:

An online sha1 calculator using java.security.MessageDigest.
The page uses the utf-8 charset.

http://www.fileformat.info/tool/hash.htm?text=%CE%B1+%2B+%CE%B2+%3D+%CE%B3

Enter in the "Target text" field: α + β = γ

sha1 result: 7fd9f992cabad8e65d77285911e1dbefbf77d07d
 [2009-01-19 23:55 UTC] lunter at interia dot pl
Do you see the second line of the results table?

Original bytes ce:b1:20:2b:20:ce:b2:20:3d:20:ce:b3 (length=12)

These are the binary utf-8 bytes, in hex ;)
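The same hex listing can be produced in PHP itself. A minimal sketch for PHP 7.0+ (assumed; the \u{...} escape does not exist in PHP 5) using only bin2hex() and str_split():

```php
<?php
// PHP 7.0+ sketch: "\u{...}" escapes are stored as utf-8 bytes.
$binary = "\u{3B1} + \u{3B2} = \u{3B3}";

// bin2hex() gives two lowercase hex digits per byte; split them into pairs
// and join with ':' to mimic the "Original bytes" row quoted above.
$hex = implode(':', str_split(bin2hex($binary), 2));

printf("Original bytes %s (length=%d)\n", $hex, strlen($binary));
// Original bytes ce:b1:20:2b:20:ce:b2:20:3d:20:ce:b3 (length=12)
```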
 [2009-01-20 10:34 UTC] lunter at interia dot pl
Problem solved: unicode_encode($text,'utf-8');

--------------

The specification is not clear.

http://php.net/manual/en/function.unicode-encode.php

Current wording: Takes a unicode string and converts it to a string in the specified encoding.

---

Suggested wording: Converts a unicode string into a binary string of bytes in the specified encoding.

Example 1:

<?php
// unicode.semantics = off
// unicode.runtime_encoding = iso-8859-1
// unicode.script_encoding = utf-8
// unicode.output_encoding = utf-8
// unicode.from_error_mode = U_INVALID_SUBSTITUTE
// unicode.from_error_subst_char = 3f

 $text=chr(160);
// Non-breaking space (U+00A0)

 $binary=unicode_encode($text,'utf-8'); // chr(160) => (binary)chr(194).(binary)chr(160)

 foreach(str_split($binary) as $c){
  print(ord($c));
  print('<br>');
 }
?>
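For reference, U+00A0 encodes to the two utf-8 bytes C2 A0, i.e. 194 and 160. unicode_encode() never shipped in a released PHP version, so this minimal sketch checks the byte values on PHP 7.0+ instead, using a \u{A0} escape as an assumed stand-in for unicode_encode(chr(160),'utf-8'):

```php
<?php
// PHP 7.0+ sketch: "\u{A0}" is U+00A0 (non-breaking space), stored as utf-8.
$binary = "\u{A0}";

// Print the decimal value of each byte, one per line.
foreach (str_split($binary) as $c) {
    print(ord($c));
    print("\n");
}
// prints 194, then 160 (hex C2 A0)
```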
 