Bug #47158 PHP 6.0 cannot calculate the correct sha1 of a unicode string
Submitted: 2009-01-19 23:32 UTC Modified: 2009-01-24 12:28 UTC
From: lunter at interia dot pl Assigned:
Status: Closed Package: Unicode Engine related
PHP Version: 6CVS-2009-01-19 (snap) OS: all
Private report: No CVE-ID: None
 [2009-01-19 23:32 UTC] lunter at interia dot pl
Description:
------------
Problem:
--------

Source files: http://pc44.one.pl/goorol/bugs/sha1.zip

PHP 6.0 cannot calculate the correct sha1 of a unicode string.

If you think this is bogus, show how to calculate the sha1 7fd9f992cabad8e65d77285911e1dbefbf77d07d of the (utf-8) string 'α + β = γ' using PHP 6.
Other programming languages and applications using utf-8 do not have this problem.

--------------------------------------------------------------------------------------------

PHP 6.0 example (example_php6.php):
-----------------------------------

<?php
 if(substr(phpversion(),0,1)!='6'){trigger_error('only PHP 6',E_USER_ERROR);}

// unicode.semantics = off
// unicode.runtime_encoding = iso-8859-1
// unicode.script_encoding = utf-8
// unicode.output_encoding = utf-8
// unicode.from_error_mode = U_INVALID_SUBSTITUTE
// unicode.from_error_subst_char = 3f


 $text='α + β = γ';
// utf-8 string

 $binary=unicode_encode($text,'iso-8859-1');
// binary, utf-8 bytes
// why is only iso-8859-x supported? where is a raw binary option?

// or new function (unicode)string->(binary)unicode_bytes_string
// $binary=uni2bin($text);

 $sha1=sha1($binary);
// sha1 of utf-8 string

 header('Content-Type: text/plain');
 print($sha1);

 print("\n\n");
 print('SHOULD BE: 7fd9f992cabad8e65d77285911e1dbefbf77d07d');
?>

--------------------------------------------------------------------------------------------

Solution:
---------

We cannot get a (binary)string consisting of utf-8 bytes from a (unicode)string.
Imagine: converting (unicode)string<->(binary)unicode_bytes_string never needs charset information.

C#: System.Text.Encoding.UTF8.GetBytes()
Encodes a set of characters into a sequence of bytes.

PHP equivalent needed: binary uni2bin( unicode $u )
Encodes a set of characters into a binary string of bytes.

--------------------------------------------------------------------------------------------

C# working equivalent (example_ashx.ashx):
------------------------------------------

<%@ WebHandler Language="C#" Class="example_handler" %>

using System;
using System.Data;
using System.Web;
using System.Security.Cryptography;

public class example_handler : IHttpHandler {

    private string calculate_sha1(byte[] message){
        SHA1 sha1 = SHA1CryptoServiceProvider.Create();
        byte[] hash = sha1.ComputeHash(message);
        System.Text.StringBuilder digest = new System.Text.StringBuilder();
        foreach (byte n in hash) digest.Append(n.ToString("x2"));
        return digest.ToString();
    }
    
    public void ProcessRequest (HttpContext context) {




        string text = "α + β = γ";
// utf-8 string

        byte[] binary = System.Text.Encoding.UTF8.GetBytes(text);
// binary, utf-8 bytes

        string sha1 = calculate_sha1(binary);
// sha1 of utf-8 string

        context.Response.ContentType = "text/plain";
        context.Response.Write(sha1);
// very good: 7fd9f992cabad8e65d77285911e1dbefbf77d07d




    }
 
    public bool IsReusable {
        get {
            return false;
        }
    }

}

--------------------------------------------------------------------------------------------

PHP 5.x working equivalent (example_php5.php):
----------------------------------------------

<?php
 if(substr(phpversion(),0,1)!='5'){trigger_error('only PHP 5',E_USER_ERROR);}

 $binary='α + β = γ';
// binary, utf-8 bytes

 $sha1=sha1($binary);
// sha1 of utf-8 string

 header('Content-Type: text/plain');
 print($sha1);
// very good: 7fd9f992cabad8e65d77285911e1dbefbf77d07d
?>

Reproduce code:
---------------
above

Expected result:
----------------
above

Actual result:
--------------
above

 [2009-01-19 23:52 UTC] lunter at interia dot pl
Another working equivalent:

An online sha1 calculator using java.security.MessageDigest.
The page uses the utf-8 charset.

http://www.fileformat.info/tool/hash.htm?text=%CE%B1+%2B+%CE%B2+%3D+%CE%B3

type Target text: α + β = γ

sha1 result: 7fd9f992cabad8e65d77285911e1dbefbf77d07d
 [2009-01-19 23:55 UTC] lunter at interia dot pl
Can you see the second line in the table?

Original bytes ce:b1:20:2b:20:ce:b2:20:3d:20:ce:b3 (length=12)

binary, utf-8 bytes in hex ;)
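
The byte dump quoted above can be reproduced in current PHP. The following is a minimal sketch, not part of the original report; it assumes PHP 7.0 or later for the `\u{...}` escape syntax, which writes the code points as UTF-8 bytes regardless of how the script file is saved.

```php
<?php
// Build 'α + β = γ' from code point escapes (PHP 7.0+), which yield UTF-8 bytes.
$text = "\u{03B1} + \u{03B2} = \u{03B3}";

// bin2hex() emits two hex digits per byte; regroup them as colon-separated pairs.
$hex = implode(':', str_split(bin2hex($text), 2));

// strlen() counts bytes, not characters.
echo $hex, ' (length=', strlen($text), ")\n";
// prints: ce:b1:20:2b:20:ce:b2:20:3d:20:ce:b3 (length=12)
```

Each Greek letter takes two bytes in UTF-8, which is why nine characters give a length of 12.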
 [2009-01-20 10:34 UTC] lunter at interia dot pl
Problem solved: unicode_encode($text,'utf-8');
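
Why the 'iso-8859-1' target gave a different hash can be illustrated in current PHP. This is a sketch under stated assumptions, not the PHP 6 code path: it uses `mb_convert_encoding()` from the mbstring extension (assumed to be installed) as a stand-in for `unicode_encode()`, and it sets the substitution character to 0x3f to mirror the `unicode.from_error_subst_char = 3f` setting in the report. Greek letters have no ISO-8859-1 representation, so they are replaced and the hash is computed over different bytes.

```php
<?php
// 'α + β = γ' as UTF-8 bytes (PHP 7.0+ escape syntax).
$utf8 = "\u{03B1} + \u{03B2} = \u{03B3}";

// Replace unrepresentable characters with '?' (0x3f), as in the report's ini settings.
mb_substitute_character(0x3F);

// Stand-in for unicode_encode($text, 'iso-8859-1'): the Greek letters are lost.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo bin2hex($utf8), "\n";   // ceb1202b20ceb2203d20ceb3
echo bin2hex($latin1), "\n"; // 3f202b203f203d203f, i.e. '? + ? = ?'

// The two digests differ because the input bytes differ.
echo sha1($utf8), "\n";
echo sha1($latin1), "\n";
```

Hashing the UTF-8 bytes is what the C# and PHP 5 examples in this report do, so only the first digest is comparable with theirs.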

--------------

The specification is not clear.

http://php.net/manual/en/function.unicode-encode.php

Now: Takes a unicode string and converts it to a string in the specified encoding.

---

Better: Converts a unicode string into a binary byte string in the specified encoding.

Example 1:

<?php
// unicode.semantics = off
// unicode.runtime_encoding = iso-8859-1
// unicode.script_encoding = utf-8
// unicode.output_encoding = utf-8
// unicode.from_error_mode = U_INVALID_SUBSTITUTE
// unicode.from_error_subst_char = 3f

 $text=chr(160);
// Non-breaking space

 $binary=unicode_encode($text,'utf-8');	// chr(160) => (binary)chr(194).(binary)chr(160), i.e. bytes 0xC2 0xA0

 foreach(str_split($binary) as $c){
  print(ord($c));
  print('<br>');
 }
?>
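
The same byte values can be checked in current PHP, where `unicode_encode()` is not available. This is a minimal sketch assuming the mbstring extension is installed; `mb_convert_encoding()` is used here as a substitute, treating `chr(160)` as the ISO-8859-1 byte for the non-breaking space (U+00A0).

```php
<?php
// chr(160) is the single byte 0xA0, the non-breaking space in ISO-8859-1.
$latin1 = chr(160);

// Re-encode that character as UTF-8; U+00A0 becomes the two bytes 0xC2 0xA0.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// Print the decimal value of each byte, one per line.
foreach (str_split($utf8) as $byte) {
    echo ord($byte), "\n";
}
// prints 194, then 160
```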
 