Bug #47033 converting binary<->string without charset translating
Submitted: 2009-01-08 07:42 UTC Modified: 2009-01-17 18:01 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: lunter at interia dot pl Assigned:
Status: Not a bug Package: Unicode Engine related
PHP Version: 6CVS-2009-01-08 (snap) OS: all
Private report: No CVE-ID: None
 [2009-01-08 07:42 UTC] lunter at interia dot pl
Description:
------------
converting binary<->string without charset translating


Two functions are needed for converting between the binary and string types without charset translation.
This would be very useful because binary data can contain a UTF-8 substring that you need to convert to the string type.
It is also needed when you have to see the binary representation of a UTF-8 string, or operate on it as binary data.


Example 1:

You have (binary)$b. It consists of two bytes: 11001110 10110010
Its length in binary representation is two.
It is also the valid UTF-8 encoding of a single character, chr(946) (Greek small letter beta).
How can it ($b) be converted into a one-character UTF-8 string?
When we try $u=(string)$b, it gives a two-character UTF-8 string.
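
For comparison, the byte count and the character count of these two bytes can be checked as follows. This is only a sketch under an assumption that is not part of the report: it targets a released PHP (7.2 or later) with the mbstring extension, not the PHP 6 snapshot, so strings are plain byte sequences and the mb_* functions interpret them as UTF-8.

```php
<?php
// Assumption: released PHP 7.2+ with mbstring, not the PHP 6 snapshot.
$b = "\xCE\xB2";                       // the two bytes 11001110 10110010

$byteLength = strlen($b);              // counts bytes
$charLength = mb_strlen($b, 'UTF-8');  // counts UTF-8 characters
$codePoint  = mb_ord($b, 'UTF-8');     // code point of the first character

printf("%d %d %d\n", $byteLength, $charLength, $codePoint); // prints "2 1 946"
```

No conversion takes place here: the same bytes are simply measured in two different ways.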


Example 2:

You have (string)$u, a one-character UTF-8 string. It consists of chr(946) (Greek small letter beta).
Now you need to see its two-byte binary representation (11001110 10110010).
There is no way to convert it without charset translation...
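
The reverse direction can be sketched in the same way. The example again assumes a released PHP (7.2 or later) with mbstring rather than the PHP 6 snapshot, and the variable names $bits and $binary are illustrative only.

```php
<?php
// Assumption: released PHP 7.2+ with mbstring, not the PHP 6 snapshot.
$u = mb_chr(946, 'UTF-8');             // Greek small letter beta as UTF-8 bytes

// unpack('C*', ...) returns each byte of the string as an unsigned integer.
$bits = array_map(
    function ($byte) {
        return sprintf('%08b', $byte); // eight binary digits per byte
    },
    unpack('C*', $u)
);

$binary = implode(' ', $bits);
echo $binary, "\n";                    // prints "11001110 10110010"
```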




Reproduce code:
---------------
;;;;;;;;;;;;;;;;;;;;
; Unicode settings ;
;;;;;;;;;;;;;;;;;;;;

unicode.semantics = off
unicode.runtime_encoding = iso-8859-1
unicode.script_encoding = utf-8
unicode.output_encoding = utf-8
unicode.from_error_mode = U_INVALID_SUBSTITUTE
unicode.from_error_subst_char = 3f




Expected result:
----------------
A way to convert binary<->string without charset translation



Actual result:
--------------
No way to convert binary<->string without charset translation




 [2009-01-11 22:13 UTC] johannes@php.net
We have unicode_encode() / unicode_decode().
 [2009-01-12 12:08 UTC] lunter at interia dot pl
Johannes, that is not correct.

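Try this:

<?php
 $s=chr(946);
 // length of the original string
 print(strlen($s));

 print('<br>');

 // encode the string to iso-8859-1
 $b=unicode_encode($s,'iso-8859-1');

 // length of the encoded result
 print(strlen($b));
?>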

The Unicode character chr(946) is 2 bytes long in its binary encoding, not 1.
 [2009-01-17 18:01 UTC] johannes@php.net
unicode_encode(chr(946),'utf-8') works for me - iso-8859-1 has only bytes from 0-255 ...
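
The difference between the two target encodings can be illustrated with a rough sketch. It assumes a released PHP (7.2 or later) with mbstring, and it uses mb_convert_encoding() as a stand-in for the snapshot's unicode_encode(); that substitution is an assumption of this example, not something the report confirms.

```php
<?php
// Assumption: released PHP 7.2+ with mbstring; mb_convert_encoding() stands in
// for the PHP 6 snapshot's unicode_encode().
mb_substitute_character(0x3F);         // substitute '?' (hex 3f), as in the reported ini settings

$beta = "\xCE\xB2";                    // U+03B2 encoded as UTF-8

// ISO-8859-1 has no code for U+03B2, so the character is replaced.
$latin1 = mb_convert_encoding($beta, 'ISO-8859-1', 'UTF-8');

// UTF-8 can represent it, so both bytes survive.
$utf8 = mb_convert_encoding($beta, 'UTF-8', 'UTF-8');

printf("%s %d\n", bin2hex($latin1), strlen($latin1));
printf("%s %d\n", bin2hex($utf8), strlen($utf8));
```

With these settings the ISO-8859-1 result should be the single substitute byte 3f, while the UTF-8 result keeps the two bytes ce b2, which matches the lengths 1 and 2 discussed above.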