Bug #70591 ord() no longer yields correct results consistently
Submitted: 2015-09-27 00:10 UTC Modified: 2015-09-28 01:02 UTC
From: slevy1 at pipeline dot com Assigned:
Status: Not a bug Package: *General Issues
PHP Version: 7.0.0RC3 OS: n/a
Private report: No CVE-ID: None
 [2015-09-27 00:10 UTC] slevy1 at pipeline dot com
Description:
------------
I used ord() in combination with mb_strcut() to obtain the byte values of a UTF-8 encoded multibyte character. This worked in PHP 5.4.8 through 5.5.29, but in PHP 5.6 through 7.0.0RC3 it no longer works :(

Test script:
---------------
<?php 


$str = "♥"; // U+2665, 3 bytes in UTF-8 (e2 99 a5)

var_dump( ord(mb_strcut($str, 0, 1) )); 
var_dump( ord(mb_strcut($str, 1, 1) )); 
var_dump( ord(mb_strcut($str, 2, 1) ));

see live code: https://3v4l.org/OoFus

Expected result:
----------------
I expected the var dumps to produce decimal results as follows:

 226 
 153
 165

Actual result:
--------------
0
0
0

History

 [2015-09-27 00:36 UTC] requinix@php.net
-Status: Open +Status: Not a bug
 [2015-09-27 00:36 UTC] requinix@php.net
The default character encoding, which your calls to mb_strcut() use when no encoding argument is given, changed to UTF-8 in PHP 5.6.

https://3v4l.org/YE6I9
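
The effect of the encoding can be seen by passing it explicitly as the fourth argument of mb_strcut(). The following is a minimal sketch, assuming the mbstring extension is loaded; the variable names are illustrative only.

```php
<?php
// U+2665 written as explicit UTF-8 bytes, so the example does not depend
// on the encoding of the source file.
$str = "\xE2\x99\xA5";

// With UTF-8, a 1-byte window cannot hold the 3-byte character, so
// mb_strcut() returns an empty string rather than splitting it.
$asUtf8 = mb_strcut($str, 0, 1, 'UTF-8');

// With a single-byte encoding every byte is a character boundary, so the
// first byte comes back on its own.
$asLatin1 = mb_strcut($str, 0, 1, 'ISO-8859-1');

var_dump(strlen($asUtf8));    // int(0)
var_dump(bin2hex($asLatin1)); // string(2) "e2"
```

In the PHP versions discussed in this report, ord() of an empty string is 0, which would account for the three zeros in the actual result.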
 [2015-09-27 19:09 UTC] slevy1 at pipeline dot com
According to http://www.utf8-chartable.de/unicode-utf8-table.pl?start=9728, when the black heart symbol (U+2665) is UTF-8 encoded, the result is a multibyte string of 3 bytes whose hex values are e2 99 a5, which in decimal are 226 153 165. So maybe the manual for mb_strcut() ought to document that one needs to specify the "iso-8859-1" encoding parameter if one wishes to pass the result to another function like ord(), since mb_strcut() was originally designed for that encoding. Or should mb_strcut() be reworked?
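
The byte values quoted above can be derived from the code point with the standard three-byte UTF-8 layout (1110xxxx 10xxxxxx 10xxxxxx). A small sketch follows; utf8ThreeByte() is a hypothetical helper written for this illustration, it only covers the U+0800 to U+FFFF range and does no validation.

```php
<?php
// Encode a code point in the range U+0800..U+FFFF as three UTF-8 bytes.
// Hypothetical helper for illustration; not part of PHP or mbstring.
function utf8ThreeByte(int $cp): array
{
    return [
        0xE0 | ($cp >> 12),          // 1110xxxx: top 4 bits
        0x80 | (($cp >> 6) & 0x3F),  // 10xxxxxx: middle 6 bits
        0x80 | ($cp & 0x3F),         // 10xxxxxx: low 6 bits
    ];
}

$bytes = utf8ThreeByte(0x2665);
echo implode(' ', $bytes), "\n";                      // 226 153 165
echo implode(' ', array_map('dechex', $bytes)), "\n"; // e2 99 a5
```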
 [2015-09-28 01:02 UTC] nikic@php.net
Please reread the documentation for mb_strcut() at http://php.net/mb_strcut, as I feel you are expecting it to do something very different from what it actually does (i.e. cutting at *character* boundaries based on *byte* offsets -- not cutting at *byte* boundaries).

The function you are probably looking for is substr() or, even simpler, the $str[$i] byte-offset access.
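
As a rough sketch of that suggestion, applied to the string from the test script (variable names are illustrative, and the string is written as escaped bytes so the source file encoding does not matter):

```php
<?php
$str = "\xE2\x99\xA5"; // UTF-8 bytes of U+2665

// Byte-offset access: $str[$i] is the single byte at offset $i.
$viaIndex = [];
for ($i = 0, $n = strlen($str); $i < $n; $i++) {
    $viaIndex[] = ord($str[$i]);
}

// substr() also counts in bytes, so it can pick out each byte in turn.
$viaSubstr = [
    ord(substr($str, 0, 1)),
    ord(substr($str, 1, 1)),
    ord(substr($str, 2, 1)),
];

var_dump($viaIndex === [226, 153, 165]); // bool(true)
var_dump($viaSubstr === $viaIndex);      // bool(true)
```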
 