Bug #70591 ord() no longer yields correct results consistently
Submitted: 2015-09-27 00:10 UTC Modified: 2015-09-28 01:02 UTC
From: slevy1 at pipeline dot com Assigned:
Status: Not a bug Package: *General Issues
PHP Version: 7.0.0RC3 OS: n/a
Private report: No CVE-ID: None
 [2015-09-27 00:10 UTC] slevy1 at pipeline dot com
Description:
------------
I used ord() in combination with mb_strcut() to obtain the byte values of a UTF-8-encoded multibyte character. This worked in PHP 5.4.8 through 5.5.29, but from PHP 5.6 through 7.0.0RC3 it no longer works :(

Test script:
---------------
<?php 


$str = "♥"; // U+2665, 3 bytes in UTF-8: e2 99 a5

var_dump( ord(mb_strcut($str, 0, 1) )); 
var_dump( ord(mb_strcut($str, 1, 1) )); 
var_dump( ord(mb_strcut($str, 2, 1) ));

See live code: https://3v4l.org/OoFus

Expected result:
----------------
I expected the var dumps to produce decimal results as follows:

226
153
165

Actual result:
--------------
0
0
0

History
 [2015-09-27 00:36 UTC] requinix@php.net
-Status: Open +Status: Not a bug
 [2015-09-27 00:36 UTC] requinix@php.net
The default character encoding, which your calls to mb_strcut() use because they pass no encoding argument, changed to UTF-8 in PHP 5.6.

https://3v4l.org/YE6I9
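
To illustrate the point above, here is a minimal sketch (not taken from the linked 3v4l snippet) that passes the encoding to mb_strcut() explicitly instead of relying on the default. It assumes the mbstring extension is loaded; the variable names are made up for the example, and the string is written with byte escapes so the result does not depend on how the script file is saved.

```php
<?php
// U+2665 written as its three UTF-8 bytes (e2 99 a5).
$str = "\xE2\x99\xA5";

// With a single-byte encoding such as ISO-8859-1, every byte counts as
// one character, so mb_strcut() can return one byte at a time.
$bytes = [];
for ($i = 0, $n = strlen($str); $i < $n; $i++) {
    $bytes[] = ord(mb_strcut($str, $i, 1, 'ISO-8859-1'));
}

// With UTF-8, a 1-byte window cannot hold the 3-byte character, so the
// cut comes back empty; a 3-byte window returns the whole character.
$cutOneByte    = mb_strcut($str, 0, 1, 'UTF-8');
$cutThreeBytes = mb_strcut($str, 0, 3, 'UTF-8');

var_dump($bytes, strlen($cutOneByte), $cutThreeBytes === $str);
```

Passing the encoding explicitly keeps the behaviour the same across PHP versions, whatever the configured default happens to be.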
 [2015-09-27 19:09 UTC] slevy1 at pipeline dot com
According to http://www.utf8-chartable.de/unicode-utf8-table.pl?start=9728, when the black heart suit symbol (U+2665) is UTF-8 encoded, the result is a multibyte string of 3 bytes whose hex values are e2 99 a5, which in decimal are 226 153 165. So maybe the manual for mb_strcut() ought to document that one needs to specify the encoding parameter "iso-8859-1" if one wishes to pass the result to another function like ord(), since mb_strcut() originally was designed for that encoding. Or should mb_strcut() be reworked?
 [2015-09-28 01:02 UTC] nikic@php.net
Please reread the documentation for mb_strcut() at http://php.net/mb_strcut, as I feel you are expecting it to do something very different from what it actually does (i.e. cutting at *character* boundaries based on *byte* offsets -- not cutting at *byte* boundaries).

The function you are probably looking for is substr() or, even simpler, the $str[$i] operation.
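
A short sketch of that suggestion, assuming the goal is simply the decimal value of each byte in the string. The variable names are only illustrative, and the unpack() line is an extra alternative that was not mentioned in this thread. None of this needs mbstring.

```php
<?php
// U+2665 written as its three UTF-8 bytes, independent of file encoding.
$str = "\xE2\x99\xA5";

$viaSubstr = [];
$viaIndex  = [];
for ($i = 0, $n = strlen($str); $i < $n; $i++) {
    // substr() and string offsets both work on bytes, not characters.
    $viaSubstr[] = ord(substr($str, $i, 1));
    $viaIndex[]  = ord($str[$i]);
}

// unpack('C*') returns every byte as an unsigned integer, keyed from 1,
// so array_values() is used to re-index from 0.
$viaUnpack = array_values(unpack('C*', $str));

var_dump($viaSubstr, $viaIndex, $viaUnpack);
```

All three approaches should print 226, 153 and 165 for this string.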