Doc Bug #65080 ctype_*() don't properly support multibyte locales
Submitted: 2013-06-21 01:47 UTC Modified: 2021-11-08 12:31 UTC
From: masakielastic at gmail dot com Assigned:
Status: Verified Package: Strings related
PHP Version: 5.5.0 OS: Mac OSX
Private report: No CVE-ID: None
 [2013-06-21 01:47 UTC] masakielastic at gmail dot com
ctype_lower() reports non-lowercase characters as lowercase when the locale is
set to 'en_US.UTF-8' on Mac OS X 10.8. This cannot be reproduced on Ubuntu Linux.

As a result, ctype_lower() accepts Chinese characters and Hangul (the Korean
alphabet), even though these scripts have no concept of upper and lower case.

Test cases written in C, which show the misdetected characters, can be seen at:

Judging from Xcode's manual pages, the same tests should also be run on other
BSD-compatible OSes.

ctype_upper() likewise accepts characters that are not uppercase.

Test script:
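<?php
// Collect the byte values ctype_lower() accepts in the "C" locale and in
// the "en_US.UTF-8" locale, then compare the two sets.
$expected = [];
$result = [];
for ($i = 0; $i <= 0xFF; ++$i) {
    setlocale(LC_ALL, 'C');
    if (ctype_lower(chr($i))) {
        $expected[] = $i;
    }
    setlocale(LC_ALL, 'en_US.UTF-8');
    if (ctype_lower(chr($i))) {
        $result[] = $i;
    }
}
// No byte should be "lower" under en_US.UTF-8 unless it is also "lower" under C.
var_dump([] === array_diff($result, $expected));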

Expected result:

Actual result:


 [2015-05-08 23:11 UTC]
Related to bug #63663.
 [2021-11-08 12:31 UTC]
-Summary: ctype_lower detects non-lower characters
+Summary: ctype_*() don't properly support multibyte locales
-Status: Open
+Status: Verified
-Type: Bug
+Type: Documentation Problem
 [2021-11-08 12:31 UTC]
The ctype_*() functions are fundamentally flawed for multibyte
locales when a string is passed. PHP strings have no notion of
their encoding, so each byte is passed separately to the
underlying C function (islower(3) in this case), which cannot
work correctly for multibyte characters.  Passing the code points
as int would work to some extent, but that is deprecated as of
PHP 8.1.0[1].
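The sketch below illustrates the per-byte problem. A single CJK character such as "中" (U+4E2D) is three bytes in UTF-8. A string-based check like ctype_lower() classifies each of those bytes on its own, and never sees the character itself.

The sketch assumes the `en_US.UTF-8` locale is installed; if it is not, setlocale() returns false and the "C" locale stays in effect. The per-byte classification is platform-dependent, which is where macOS and Ubuntu were reported to differ, so no expected output is given for that loop.

```php
<?php
// "中" (U+4E2D) is one character but three bytes in UTF-8.
$s = "\u{4E2D}";

// PHP strings are byte arrays: strlen() counts bytes, not characters.
$bytes = array_map('ord', str_split($s));
// prints: strlen: 3, bytes: e4 b8 ad
printf("strlen: %d, bytes: %s\n", strlen($s), implode(' ', array_map('dechex', $bytes)));

// Classify each byte separately, the same per-byte view ctype_lower() has of
// the whole string. The result depends on how the C library classifies
// 0xE4, 0xB8 and 0xAD under the current locale, not on U+4E2D.
// Assumption: en_US.UTF-8 exists on this system.
setlocale(LC_ALL, 'en_US.UTF-8');
foreach ($bytes as $b) {
    printf("0x%02x => %s\n", $b, ctype_lower(chr($b)) ? 'lower' : 'not lower');
}
```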

Contrary to the current documentation[2], I suggest avoiding these
functions in favor of MBString or PCRE's character properties.  In
any case, the current behavior needs to be better documented.
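The sketch below follows the suggestion to use PCRE's character properties. It offers Unicode-aware replacements for ctype_lower() and ctype_upper() on UTF-8 strings.

The helper names utf8_is_lower() and utf8_is_upper() are hypothetical and not part of PHP. The sketch assumes UTF-8 input and a PCRE build with Unicode property support, such as PHP's bundled PCRE. Unlike ctype_*(), the result does not depend on setlocale().

```php
<?php
// Hypothetical helpers, not part of PHP. \p{Ll} = "lowercase letter" and
// \p{Lu} = "uppercase letter"; the /u modifier makes PCRE work on UTF-8
// code points instead of bytes.
function utf8_is_lower(string $s): bool
{
    // preg_match() returns 1 (match), 0 (no match) or false (error, e.g.
    // invalid UTF-8), so only an exact 1 means "every character is Ll".
    // The + quantifier rejects the empty string, as ctype_lower() does.
    return preg_match('/\A\p{Ll}+\z/u', $s) === 1;
}

function utf8_is_upper(string $s): bool
{
    return preg_match('/\A\p{Lu}+\z/u', $s) === 1;
}

var_dump(utf8_is_lower("abc"));          // bool(true)
var_dump(utf8_is_lower("\u{00FC}ber"));  // "über": bool(true)
var_dump(utf8_is_lower("\u{4E2D}"));     // "中" has no case: bool(false)
var_dump(utf8_is_lower("\u{D55C}"));     // "한" (Hangul) has no case: bool(false)
var_dump(utf8_is_upper("ABC"));          // bool(true)
```

\p{Ll} covers only the "lowercase letter" category. Digits, spaces or punctuation anywhere in the string make the check fail, which mirrors ctype_lower()'s requirement that every character pass.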

[1] <>
[2] <>
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue May 28 17:01:31 2024 UTC