Bug #81598 Cannot input unicode characters in PHP 8 interactive shell
Submitted: 2021-11-08 12:15 UTC Modified: 2021-11-09 10:39 UTC
From: 7snovic at gmail dot com Assigned:
Status: Closed Package: CGI/CLI related
PHP Version: 8.0Git-2021-11-08 (snap) OS: Ubuntu 20
Private report: No CVE-ID: None
 [2021-11-08 12:15 UTC] 7snovic at gmail dot com
When using PHP 8 in interactive mode, you cannot type any non-ASCII characters, such as Arabic or Chinese.

$ php8.0 -a
echo "اهلا";

This puts only echo ""; into the terminal; the Arabic text is dropped.

$ php7.4 -a
echo "اهلا";

This puts the full line, including "اهلا", into the terminal.

Other references:


 [2021-11-08 14:12 UTC]
I'm assuming that this is about the interactive *shell* (where you
have a command prompt).  If so, do you use the same readline
implementation (`echo READLINE_LIB;`) with both PHP versions?
 [2021-11-08 14:45 UTC] 7snovic at gmail dot com
Update the package
 [2021-11-08 14:49 UTC] 7snovic at gmail dot com
Hello cmb,

Here is the output of READLINE_LIB for PHP 7.4, PHP 8.0, and PHP 8.1.

hassan@hassan:~$ php7.4 -r 'echo READLINE_LIB . "\n";'
hassan@hassan:~$ php8.0 -r 'echo READLINE_LIB . "\n";'
hassan@hassan:~$ php8.1 -r 'echo READLINE_LIB . "\n";'

Please note that the bug is present in both PHP 8.0 and PHP 8.1.
 [2021-11-08 16:06 UTC]
Thanks for the swift reply!  I have no idea what might have
changed in PHP 8, and since I usually work on Windows, I'll pass
on this one.
 [2021-11-08 16:06 UTC]
I can reproduce this. I can also reproduce it on 7.4 when running with "LANG=C php7.4 -a". The relevant change here is probably that PHP 8 no longer inherits the ctype locale from the environment.
 [2021-11-08 16:07 UTC]
(Undoing concurrent status changes)
 [2021-11-08 16:14 UTC]
> The relevant change here is probably that PHP 8 no longer
> inherits the ctype locale from environment.

Ah, so this would be rather a documentation issue, telling users
to call setlocale() at the start of their session?
 [2021-11-08 16:28 UTC]
From a quick test, calling setlocale() at the start of the session doesn't seem to work. It would be pretty bad UX as well.

Though that gives me a tiny bit of hope that maybe it's sufficient to have the locale set during some kind of startup procedure, and we can use the C locale afterwards.
 [2021-11-09 10:39 UTC]
I looked into the editline implementation a bit (apparently available as a tar.gz file only -- no version control) and it seems to be based entirely around wchar functionality, which is indeed locale-based.

Something I tried and that seems to work is to use the "C.UTF-8" locale. I believe that should both make mbrtowc() work fine, and not introduce any locale-sensitivity in the single-byte functions used by PHP at the same time.
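
To illustrate why the LC_CTYPE locale matters here, the following is a minimal C sketch, not code from PHP or editline. It counts how many characters mbrtowc() sees in a byte string under a given locale. The helper name count_chars_in_locale is hypothetical, and the sketch assumes the input is UTF-8 encoded and that "C.UTF-8" may or may not be installed on the system.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters mbrtowc() decodes from `text`
 * while LC_CTYPE is set to `locale_name`.
 * Returns the character count, -1 on an invalid or incomplete sequence,
 * or -2 if the requested locale is not available.
 * Side effect: the process-wide LC_CTYPE locale stays changed. */
long count_chars_in_locale(const char *locale_name, const char *text)
{
    assert(locale_name != NULL && text != NULL);

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -2;

    mbstate_t state;
    memset(&state, 0, sizeof state);   /* start in the initial shift state */

    size_t remaining = strlen(text);
    long count = 0;

    while (remaining > 0) {
        wchar_t wc;
        size_t used = mbrtowc(&wc, text, remaining, &state);

        if (used == (size_t)-1 || used == (size_t)-2)
            return -1;                 /* invalid or truncated sequence */
        if (used == 0)
            break;                     /* decoded a NUL; not expected here */

        text += used;
        remaining -= used;
        count++;
    }
    return count;
}
```

For the eight UTF-8 bytes of "اهلا", a call with "C.UTF-8" should return 4 where that locale is installed. With "C" the result depends on the libc: it is either -1 or a per-byte count, but not the four characters a line editor would need to see.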
 [2021-11-09 11:07 UTC]
The following pull request has been associated:

Patch Name: Fix bug #81598: Use C.UTF-8 as LC_CTYPE locale by default
On GitHub:
 [2021-12-05 20:04 UTC]
Automatic comment on behalf of nikic
Log: Fix bug #81598: Use C.UTF-8 as LC_CTYPE locale by default
