Bug #77914 Wrong thousands separator encoding in NumberFormatter
Submitted: 2019-04-17 13:33 UTC Modified: 2019-04-30 08:53 UTC
From: mvachette at adequasys dot com Assigned: cmb (profile)
Status: Not a bug Package: intl (PECL)
PHP Version: 7.2.17 OS: Windows 10
Private report: No CVE-ID: None
 [2019-04-17 13:33 UTC] mvachette at adequasys dot com
Description:
------------
When using NumberFormatter in a PHP application that uses ISO-8859-1 encoding, formatting a currency results in a wrongly encoded thousands separator (see the sample script below).

The resulting character could not be decoded in any encoding to obtain a regular whitespace.

- This issue is present only when using ISO-8859-1 encoding (the sample script works fine when using UTF-8)

- This issue is not present in 7.2.9 (present at least since 7.2.15)


ICU package versions: 

Internationalization support => enabled
version => 1.1.0
ICU version => 63.1
ICU Data version => 63.1
ICU TZData version => 2018e
ICU Unicode version => 11.0

Test script:
---------------
$formatter = new \NumberFormatter(
   'fr_fr', 
   \NumberFormatter::CURRENCY
);
var_dump($formatter->formatCurrency(1400, 'CHF'));

//string(15) "1 400,00 CHF"


 [2019-04-17 14:05 UTC] cmb@php.net
-Summary: Wrong thousans separator encoding in NumberFormatter
+Summary: Wrong thousands separator encoding in NumberFormatter
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2019-04-17 14:05 UTC] cmb@php.net
The output is UTF-8 encoded, which is expected.  If you need
ISO-8859-1, use a conversion function, such as iconv().
 [2019-04-17 14:10 UTC] mvachette at adequasys dot com
Actually, this is not regular UTF-8 output:
- the last whitespace is a UTF-8 encoded no-break space
- but the first one (the thousands separator) generates an error when I try to convert it using iconv: "iconv(): Detected an illegal character in input string"
 [2019-04-17 14:21 UTC] cmb@php.net
-Status: Not a bug +Status: Assigned
 [2019-04-17 14:21 UTC] cmb@php.net
Well, I'll have a closer look.
 [2019-04-17 15:06 UTC] cmb@php.net
-Status: Assigned +Status: Verified
 [2019-04-17 15:06 UTC] cmb@php.net
Indeed, there is a U+802F instead of the expected U+00A0 with ICU
63.1, but with neither ICU 57.1 nor 60.2.
 [2019-04-22 23:26 UTC] cmb@php.net
-Status: Verified +Status: Feedback
 [2019-04-22 23:26 UTC] cmb@php.net
Correction: I'm not getting U+802F, but rather U+202F with ICU
63.1.  This is a NARROW NO-BREAK SPACE, which appears to be fine.
Please post your output of: 

    var_dump(bin2hex($formatter->formatCurrency(1400, 'CHF')));
 [2019-04-29 08:15 UTC] mvachette at adequasys dot com
-Status: Feedback +Status: Assigned
 [2019-04-29 08:15 UTC] mvachette at adequasys dot com
Please find below the output you asked for:


// var_dump(bin2hex($formatter->formatCurrency(1400, 'CHF')));
string(30) "31e280af3430302c3030c2a0434846"
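
For readers who want to see which code points those bytes encode, here is a minimal sketch that splits the posted hex dump on UTF-8 character boundaries and prints each code point. It assumes the mbstring extension is loaded (for mb_ord(), available since PHP 7.2); the variable names are illustrative only.

```php
<?php
// Hex dump posted in the report above.
$hex = '31e280af3430302c3030c2a0434846';
$bytes = hex2bin($hex);

// Split on UTF-8 character boundaries; the /u modifier treats the subject as UTF-8.
$chars = preg_split('//u', $bytes, -1, PREG_SPLIT_NO_EMPTY);

$codePoints = [];
foreach ($chars as $char) {
    // mb_ord() returns the Unicode code point of a single character.
    $codePoints[] = sprintf('U+%04X', mb_ord($char, 'UTF-8'));
}

echo implode(' ', $codePoints), PHP_EOL;
// U+0031 U+202F U+0034 U+0030 U+0030 U+002C U+0030 U+0030 U+00A0 U+0043 U+0048 U+0046
```

The second entry (U+202F) is the thousands separator and the ninth (U+00A0) is the space before the currency code.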
 [2019-04-29 09:44 UTC] cmb@php.net
-Status: Assigned +Status: Not a bug
 [2019-04-29 09:44 UTC] cmb@php.net
Thanks!  This is also what I get.

"e280af" represents a NARROW NO-BREAK SPACE, "c2a0" a NO-BREAK
SPACE.  Both are proper Unicode code points.  While the NO-BREAK
SPACE can be represented in ISO-8859-1, the NARROW NO-BREAK SPACE
cannot.  This is why iconv() raises a notice and fails.  So to
properly convert, do:

    iconv('UTF-8', 'ISO-8859-1//IGNORE', $formatter->formatCurrency(1400, 'CHF'))

Or use mb_convert_encoding() which gives more flexibility than
simply ignoring unsupported code points.

Anyhow, this is not a bug.
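
As an illustration of the mb_convert_encoding() route mentioned above, the following sketch substitutes a regular space for any code point that ISO-8859-1 cannot represent, instead of dropping it. It assumes the mbstring extension is loaded; the input string is a stand-in built from the bytes in the hex dump rather than live NumberFormatter output, and the choice of U+0020 as the substitute is only one possibility.

```php
<?php
// Stand-in for the formatter output: "1", NARROW NO-BREAK SPACE, "400,00",
// NO-BREAK SPACE, "CHF" (the "\u{...}" escape requires PHP 7.0+).
$utf8 = "1\u{202F}400,00\u{00A0}CHF";

// mb_substitute_character() is a global setting, so remember and restore it.
$previous = mb_substitute_character();
mb_substitute_character(0x20); // use U+0020 for unconvertible characters

$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

mb_substitute_character($previous);

var_dump(bin2hex($latin1));
```

With the default setting, mbstring substitutes "?" for unconvertible characters, which is why the substitute character is set explicitly here.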
 [2019-04-30 08:53 UTC] mvachette at adequasys dot com
Thanks a lot for your investigations.
iconv with the "IGNORE" flag works well.

For information, I actually managed to keep the thousands separator by replacing the NARROW NO-BREAK SPACE with a regular space before the encoding conversion:

$value = $formatter->formatCurrency(1400, 'CHF');
$valueWithoutNarrowNoBreakSpace = json_decode(str_replace('\u202f', ' ', json_encode($value)));
var_dump(iconv('UTF-8', 'ISO-8859-1//IGNORE', $valueWithoutNarrowNoBreakSpace));
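
The same replacement can probably be done without the JSON round trip, since PHP 7.0+ accepts "\u{202F}" directly in double-quoted strings. This is a sketch only: toLatin1() is a hypothetical helper name, the input is a stand-in built from the posted hex dump, and the iconv extension is assumed to be available.

```php
<?php
// Hypothetical helper: swap the NARROW NO-BREAK SPACE for a character that
// exists in ISO-8859-1, then convert.
function toLatin1(string $utf8, string $replacement = ' '): string
{
    $prepared = str_replace("\u{202F}", $replacement, $utf8);

    // No //IGNORE here, so any other unrepresentable character is reported
    // instead of being silently dropped.
    $converted = iconv('UTF-8', 'ISO-8859-1', $prepared);
    if ($converted === false) {
        throw new RuntimeException('Input contains characters outside ISO-8859-1');
    }

    return $converted;
}

// Stand-in for $formatter->formatCurrency(1400, 'CHF') with ICU 63.1.
$value = "1\u{202F}400,00\u{00A0}CHF";

var_dump(bin2hex(toLatin1($value)));             // string(24) "31203430302c3030a0434846"
var_dump(bin2hex(toLatin1($value, "\u{00A0}"))); // string(24) "31a03430302c3030a0434846"
```

Passing "\u{00A0}" as the replacement keeps the separator non-breaking, because NO-BREAK SPACE maps to byte 0xA0 in ISO-8859-1.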
 