Bug #77914 Wrong thousands separator encoding in NumberFormatter
Submitted: 2019-04-17 13:33 UTC Modified: 2019-04-30 08:53 UTC
From: mvachette at adequasys dot com Assigned: cmb (profile)
Status: Not a bug Package: intl (PECL)
PHP Version: 7.2.17 OS: Windows 10
Private report: No CVE-ID: None
 [2019-04-17 13:33 UTC] mvachette at adequasys dot com
Description:
------------
When using NumberFormatter in a PHP application that uses ISO-8859-1 encoding, formatting a currency results in a wrongly encoded thousands separator (see the sample script below).

The resulting character cannot be decoded from any encoding to obtain a regular whitespace character.

- This issue is present only when using ISO-8859-1 encoding (the sample script works fine when using UTF-8)

- This issue is not present in 7.2.9 (it has been present since at least 7.2.15)


ICU package versions: 

Internationalization support => enabled
version => 1.1.0
ICU version => 63.1
ICU Data version => 63.1
ICU TZData version => 2018e
ICU Unicode version => 11.0

Test script:
---------------
$formatter = new \NumberFormatter(
   'fr_fr', 
   \NumberFormatter::CURRENCY
);
var_dump($formatter->formatCurrency(1400, 'CHF'));

//string(15) "1 400,00 CHF"


History

 [2019-04-17 14:05 UTC] cmb@php.net
-Summary: Wrong thousans separator encoding in NumberFormatter
+Summary: Wrong thousands separator encoding in NumberFormatter
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2019-04-17 14:05 UTC] cmb@php.net
The output is UTF-8 encoded, which is expected.  If you need
ISO-8859-1, use a conversion function, such as iconv().
 [2019-04-17 14:10 UTC] mvachette at adequasys dot com
Actually, this is not regular UTF-8 output:
- the last whitespace is a UTF-8 encoded no-break space
- but the first one (the thousands separator) generates an error when I try to decode it using iconv: "iconv(): Detected an illegal character in input string"
 [2019-04-17 14:21 UTC] cmb@php.net
-Status: Not a bug
+Status: Assigned
 [2019-04-17 14:21 UTC] cmb@php.net
Well, I'll have a closer look.
 [2019-04-17 15:06 UTC] cmb@php.net
-Status: Assigned
+Status: Verified
 [2019-04-17 15:06 UTC] cmb@php.net
Indeed, there is a U+802F instead of the expected U+00A0 with ICU
63.1, but with neither ICU 57.1 nor ICU 60.2.
 [2019-04-22 23:26 UTC] cmb@php.net
-Status: Verified
+Status: Feedback
 [2019-04-22 23:26 UTC] cmb@php.net
Correction: I'm not getting U+802F, but rather U+202F with ICU
63.1.  This is a NARROW NO-BREAK SPACE, which appears to be fine.
Please post your output of: 

    var_dump(bin2hex($formatter->formatCurrency(1400, 'CHF')));
 [2019-04-29 08:15 UTC] mvachette at adequasys dot com
-Status: Feedback
+Status: Assigned
 [2019-04-29 08:15 UTC] mvachette at adequasys dot com
Please find below the output you asked:


// var_dump(bin2hex($formatter->formatCurrency(1400, 'CHF')));
string(30) "31e280af3430302c3030c2a0434846"
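
To see which characters these bytes encode, the hex dump can be split into UTF-8 characters and each one turned into its code point. The following is a minimal sketch, not part of the original report; `utf8CodePoints()` is a hypothetical helper written for illustration, and it assumes the input is valid UTF-8.

```php
<?php
// Hex dump copied from the report above.
$hex = '31e280af3430302c3030c2a0434846';

/**
 * Hypothetical helper (not part of PHP or intl): returns the code points
 * of a valid UTF-8 string as "U+XXXX" labels.
 */
function utf8CodePoints(string $utf8): array
{
    // The /u modifier makes PCRE split on whole UTF-8 characters.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    $labels = [];
    foreach ($chars as $char) {
        $bytes = array_values(unpack('C*', $char));
        $count = count($bytes);
        // Single bytes are ASCII; otherwise mask off the length bits of the lead byte.
        $codePoint = ($count === 1) ? $bytes[0] : ($bytes[0] & (0xFF >> ($count + 1)));
        // Each continuation byte contributes its low 6 bits.
        for ($i = 1; $i < $count; $i++) {
            $codePoint = ($codePoint << 6) | ($bytes[$i] & 0x3F);
        }
        $labels[] = sprintf('U+%04X', $codePoint);
    }
    return $labels;
}

$codePoints = utf8CodePoints(hex2bin($hex));
echo implode(' ', $codePoints), PHP_EOL;
```

The second entry in the list is the thousands separator and the ninth is the space before the currency code. Any code point above U+00FF has no ISO-8859-1 equivalent.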
 [2019-04-29 09:44 UTC] cmb@php.net
-Status: Assigned
+Status: Not a bug
 [2019-04-29 09:44 UTC] cmb@php.net
Thanks!  This is also what I get.

"e280af" represents a NARROW NO-BREAK SPACE, "c2a0" a NO-BREAK
SPACE.  Both are proper Unicode code points.  While the NO-BREAK
SPACE can be represented in ISO-8859-1, the NARROW NO-BREAK SPACE
cannot.  This is why iconv() raises a notice and fails.  So to
properly convert, do:

    iconv('UTF-8', 'ISO-8859-1//IGNORE', $formatter->formatCurrency(1400, 'CHF'))

Or use mb_convert_encoding() which gives more flexibility than
simply ignoring unsupported code points.

Anyhow, this is not a bug.
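
As an illustration of the mb_convert_encoding() route mentioned above, the sketch below sets the mbstring substitute character to a plain space so that the unsupported NARROW NO-BREAK SPACE is replaced rather than dropped. It assumes the mbstring extension is loaded and uses the bytes from the hex dump as input instead of calling NumberFormatter; the choice of U+0020 as the substitute is an assumption made for this example.

```php
<?php
// UTF-8 bytes copied from the hex dump in the report.
$utf8 = hex2bin('31e280af3430302c3030c2a0434846');

// Remember the current substitute character so it can be restored afterwards.
$previous = mb_substitute_character();

// Use U+0020 (plain space) for characters the target encoding cannot represent.
mb_substitute_character(0x20);
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// Restore the previous setting, since it applies to later mbstring calls too.
mb_substitute_character($previous);

echo bin2hex($latin1), PHP_EOL;
```

Note that the substitute character applies to every unsupported code point in the input, not only to U+202F.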
 [2019-04-30 08:53 UTC] mvachette at adequasys dot com
Thanks a lot for your investigations.
The iconv with "IGNORE" flag works well.

For information, I managed to keep the thousands separator by replacing the NARROW NO-BREAK SPACE with a regular space before the encoding conversion:

$value = $formatter->formatCurrency(1400, 'CHF');
$valueWithoutNarrowNoBreakSpace = json_decode(str_replace('\u202f', ' ', json_encode($value)));
var_dump(iconv('UTF-8', 'ISO-8859-1//IGNORE', $valueWithoutNarrowNoBreakSpace));
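
The same replacement can probably be done without the JSON round trip. The sketch below is an alternative, not the reporter's code: it assumes PHP 7.0 or later (for the `"\u{...}"` escape) and the iconv extension, and it uses the bytes from the hex dump above as input instead of calling NumberFormatter.

```php
<?php
// UTF-8 bytes copied from the hex dump earlier in the report.
$value = hex2bin('31e280af3430302c3030c2a0434846');

// "\u{202F}" in a double-quoted string yields the UTF-8 bytes of
// NARROW NO-BREAK SPACE, so a plain byte-wise str_replace() is enough.
$clean = str_replace("\u{202F}", ' ', $value);

// After the replacement every character exists in ISO-8859-1,
// so no //IGNORE suffix is needed for this particular input.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $clean);

echo bin2hex($latin1), PHP_EOL;
```

Replacing with `"\u{00A0}"` instead of `' '` would keep the separator non-breaking, since NO-BREAK SPACE is representable in ISO-8859-1.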