Bug #38471 fgetcsv(): locale dependency of delimiter / enclosure arg
Submitted: 2006-08-16 12:15 UTC
Modified: 2006-08-16 12:44 UTC
Avg. Score: 4.1 ± 1.2
Reproduced: 18 of 24 (75.0%)
Same Version: 4 (22.2%)
Same OS: 9 (50.0%)
From: jo at feuersee dot de
Assigned:
Status: Wont fix
Package: Filesystem function related
PHP Version: 5.1.4
OS: Linux
Private report: No
CVE-ID: None

 [2006-08-16 12:15 UTC] jo at feuersee dot de
fgetcsv() is affected by locale settings in a way the PHP 
documentation doesn't imply. It says:
"Locale setting is taken into account by this function. If 
LANG is e.g. en_US.UTF-8, files in one-byte encoding are 
read wrong by this function."

In the example below, reading CSV data is broken because 
the locale setting affects the character following the 
delimiter (even when the delimiter is a 1-byte character).
I expected fgetcsv() to work independently of locale 
settings as long as delimiter and enclosure are US-ASCII characters.

To my surprise, using multibyte UTF-8 characters (and 
proper locale setting) as delimiter didn't throw a 
warning, but didn't work as expected. It seems that only 
the 1st byte (the documentation says character) is used?
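
A minimal sketch of the byte-versus-character distinction raised above. The helper name is_single_byte() is hypothetical (it is not a PHP built-in), and the type declarations assume PHP 7 or later rather than the 5.1.4 this report was filed against. It only checks whether a candidate delimiter occupies exactly one byte, which is what the observed behavior suggests fgetcsv() actually uses.

```php
<?php
// Hypothetical helper, not part of PHP: true when the string is exactly one byte long.
function is_single_byte(string $s): bool
{
    // strlen() counts bytes, not characters.
    return strlen($s) === 1;
}

$ascii     = ';';        // U+003B, one byte in UTF-8
$multibyte = "\xC2\xA7"; // U+00A7 (section sign), two bytes in UTF-8

var_dump(is_single_byte($ascii));     // bool(true)
var_dump(is_single_byte($multibyte)); // bool(false)
```

A check like this before calling fgetcsv() would at least make a multibyte delimiter fail loudly instead of silently misparsing.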

Other side effects:
The documentation isn't clear about the enclosure arg:
- The enclosure arg is optional but defaults to a "
- Explicitly setting an empty enclosure '' throws a warning, 
suggesting that the CSV data requires enclosing
- But fgetcsv() (with or w/o enclosure) works on both 
enclosed and non-enclosed CSV data, so enclosing seems to be optional 
- It is unclear which setlocale() category arg affects 
fgetcsv() behavior. In my tests, LC_ALL and LC_CTYPE had an 
effect; I expected LC_COLLATE (instead of LC_CTYPE)
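
The enclosure observation above can be sketched as follows, assuming PHP 7 or later. The helper parse_lines() and the sample data are made up for illustration; the sample is ASCII-only so that the locale issue does not interfere. The escape argument is passed explicitly because newer PHP versions discourage relying on its default.

```php
<?php
// Hypothetical helper: parse every line of $csv with fgetcsv(),
// using ';' as delimiter and '"' as enclosure.
function parse_lines(string $csv): array
{
    // Use an in-memory stream so no file on disk is needed.
    $f = fopen('php://memory', 'w+');
    fwrite($f, $csv);
    rewind($f);

    $rows = [];
    while (($data = fgetcsv($f, 1000, ';', '"', '\\')) !== false) {
        $rows[] = $data;
    }
    fclose($f);
    return $rows;
}

// First line uses enclosures, second line does not.
$rows = parse_lines("\"A\";\"b c\"\nA;b c\n");
print_r($rows);
```

Both lines should yield the same two fields, which matches the report that the enclosure is accepted but not required in the data.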

Reproduce code:
$f = fopen('test.csv', 'r');
if(is_resource($f)) {
	while(($data = fgetcsv($f, 1000, ';')) !== FALSE) {
		print_r($data); // print each parsed row
	}
}

Provide a test.csv in UTF-8 encoding with a multibyte character in the first position after the delimiter:

Then let the PHP process either inherit LANG=C, set it explicitly, or add 
setlocale(LC_CTYPE, 'C'); 
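
For experimenting with different LC_CTYPE settings, a wrapper along these lines may help. This is only a sketch assuming PHP 7 or later: read_csv_with_ctype() is a hypothetical name, and which locale names (for example en_US.UTF-8) are installed depends on the system. The usage part deliberately uses the always-available C locale and ASCII-only data, so it does not reproduce the bug itself.

```php
<?php
// Hypothetical helper: read a CSV file with LC_CTYPE temporarily set to $locale,
// then restore the previous setting.
function read_csv_with_ctype(string $path, string $locale, string $delimiter = ';'): array
{
    // Passing '0' queries the current setting without changing it.
    $previous = setlocale(LC_CTYPE, '0');
    if (setlocale(LC_CTYPE, $locale) === false) {
        throw new RuntimeException("Locale $locale is not available");
    }

    $rows = [];
    try {
        $f = fopen($path, 'r');
        if (is_resource($f)) {
            while (($data = fgetcsv($f, 1000, $delimiter, '"', '\\')) !== false) {
                $rows[] = $data;
            }
            fclose($f);
        }
    } finally {
        // Restore the locale that was active before the call.
        setlocale(LC_CTYPE, $previous);
    }
    return $rows;
}

// Usage with ASCII-only sample data written to a temporary file.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "A;abc\nB;bcd\n");
$rows = read_csv_with_ctype($path, 'C');
unlink($path);
print_r($rows);
```

Calling the helper once with 'C' and once with a UTF-8 locale on the real test.csv would show whether the first character after the delimiter survives.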

Expected result:
    [0] => A
    [1] => ?bc
    [0] => B
    [1] => bcd

Actual result:
    [0] => A
    [1] => bc
    [0] => B
    [1] => bcd


 [2006-08-16 12:44 UTC]
We're working on Unicode support in PHP 6, but it won't appear in previous versions.