Bug #38471 fgetcsv(): locale dependency of delimiter / enclosure arg
Submitted: 2006-08-16 12:15 UTC Modified: 2006-08-16 12:44 UTC
Votes:28
Avg. Score:4.1 ± 1.2
Reproduced:18 of 24 (75.0%)
Same Version:4 (22.2%)
Same OS:9 (50.0%)
From: jo at feuersee dot de Assigned:
Status: Wont fix Package: Filesystem function related
PHP Version: 5.1.4 OS: Linux
Private report: No CVE-ID: None

 [2006-08-16 12:15 UTC] jo at feuersee dot de
Description:
------------
fgetcsv() is affected by locale settings in a way the PHP 
documentation doesn't imply. It says:
"Locale setting is taken into account by this function. If 
LANG is e.g. en_US.UTF-8, files in one-byte encoding are 
read wrong by this function."

In the example below, reading CSV data is broken because 
locale settings affect the character following the 
delimiter (even when the delimiter is a 1-byte character).
I expected fgetcsv() to work independently of locale 
settings as long as delimiter and enclosure are US-ASCII 
clean.

To my surprise, using a multibyte UTF-8 character (with the 
proper locale setting) as the delimiter didn't throw a 
warning, but didn't work as expected either. It seems that 
only the 1st byte (the documentation says character) is used?
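
To illustrate the byte vs. character distinction, here is a minimal sketch. It assumes "ä" (U+00E4) as a stand-in for a multibyte delimiter; the variable names are only illustrative. It shows that such a character occupies two bytes in UTF-8, so a parser that keeps only the first byte ends up with half a character:

```php
<?php
// "ä" (U+00E4) written as explicit UTF-8 bytes, so the example does not
// depend on the encoding of this source file.
$delimiter = "\xC3\xA4";

// strlen() counts bytes, not characters.
$byteLength = strlen($delimiter);

// Indexing a string with [0] yields its first byte only.
$firstByte = $delimiter[0];

echo $byteLength, "\n";          // prints 2
echo bin2hex($firstByte), "\n";  // prints c3
```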

Other side effects:
The documentation isn't clear about the enclosure arg:
- The enclosure arg is optional but defaults to "
- Explicitly setting an empty enclosure '' throws a warning, 
suggesting that the CSV data requires an enclosure
- But fgetcsv() (with or w/o the enclosure arg) works on both 
enclosed and non-enclosed CSV data, so it seems to be 
optional anyway?
- It is unclear which setlocale() category arg affects 
fgetcsv() behavior. In my tests, LC_ALL and LC_CTYPE 
worked; I expected LC_COLLATE (instead of LC_CTYPE)
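
For reference, a sketch of how the LC_CTYPE category could be set around the read and restored afterwards. The locale names 'en_US.UTF-8' and 'C.UTF-8' are assumptions; which names are installed depends on the system, and setlocale() returns FALSE if none of them is available:

```php
<?php
// Query the current LC_CTYPE setting without changing it ("0" means query).
$previous = setlocale(LC_CTYPE, '0');

// Try candidate UTF-8 locales in order; the names are assumptions and
// may not be installed on a given system.
$applied = setlocale(LC_CTYPE, 'en_US.UTF-8', 'C.UTF-8');

if ($applied === false) {
    echo "No UTF-8 locale available\n";
} else {
    echo "LC_CTYPE set to $applied\n";
}

// ... open the file and call fgetcsv() here ...

// Restore the setting that was active before.
setlocale(LC_CTYPE, $previous);
```

Only LC_CTYPE is touched here, so number and date formatting (LC_NUMERIC, LC_TIME) stay as they were.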

Reproduce code:
---------------
<?php
$f = fopen('test.csv', 'r');
if(is_resource($f)) {
	while(($data = fgetcsv($f, 1000, ';')) !== FALSE) {
		print_r($data);
	}
	fclose($f);
}
?>

Use a test.csv in UTF-8 encoding with a multibyte character in the 1st position after the delimiter:
A;?bc
B;bcd
--eof

Then let the PHP process inherit LANG=C, set it explicitly, or add 
setlocale(LC_CTYPE, 'C'); 
to the script above.


Expected result:
----------------
Array
(
    [0] => A
    [1] => ?bc
)
Array
(
    [0] => B
    [1] => bcd
)

Actual result:
--------------
Array
(
    [0] => A
    [1] => bc
)
Array
(
    [0] => B
    [1] => bcd
)
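
For comparison, a byte-wise split is not locale-sensitive. The following is a minimal sketch only: splitLine() is a hypothetical helper, not part of PHP, and it assumes the data contains no enclosures and no delimiter inside field values. "ä" is used as an assumed stand-in for the multibyte character in the test data:

```php
<?php
// Hypothetical helper: splits one line on a single-byte delimiter.
// Assumes no enclosure characters and no delimiter inside field values.
function splitLine($line, $delimiter = ';')
{
    // Strip the trailing line break, then split byte-wise.
    return explode($delimiter, rtrim($line, "\r\n"));
}

// Sample lines; "\xC3\xA4" is "ä" in UTF-8 (an assumed stand-in).
$lines = array("A;\xC3\xA4bc\n", "B;bcd\n");

$rows = array();
foreach ($lines as $line) {
    $rows[] = splitLine($line);
}

print_r($rows);
```

Because explode() compares raw bytes, the leading multibyte character of the second field is kept intact regardless of LANG or LC_CTYPE.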



 [2006-08-16 12:44 UTC] tony2001@php.net
We're working on Unicode support in PHP 6, but it won't appear in previous versions.