Bug #38471 fgetcsv(): locale dependency of delimiter / enclosure arg
Submitted: 2006-08-16 12:15 UTC Modified: 2006-08-16 12:44 UTC
Votes:28
Avg. Score:4.1 ± 1.2
Reproduced:18 of 24 (75.0%)
Same Version:4 (22.2%)
Same OS:9 (50.0%)
From: jo at feuersee dot de
Assigned:
Status: Wont fix
Package: Filesystem function related
PHP Version: 5.1.4
OS: Linux
Private report: No
CVE-ID: None
 [2006-08-16 12:15 UTC] jo at feuersee dot de
Description:
------------
fgetcsv() is affected by locale settings in a way the PHP 
documentation doesn't imply. It says:
"Locale setting is taken into account by this function. If 
LANG is e.g. en_US.UTF-8, files in one-byte encoding are 
read wrong by this function."

In the example below, reading CSV data is broken because 
the locale setting affects the character following the 
delimiter (even when the delimiter is a 1-byte character).
I expected that fgetcsv() would work independently of locale 
settings as long as delimiter and enclosure are US-ASCII 
clean.

To my surprise, using a multibyte UTF-8 character (with the 
proper locale setting) as delimiter didn't throw a 
warning, but didn't work as expected either. It seems that 
only the 1st byte (the documentation says character) is used?
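
For illustration only, here is a minimal sketch of splitting a line on a multibyte delimiter without fgetcsv(). It assumes the data uses no enclosure or escaping at all, and the helper name split_on_multibyte_delimiter() and the delimiter "§" are hypothetical, not taken from the report. explode() compares whole byte sequences, so a complete UTF-8 delimiter is matched as a unit. The "\u{...}" escapes need PHP 7 or later.

```php
<?php
// Hypothetical helper: split one line on a delimiter of any byte length.
// Assumption: fields are never enclosed or escaped, so no CSV quoting rules apply.
function split_on_multibyte_delimiter(string $line, string $delimiter): array
{
    // Strip the trailing line break, then split on the full byte sequence.
    return explode($delimiter, rtrim($line, "\r\n"));
}

// "§" (U+00A7) as delimiter, "ä" (U+00E4) as first character of the second field.
$fields = split_on_multibyte_delimiter("A\u{00A7}\u{00E4}bc\n", "\u{00A7}");
print_r($fields);
```

This is not a replacement for fgetcsv() on quoted data; it only shows that a byte-wise split on the whole delimiter keeps the following multibyte character intact.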

Other side effects:
The documentation isn't clear about the enclosure arg:
- The enclosure arg is optional but defaults to "
- Explicitly setting an empty enclosure '' throws a warning, 
suggesting that the CSV data requires enclosing
- But fgetcsv() (with or w/o enclosure) works on both 
enclosed and non-enclosed CSV data, so enclosing seems to be 
optional anyway?
- It is unclear which setlocale() category arg affects 
fgetcsv() behavior. In my tests, LC_ALL and LC_CTYPE had an 
effect; I expected LC_COLLATE (instead of LC_CTYPE)
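
Since LC_CTYPE is the category observed to matter here, one way to experiment is to switch only that category around a block of code and restore it afterwards. The following is a sketch, not an official recipe: with_ctype_locale() is a hypothetical helper name, and it assumes PHP 7 or later for the type declarations. Passing the string '0' to setlocale() queries the current setting without changing it.

```php
<?php
// Hypothetical helper: run $fn with LC_CTYPE temporarily set to $locale.
function with_ctype_locale(string $locale, callable $fn)
{
    // '0' returns the current LC_CTYPE setting without modifying it.
    $previous = setlocale(LC_CTYPE, '0');

    // setlocale() returns false when the requested locale is not installed.
    if (setlocale(LC_CTYPE, $locale) === false) {
        throw new RuntimeException("Locale not available: $locale");
    }

    try {
        return $fn();
    } finally {
        // Restore the previous setting even if $fn throws.
        setlocale(LC_CTYPE, $previous);
    }
}

$before = setlocale(LC_CTYPE, '0');
$inside = with_ctype_locale('C', function () {
    return setlocale(LC_CTYPE, '0');
});
$after = setlocale(LC_CTYPE, '0');

echo "inside: $inside\n";
```

A call to fgetcsv() placed inside the callback would then run under the chosen LC_CTYPE only, leaving the other locale categories untouched.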

Reproduce code:
---------------
<?php
$f = fopen('test.csv', 'r');
if(is_resource($f)) {
	while(($data = fgetcsv($f, 1000, ';')) !== FALSE) {
		print_r($data);
	}
	fclose($f);
}
?>

Use a test.csv in UTF-8 encoding with a multibyte character in the 1st position after the delimiter:
A;?bc
B;bcd
--eof

Then let the PHP process inherit LANG=C, set it explicitly, or add 
setlocale(LC_CTYPE, 'C'); 
above the fopen() call.
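
The steps above can be combined into one self-contained script. This is a sketch under assumptions: the multibyte character is shown only as "?" in this report, so "ä" (U+00E4) is used as a stand-in; the file is written to a temporary path instead of test.csv; and the enclosure and escape arguments are passed explicitly, which requires a newer PHP than the 5.1.4 reported here (the "\u{...}" escape needs PHP 7). Whether the leading character is dropped depends on the PHP version, so no output is claimed.

```php
<?php
// Stand-in data: "ä" (U+00E4, two bytes in UTF-8) directly after the delimiter.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "A;\u{00E4}bc\nB;bcd\n");

// Force the C locale for character classification, as described in the report.
setlocale(LC_CTYPE, 'C');

$rows = [];
$f = fopen($path, 'r');
if (is_resource($f)) {
    // Delimiter ';', enclosure '"', escape '\' given explicitly.
    while (($data = fgetcsv($f, 1000, ';', '"', '\\')) !== false) {
        $rows[] = $data;
    }
    fclose($f);
}
unlink($path);

// Hex-dump the second field of the first row: "c3a4" at the start means the
// multibyte character survived; "6263" alone means it was dropped.
echo bin2hex($rows[0][1]), "\n";
```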


Expected result:
----------------
Array
(
    [0] => A
    [1] => ?bc
)
Array
(
    [0] => B
    [1] => bcd
)

Actual result:
--------------
Array
(
    [0] => A
    [1] => bc
)
Array
(
    [0] => B
    [1] => bcd
)



 [2006-08-16 12:44 UTC] tony2001@php.net
We're working on Unicode support in PHP 6, but it won't appear in previous versions.
 