[2006-08-16 12:44 UTC] tony2001@php.net
Description:
------------
fgetcsv() is affected by locale settings in a way the PHP documentation doesn't imply. It says:

"Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function."

In the example below, reading CSV data is broken because the locale settings affect the character following the delimiter (even when the delimiter is a 1-byte character). I expected fgetcsv() to work independently of locale settings as long as the delimiter and enclosure are US-ASCII clean.

To my surprise, using a multibyte UTF-8 character (with the proper locale setting) as the delimiter didn't throw a warning, but didn't work as expected either. It seems that only the 1st byte (the documentation says character) is used?

Other side effects: the documentation isn't clear about the enclosure arg:
- The enclosure arg is optional but defaults to "
- Explicitly setting an empty enclosure '' throws a warning, suggesting that the CSV data requires enclosing
- But fgetcsv() (with or w/o enclosure) works on both enclosed and non-enclosed CSV data, so it seems to be optional anyway?
- It is unclear which setlocale() category arg affects fgetcsv() behavior. By testing, LC_ALL and LC_CTYPE did work; I expected LC_COLLATE (instead of LC_CTYPE)

Reproduce code:
---------------
<?php
$f = fopen('test.csv', 'r');
if (is_resource($f)) {
    while (($data = fgetcsv($f, 1000, ';')) !== FALSE) {
        print_r($data);
    }
    fclose($f);
}
?>

And give a test.csv in UTF-8 encoding with multibyte characters in the 1st place after the delimiter:
A;?bc
B;bcd
--eof

And let the PHP process either inherit LANG=C, set it explicitly, or add
setlocale(LC_CTYPE, 'C');
above.

Expected result:
----------------
Array
(
    [0] => A
    [1] => ?bc
)
Array
(
    [0] => B
    [1] => bcd
)

Actual result:
--------------
Array
(
    [0] => A
    [1] => bc
)
Array
(
    [0] => B
    [1] => bcd
)
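For anyone trying to reproduce this without preparing test.csv by hand, the following is a minimal sketch of a self-contained variant of the reproduce script. It makes several assumptions that are not part of the original report: the helper name read_csv_under_locale() is hypothetical, the byte sequence "\xC3\xA4" is only a stand-in for the multibyte character that shows up as "?" above (the original character is not known), and the en_US.UTF-8 locale may not be installed on a given system. The enclosure and escape arguments are passed explicitly with their documented defaults. The output is printed as hex so that any dropped leading byte is visible; what is actually printed will depend on the PHP version and the locales available.

```php
<?php
// Sketch only: read the same CSV data under a given LC_CTYPE locale.
// read_csv_under_locale() is a hypothetical helper, not part of PHP.
function read_csv_under_locale(string $locale, string $csv): array
{
    // setlocale() returns false when the requested locale is unavailable.
    $applied = setlocale(LC_CTYPE, $locale);

    // Write the CSV data to a temporary file so fgetcsv() can read it.
    $path = tempnam(sys_get_temp_dir(), 'csv');
    file_put_contents($path, $csv);

    $rows = [];
    $f = fopen($path, 'r');
    if (is_resource($f)) {
        // Same call as in the report, with enclosure and escape spelled out.
        while (($data = fgetcsv($f, 1000, ';', '"', '\\')) !== false) {
            $rows[] = $data;
        }
        fclose($f);
    }
    unlink($path);

    return ['locale' => $applied, 'rows' => $rows];
}

// Assumption: "\xC3\xA4" stands in for the unknown multibyte character.
$csv = "A;\xC3\xA4bc\nB;bcd\n";

foreach (['C', 'en_US.UTF-8'] as $locale) {
    $result = read_csv_under_locale($locale, $csv);
    foreach ($result['rows'] as $row) {
        // bin2hex() makes it visible whether the leading bytes survived.
        echo var_export($result['locale'], true), ': ',
            implode(' | ', array_map('bin2hex', $row)), "\n";
    }
}
```

If the second field of the first row prints as 6263 (or a46263) under the C locale and as c3a46263 under the UTF-8 locale, that would match the behavior described in this report.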