Bug #63433 fgetcsv not working for Unicode files with BOM prefix
Submitted: 2012-11-03 21:11 UTC Modified: 2016-07-31 14:20 UTC
Avg. Score:4.4 ± 1.0
Reproduced:19 of 19 (100.0%)
Same Version:7 (36.8%)
Same OS:13 (68.4%)
From: alec dot cormack at cloud-corporate dot com Assigned: cmb (profile)
Status: Not a bug Package: Filesystem function related
PHP Version: 5.3.18 OS: LINUX
Private report: No CVE-ID: None


 [2012-11-03 21:11 UTC] alec dot cormack at cloud-corporate dot com
In PHP 5.3.x, when fgetcsv reads a Unicode file that starts with the UTF-8 Byte 
Order Mark (BOM) prefix 0xEF,0xBB,0xBF, the first row of the file is not read 
correctly.  If the BOM is removed, fgetcsv reads the file correctly. 

I have tried this with and without setlocale and the result is always wrong.  I 
have run the same program on PHP 5.2.4 and it works.

The test file is the simplest possible CSV: the BOM prefix, then "a", then a 
newline (7 bytes in total).
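
For anyone wanting to reproduce this, a minimal sketch of how such a test file could be generated is shown here. The file name and location are assumptions chosen for illustration; only the byte sequence follows the description above.

```php
<?php
// Hypothetical path; any writable location works.
$path = sys_get_temp_dir() . '/bom_test.csv';

// UTF-8 BOM (EF BB BF), then a in double quotes, then a newline.
$contents = "\xEF\xBB\xBF" . '"a"' . "\n";
file_put_contents($path, $contents);

// 3 BOM bytes + 3 bytes for the quoted field + 1 newline byte.
echo strlen($contents), " bytes\n";

// Hex dump to confirm the exact bytes written.
echo bin2hex($contents), "\n";
```

The hex dump makes it easy to check that an editor or transfer tool has not silently added or removed the BOM.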


When the file is processed by fgetcsv, the double quotes should be removed and 
the value a should be in the returned array.  

Test script:

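<?php
// Report the detected encoding of the input file.
echo mb_detect_encoding(file_get_contents($argv[1]))."\n";

setlocale(LC_CTYPE, 'en_GB.utf8');

// Read the first row of the file.
$handle = fopen($argv[1], "r");
$data = fgetcsv($handle, 1000, ",");
// Output call assumed; the report shows print_r-style results.
print_r($data);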

Expected result:
    [0] => a

Actual result:
    [0] => "a"


 [2016-07-31 14:20 UTC]
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2016-07-31 14:20 UTC]
The behavioral change has been introduced with PHP 5.3.7, see
<>. It has been caused by
<;a=commit;h=57674f7> (the latter
appears to be a fix of the former).

Anyhow, the "new" behavior is not a bug. There's simply no special
handling for the BOM, and so it is treated as being part of the
first (and only) field, as the var_dump output shows (note that
the strlen is reported as 6, and not 3). That is consistent with
other file functions, such as fgets().
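
Since the BOM is not handled specially, one possible workaround is to consume it before the first fgetcsv() call. The following is a minimal sketch, not an official recommendation: skip_utf8_bom() is a hypothetical helper name, the in-memory php://temp stream stands in for the 7-byte test file, and the enclosure and escape arguments are passed explicitly only to avoid relying on defaults.

```php
<?php
// Hypothetical helper (not part of PHP): moves the stream position past a
// leading UTF-8 BOM if one is present, otherwise returns to the start.
function skip_utf8_bom($handle)
{
    if (fread($handle, 3) !== "\xEF\xBB\xBF") {
        // No BOM found: rewind so the bytes just read are not lost.
        rewind($handle);
    }
}

// Stand-in for the test file: BOM, then a in double quotes, then a newline.
$handle = fopen('php://temp', 'w+');
fwrite($handle, "\xEF\xBB\xBF" . '"a"' . "\n");

// Without BOM handling, the BOM becomes part of the first field.
rewind($handle);
$raw = fgetcsv($handle, 1000, ",", '"', "\\");

// With the BOM skipped, the field starts at the opening quote.
rewind($handle);
skip_utf8_bom($handle);
$clean = fgetcsv($handle, 1000, ",", '"', "\\");
fclose($handle);

var_dump(strlen($raw[0]));
var_dump($clean);
```

The first dump should report a length of 6 (3 BOM bytes plus the 3 bytes of the quoted value), matching the strlen noted above, while the second should contain the single value a. The helper relies on rewind(), so it assumes a seekable stream.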