Bug #63433 fgetcsv not working for Unicode files with BOM prefix
Submitted: 2012-11-03 21:11 UTC Modified: 2016-07-31 14:20 UTC
Votes:22
Avg. Score:4.4 ± 1.0
Reproduced:19 of 19 (100.0%)
Same Version:7 (36.8%)
Same OS:13 (68.4%)
From: alec dot cormack at cloud-corporate dot com
Assigned: cmb
Status: Not a bug
Package: Filesystem function related
PHP Version: 5.3.18
OS: LINUX
Private report: No
CVE-ID: None
 [2012-11-03 21:11 UTC] alec dot cormack at cloud-corporate dot com
Description:
------------
In PHP 5.3.x, when fgetcsv reads a Unicode file that begins with a UTF-8 Byte 
Order Mark (BOM, bytes 0xEF,0xBB,0xBF), the first row of the file is not read 
correctly.  If the BOM is removed, fgetcsv reads the file correctly. 

I have tried this with and without setlocale and the result is always wrong.  I 
have run the same program on PHP 5.2.4 and it works.

The test file is the simplest possible CSV: the BOM prefix, then "a" in double 
quotes, then a newline (7 bytes in total):

0xEF,0xBB,0xBF,0x22,0x61,0x22,0x0A

When processed by fgetcsv, the double quotes should be removed and the value a 
should be in the returned array.  
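
For anyone reproducing this, the 7-byte test file described above can be generated with a short script. This is only a sketch: the file name and location are arbitrary choices for illustration, not something from the original report.

```php
<?php
// Hypothetical location; any writable path works.
$path = sys_get_temp_dir() . '/bom-test.csv';

// UTF-8 BOM (EF BB BF), then "a" in double quotes (22 61 22), then a newline (0A).
$bytes = "\xEF\xBB\xBF\"a\"\n";
file_put_contents($path, $bytes);

echo strlen($bytes), "\n";                     // 7
echo bin2hex(file_get_contents($path)), "\n";  // efbbbf2261220a
```

The resulting path can then be passed as the first argument to the test script.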



Test script:
---------------
<?php

echo mb_detect_encoding(file_get_contents($argv[1]))."\n";

setlocale(LC_CTYPE, 'en_GB.utf8');

$handle = fopen($argv[1], "r");
$data = fgetcsv($handle, 1000, ",");
print_r($data);
?>


Expected result:
----------------
UTF-8
Array
(
    [0] => a
)


Actual result:
--------------
UTF-8
Array
(
    [0] => "a"
)


 [2016-07-31 14:20 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2016-07-31 14:20 UTC] cmb@php.net
The behavioral change was introduced in PHP 5.3.7, see
<https://3v4l.org/I3e3l>. It was caused by
<http://svn.php.net/viewvc/?view=revision&revision=311543>
and
<http://git.php.net/?p=php-src.git;a=commit;h=57674f7> (the latter
appears to be a fix of the former).

Anyhow, the "new" behavior is not a bug. There's simply no special
handling for the BOM, and so it is treated as being part of the
first (and only) field, as the var_dump output shows (note that
the strlen is reported as 6, and not 3). That is consistent with
other file functions, such as fgets(), see
<https://3v4l.org/5sQrN>.
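
Since there is no special handling for the BOM, calling code that wants the pre-5.3.7 result has to skip the BOM itself before the first fgetcsv() call. The following is a minimal sketch of one way to do that, not an official recommendation: skip_utf8_bom() is a hypothetical helper (it is not part of PHP), it assumes a seekable stream, and an in-memory stream stands in for the reporter's test file.

```php
<?php
// Hypothetical helper, not part of PHP: consumes a leading UTF-8 BOM if present.
// Assumes the stream is seekable, because rewind() is used when no BOM is found.
function skip_utf8_bom($handle): void
{
    $bom = "\xEF\xBB\xBF";
    if (fread($handle, 3) !== $bom) {
        rewind($handle); // no BOM: go back to the start of the stream
    }
}

// In-memory stand-in for the 7-byte test file from the report.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "\xEF\xBB\xBF\"a\"\n");
rewind($handle);

skip_utf8_bom($handle);
$data = fgetcsv($handle, 1000, ",", "\"", "\\");
print_r($data);
fclose($handle);
```

With the BOM consumed first, the opening double quote is at the start of the field again, so fgetcsv() treats it as an enclosure and returns the value a without quotes.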
 