Bug #63433 fgetcsv not working for Unicode files with BOM prefix
Submitted: 2012-11-03 21:11 UTC
Modified: 2016-07-31 14:20 UTC
Votes:22
Avg. Score:4.4 ± 1.0
Reproduced:19 of 19 (100.0%)
Same Version:7 (36.8%)
Same OS:13 (68.4%)
From: alec dot cormack at cloud-corporate dot com
Assigned: cmb
Status: Not a bug
Package: Filesystem function related
PHP Version: 5.3.18
OS: Linux
Private report: No
CVE-ID: None

 [2012-11-03 21:11 UTC] alec dot cormack at cloud-corporate dot com
Description:
------------
In PHP 5.3.x, when using fgetcsv to read a Unicode file that begins with a
UTF-8 Byte Order Mark (BOM) prefix 0xEF,0xBB,0xBF, the first row of the file
is not read correctly.  If the BOM is removed, fgetcsv reads the file correctly.

I have tried this with and without setlocale and the result is always wrong.  I 
have run the same program on PHP 5.2.4 and it works.

The test file is the simplest possible CSV: the BOM prefix, then "a" in double
quotes, followed by a newline (7 bytes in total):

0xEF,0xBB,0xBF,0x22,0x61,0x22,0x0A

When processed by fgetcsv, the double quotes should be removed and the value a
should be in the returned array.
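
For anyone reproducing this, the 7-byte test file can be generated rather than hand-edited. The following is only a minimal sketch: the file name and the temp-directory location are assumptions, not part of the original report.

```php
<?php
// Hypothetical path; any writable location works.
$path = sys_get_temp_dir() . '/bom-test.csv';

// UTF-8 BOM (EF BB BF), then "a" in double quotes, then a newline.
$bytes = "\xEF\xBB\xBF\"a\"\n";
file_put_contents($path, $bytes);

// Dump what was written so it can be compared with the byte list above.
$written = file_get_contents($path);
echo strlen($written), "\n";   // number of bytes in the file
echo bin2hex($written), "\n";  // hex dump of the file contents
```

The resulting file can then be passed to the test script as `$argv[1]`.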



Test script:
---------------
<?php

echo mb_detect_encoding(file_get_contents($argv[1]))."\n";

setlocale(LC_CTYPE, 'en_GB.utf8');

$handle = fopen($argv[1], "r");
$data = fgetcsv($handle, 1000, ",");
print_r($data);
?>


Expected result:
----------------
UTF-8
Array
(
    [0] => a
)


Actual result:
--------------
UTF-8
Array
(
    [0] => "a"
)


History

 [2016-07-31 14:20 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2016-07-31 14:20 UTC] cmb@php.net
The behavioral change was introduced in PHP 5.3.7, see
<https://3v4l.org/I3e3l>. It was caused by
<http://svn.php.net/viewvc/?view=revision&revision=311543>
and
<http://git.php.net/?p=php-src.git;a=commit;h=57674f7> (the latter
appears to be a fix of the former).

Anyhow, the "new" behavior is not a bug. There's simply no special
handling for the BOM, and so it is treated as being part of the
first (and only) field, as the var_dump output shows (note that
the strlen is reported as 6, and not 3). That is consistent with
other file functions, such as fgets(), see
<https://3v4l.org/5sQrN>.
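
Since fgetcsv has no special handling for the BOM, the caller can skip it before reading the first row. The following is a minimal sketch of that workaround, not an official API: the helper name `fopen_skip_utf8_bom` and the temp-file path are hypothetical, and the type hint assumes PHP 7 or later.

```php
<?php
/**
 * Hypothetical helper: open a file for reading and position the stream
 * just past a leading UTF-8 BOM, if one is present.
 */
function fopen_skip_utf8_bom(string $path)
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return false;
    }
    // Read the first three bytes; rewind if they are not the BOM.
    if (fread($handle, 3) !== "\xEF\xBB\xBF") {
        rewind($handle);
    }
    return $handle;
}

// Rebuild the 7-byte test file from the report (the path is an assumption).
$path = sys_get_temp_dir() . '/bom-test.csv';
file_put_contents($path, "\xEF\xBB\xBF\"a\"\n");

$handle = fopen_skip_utf8_bom($path);
// The escape argument is passed explicitly; newer PHP versions warn if it is omitted.
$row = fgetcsv($handle, 1000, ',', '"', '\\');
fclose($handle);

print_r($row);
```

With the BOM consumed first, the enclosure is the first byte fgetcsv sees, so the quotes are stripped and the field is the bare value a. Files without a BOM are unaffected because the stream is rewound to the start.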
 