PHP :: Bug #63433 :: fgetcsv not working for Unicode files with BOM prefix

Submitted:

2012-11-03 21:11 UTC

Modified:

2016-07-31 14:20 UTC

Votes:	22
Avg. Score:	4.4 ± 1.0
Reproduced:	19 of 19 (100.0%)
Same Version:	7 (36.8%)
Same OS:	13 (68.4%)

From:

alec dot cormack at cloud-corporate dot com

Assigned:

cmb (profile)

Status:

Not a bug

Package:

Filesystem function related

PHP Version:

5.3.18

OS:

LINUX

Private report:

CVE-ID:

None

[2012-11-03 21:11 UTC] alec dot cormack at cloud-corporate dot com

Description:
------------
In PHP 5.3.x, when fgetcsv is used to read a Unicode file that starts with the 
UTF-8 Byte Order Mark (BOM) prefix 0xEF,0xBB,0xBF, the first row of the file is 
not read correctly.  If the BOM is removed, fgetcsv reads the file correctly. 

I have tried this with and without setlocale and the result is always wrong.  I 
have run the same program on PHP 5.2.4 and it works.

The test file is the simplest possible CSV: the BOM prefix, then "a" (in double 
quotes), then a newline (7 bytes in total):

0xEF,0xBB,0xBF,0x22,0x61,0x22,0x0A

When processed by fgetcsv the doublequotes should get removed and the value a 
should be in the array returned.  



Test script:
---------------
<?php

echo mb_detect_encoding(file_get_contents($argv[1]))."\n";

setlocale(LC_CTYPE, 'en_GB.utf8');

$handle = fopen($argv[1], "r");
$data = fgetcsv($handle, 1000, ",");
print_r($data);
?>


Expected result:
----------------
UTF-8
Array
(
    [0] => a
)


Actual result:
--------------
UTF-8
Array
(
    [0] => "a"
)


[2016-07-31 14:20 UTC] cmb@php.net

-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb

[2016-07-31 14:20 UTC] cmb@php.net

The behavioral change was introduced in PHP 5.3.7, see
<https://3v4l.org/I3e3l>. It was caused by
<http://svn.php.net/viewvc/?view=revision&revision=311543>
and
<http://git.php.net/?p=php-src.git;a=commit;h=57674f7> (the latter
appears to be a fix of the former).

Anyhow, the "new" behavior is not a bug. There's simply no special
handling for the BOM, and so it is treated as being part of the
first (and only) field, as the var_dump output shows (note that
the strlen is reported as 6, and not 3: the three BOM bytes plus
the three bytes of "a" including its quotes). That is consistent
with other file functions, such as fgets(), see
<https://3v4l.org/5sQrN>.
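
Since fgetcsv() has no BOM handling of its own, one possible workaround is to consume the BOM before the first fgetcsv() call. The following is only a minimal sketch, not something from this report or from PHP itself: the helper name open_without_bom() is hypothetical, and the script recreates the 7-byte test file from the report in a temporary location. The enclosure and escape arguments are passed explicitly so that the call does not rely on their defaults.

```php
<?php
// Hypothetical helper (not part of PHP): open a file for reading and
// position the stream just past a leading UTF-8 BOM, if there is one.
function open_without_bom($path)
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return false;
    }
    // Compare the first three bytes with the UTF-8 BOM (0xEF,0xBB,0xBF).
    if (fread($handle, 3) !== "\xEF\xBB\xBF") {
        rewind($handle); // no BOM: start reading from the beginning again
    }
    return $handle;
}

// Recreate the test file from the report: BOM, "a" in double quotes, newline.
$path = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($path, "\xEF\xBB\xBF\"a\"\n");

// Plain fgetcsv(): the BOM stays in front of the first field.
$raw = fopen($path, 'r');
$with_bom = fgetcsv($raw, 1000, ',', '"', '\\');
fclose($raw);

// Same call after skipping the BOM.
$handle = open_without_bom($path);
$without_bom = fgetcsv($handle, 1000, ',', '"', '\\');
fclose($handle);
unlink($path);

var_dump(strlen($with_bom[0])); // length of the first field when the BOM is kept
print_r($without_bom);          // first row once the BOM has been skipped
```

Checking only the first three bytes keeps the helper cheap and leaves files without a BOM untouched. It assumes a seekable local file; for a non-seekable stream, rewind() would not work and the bytes already read would have to be handled differently.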
