Bug #46217 fgetcsv() parses a csv file in the greek encoding incorrectly
Submitted: 2008-10-02 12:20 UTC Modified: 2008-11-03 01:00 UTC
Votes:3
Avg. Score:3.0 ± 1.6
Reproduced:3 of 3 (100.0%)
Same Version:1 (33.3%)
Same OS:2 (66.7%)
From: brook73 at gmail dot com Assigned:
Status: No Feedback Package: Filesystem function related
PHP Version: 5.2.5 OS: Ubuntu 8.04
Private report: No CVE-ID: None
 [2008-10-02 12:20 UTC] brook73 at gmail dot com
Description:
------------
The "fgetcsv" function parses a file in the Greek encoding (ISO-8859-7) incorrectly: many characters are dropped.

Calling "setlocale" did not help either (we tried setlocale(LC_ALL, 'gr_GR') and setlocale(LC_ALL, 'gr_GR.ISO-8895-7')).

Can anyone explain why this happens?

The PHP version is 5.2.5.

Reproduce code:
---------------
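<?php

$max_line_size = 16384;
$delimiter = ";";

// Open in binary mode so the ISO-8859-7 bytes reach fgetcsv() untouched.
$f = fopen('somefile.csv', 'rb');
if ($f === false) {
  die("Cannot open somefile.csv\n");
}

while (($data = fgetcsv($f, $max_line_size, $delimiter)) !== false) {
  print_r($data);
}

fclose($f);

?>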

Example of the line in csv file:

ΓΟΜ000112;Είδη Γραφής - Διόρθωσης///Γόμες;1.30;1.30;30 Sep 2008 00:00:00;N;ΘΡΥΛΟΣ3;ΕΑΕΕΑΕΑΕΕΑΕΑΕΑΕ;

Expected result:
----------------
Debug [0/0]:Array
(
    [0] => ΓΟΜ000112
    [1] => Είδη Γραφής - Διόρθωσης///Γόμες
    [2] => 1.30
    [3] => 1.30
    [4] => 30 Sep 2008 00:00:00
    [5] => N
    [6] => ΘΡΥΛΟΣ3
    [7] => ΕΑΕΕΑΕΑΕΕΑΕΑΕΑΕ
)


Actual result:
--------------
Debug [0/0]:Array
(
    [0] => 000112
    [1] => - Διόρθωσης///Γόμες
    [2] => 1.30
    [3] => 1.30
    [4] => 30 Sep 2008 00:00:00
    [5] => N
    [6] => 3
    [7] => 
)


 [2008-10-02 12:34 UTC] brook73 at gmail dot com
Re:

Example of the line in csv file:

ΓΟΜ000112;Είδη Γραφής - Διόρθωσης///Γόμες;1.30;1.30;30 Sep 2008 00:00:00;N;ΘΡΥΛΟΣ3;ΕΑΕΕΑΕΑΕΕΑΕΑΕΑΕ

Expected result:

Debug [0/0]:Array
(
    [0] => ΓΟΜ000112
    [1] => Είδη Γραφής - Διόρθωσης///Γόμες
    [2] => 1.30
    [3] => 1.30
    [4] => 30 Sep 2008 00:00:00
    [5] => N
    [6] => ΘΡΥΛΟΣ3
    [7] => ΕΑΕΕΑΕΑΕΕΑΕΑΕΑΕ
)

Actual result:

Debug [0/0]:Array
(
    [0] => 000112
    [1] => - Διόρθωσης///Γόμες
    [2] => 1.30
    [3] => 1.30
    [4] => 30 Sep 2008 00:00:00
    [5] => N
    [6] => 3
    [7] => 
)
 [2008-10-02 13:23 UTC] brook73 at gmail dot com
Please use this file

http://dev.cs-cart.com/~brook/test.csv
 [2008-10-20 17:24 UTC] mike at regexia dot com
(This is my first attempt at fixing a bug, so please bear with me. :))

Patch available here: http://www.regexia.com/php/bug46217/bug46217.diff
Test case as well: http://www.regexia.com/php/bug46217/bug46217.phpt

Explanation:
The initial pass on a field tries to skip whitespace. If php_mblen() returns -2 or -1 that character is skipped (as if it's whitespace). Regardless of locale, non-ASCII characters were returning -1 (invalid). My patch treats those characters as regular non-WS characters. This behavior seems to be consistent with non-ASCII handling in the middle of a CSV field.
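To make the effect easier to see, here is a toy model of that leading-skip pass in plain PHP. It is not the C source of fgetcsv(): the function name skip_leading_model() is made up for illustration, and treating every byte >= 0x80 as "invalid" is an assumption that stands in for php_mblen() returning -1.

```php
<?php
// Toy model of the leading-whitespace skip described above.
// NOT the real implementation: bytes >= 0x80 are assumed to be the ones
// php_mblen() reports as invalid (-1).
function skip_leading_model($field, $skip_invalid)
{
    $len = strlen($field);
    $i = 0;
    while ($i < $len) {
        $is_space   = ($field[$i] === ' ' || $field[$i] === "\t");
        $is_invalid = (ord($field[$i]) >= 0x80);
        // Old behaviour: invalid bytes are stepped over like whitespace.
        // Patched behaviour: an invalid byte ends the skip, like any other character.
        if ($is_space || ($skip_invalid && $is_invalid)) {
            $i++;
            continue;
        }
        break;
    }
    // Cast guards against substr() returning false on older PHP versions.
    return (string) substr($field, $i);
}

// "ΓΟΜ000112" as ISO-8859-7 bytes (Γ = 0xC3, Ο = 0xCF, Μ = 0xCC).
$field = "\xC3\xCF\xCC" . "000112";

var_dump(skip_leading_model($field, true));   // leading Greek bytes are lost
var_dump(skip_leading_model($field, false));  // field is kept whole
```

Under this model a field made up only of Greek letters comes back empty, which matches element [7] in the reported output.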

Enclosing the CSV field data in a quote or the like works around the issue.
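If you control how the file is produced, a minimal sketch of that workaround could look like the following. The helper name csv_line_all_quoted() is hypothetical, and the sketch assumes the usual CSV convention of doubling an embedded quote character.

```php
<?php
// Hypothetical helper: build one CSV line with every field enclosed, so a
// field never starts with a raw non-ASCII byte.
function csv_line_all_quoted(array $fields, $delimiter = ';', $enclosure = '"')
{
    $quoted = array();
    foreach ($fields as $field) {
        // Double any embedded enclosure character (common CSV escaping).
        $escaped  = str_replace($enclosure, $enclosure . $enclosure, (string) $field);
        $quoted[] = $enclosure . $escaped . $enclosure;
    }
    return implode($delimiter, $quoted) . "\n";
}

// "ΓΟΜ000112" as ISO-8859-7 bytes, followed by two ASCII fields.
echo csv_line_all_quoted(array("\xC3\xCF\xCC" . "000112", "1.30", "N"));
```

The function only concatenates bytes, so it does not depend on the locale or on the file's encoding.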

Hope this is clear and the correct protocol for submitting this patch. :)

Mike
 [2008-10-26 19:17 UTC] jani@php.net
Please try using this CVS snapshot:

  http://snaps.php.net/php5.2-latest.tar.gz
 
For Windows:

  http://windows.php.net/snapshots/


 [2008-11-03 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 [2012-10-10 11:53 UTC] erwin32 dot 64 at gmail dot com
Try setlocale(LC_ALL, 'el_GR') instead of setlocale(LC_ALL, 'gr_GR'): the language code for Greek is el, and the country code is GR.
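Locale names vary between systems, so a hedged sketch is to pass several candidates and check what setlocale() returns. The function name set_greek_ctype() and the candidate list are assumptions, not names confirmed anywhere in this report; on Linux, `locale -a` shows what is actually installed. The sketch sets LC_CTYPE only, on the assumption that character classification is the category that matters to fgetcsv().

```php
<?php
// Candidate names are assumptions; adjust them to what your system provides.
function set_greek_ctype(array $candidates = array('el_GR.ISO-8859-7', 'el_GR.iso88597', 'el_GR', 'greek'))
{
    // setlocale() tries each name in order and returns the one it applied,
    // or false if none of them is available.
    return setlocale(LC_CTYPE, $candidates);
}

$applied = set_greek_ctype();
if ($applied === false) {
    echo "No Greek locale is installed on this system\n";
} else {
    echo "LC_CTYPE is now $applied\n";
}
```

Checking the return value matters here: a misspelled or missing locale makes setlocale() return false and leaves the previous locale in place, which would explain why the 'gr_GR' attempts had no effect.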
 