Bug #46217 fgetcsv() parses a csv file in the greek encoding incorrectly
Submitted: 2008-10-02 12:20 UTC
Modified: 2008-11-03 01:00 UTC
Votes: 3
Avg. Score: 3.0 ± 1.6
Reproduced: 3 of 3 (100.0%)
Same Version: 1 (33.3%)
Same OS: 2 (66.7%)
From: brook73 at gmail dot com
Assigned: (none)
Status: No Feedback
Package: Filesystem function related
PHP Version: 5.2.5
OS: Ubuntu 8.04
Private report: No
CVE-ID: None

 [2008-10-02 12:20 UTC] brook73 at gmail dot com
Description:
------------
The fgetcsv() function parses a file in the Greek encoding (ISO-8859-7) incorrectly: many characters are dropped.

Calling setlocale() did not help either (we tried setlocale(LC_ALL, 'gr_GR') and setlocale(LC_ALL, 'gr_GR.ISO-8895-7')).

Can anyone explain why this happens?

The PHP version is 5.2.5.

Reproduce code:
---------------
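<?php

// Reproduction for bug #46217: reads an ISO-8859-7 encoded CSV file
// and prints every parsed row.
$max_line_size = 16384;
$delimiter = ";";

$f = fopen('somefile.csv', 'rb');
if ($f === false) {
    die("Cannot open somefile.csv\n");
}

while (($data = fgetcsv($f, $max_line_size, $delimiter)) !== false) {
    print_r($data);
}

fclose($f);

?>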

Example line from the CSV file:

ΓΟΜ000112;Είδη Γραφής - Διόρθωσης///Γόμες;1.30;1.30;30 Sep 2008 00:00:00;N;ΘΡΥΛΟΣ3;ΕΑΕΕΑΕΑΕΕΑΕΑΕΑΕ;

Expected result:
----------------
Debug [0/0]:Array
(
    [0] => ΓΟΜ000112
    [1] => Είδη Γραφής - Διόρθωσης///Γόμες
    [2] => 1.30
    [3] => 1.30
    [4] => 30 Sep 2008 00:00:00
    [5] => N
    [6] => ΘΡΥΛΟΣ3
    [7] => ΕΑΕΕΑΕΑΕΕΑΕΑΕΑΕ
)


Actual result:
--------------
Debug [0/0]:Array
(
    [0] => 000112
    [1] => - Διόρθωσης///Γόμες
    [2] => 1.30
    [3] => 1.30
    [4] => 30 Sep 2008 00:00:00
    [5] => N
    [6] => 3
    [7] => 
)
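To run the reproduction without an external file, the sketch below writes a shortened version of the sample line (only the ΓΟΜ000112, 1.30, N and ΘΡΥΛΟΣ3 fields) to an in-memory php://temp stream as raw ISO-8859-7 bytes, so the result does not depend on the encoding the script itself is saved in. The shortened line is our own choice, not the reporter's data. Each field is printed with bin2hex() so dropped Greek bytes are easy to spot; judging from the actual result above, an affected build would be expected to lose the leading Greek bytes of the first and last fields.

```php
<?php
// Shortened sample line "ΓΟΜ000112;1.30;N;ΘΡΥΛΟΣ3" as ISO-8859-7 bytes:
// \xC3\xCF\xCC = "ΓΟΜ", \xC8\xD1\xD5\xCB\xCF\xD3 = "ΘΡΥΛΟΣ".
$line = "\xC3\xCF\xCC" . "000112;1.30;N;" . "\xC8\xD1\xD5\xCB\xCF\xD3" . "3\n";

// In-memory stream instead of somefile.csv.
$f = fopen('php://temp', 'r+b');
fwrite($f, $line);
rewind($f);

$rows = array();
while (($data = fgetcsv($f, 16384, ';')) !== false) {
    $rows[] = $data;
}
fclose($f);

// Hex dump of each field: "c3cfcc303030313132" means the Greek bytes
// survived, "303030313132" means they were dropped.
foreach ($rows[0] as $i => $field) {
    echo $i, ' => ', bin2hex($field), "\n";
}
```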


 [2008-10-02 12:34 UTC] brook73 at gmail dot com
Re:

Example line from the CSV file (this version has no trailing semicolon):

ΓΟΜ000112;Είδη Γραφής - Διόρθωσης///Γόμες;1.30;1.30;30 Sep 2008 00:00:00;N;ΘΡΥΛΟΣ3;ΕΑΕΕΑΕΑΕΕΑΕΑΕΑΕ

Expected result:

Debug [0/0]:Array
(
    [0] => ΓΟΜ000112
    [1] => Είδη Γραφής - Διόρθωσης///Γόμες
    [2] => 1.30
    [3] => 1.30
    [4] => 30 Sep 2008 00:00:00
    [5] => N
    [6] => ΘΡΥΛΟΣ3
    [7] => ΕΑΕΕΑΕΑΕΕΑΕΑΕΑΕ
)

Actual result:

Debug [0/0]:Array
(
    [0] => 000112
    [1] => - Διόρθωσης///Γόμες
    [2] => 1.30
    [3] => 1.30
    [4] => 30 Sep 2008 00:00:00
    [5] => N
    [6] => 3
    [7] => 
)
 [2008-10-02 13:23 UTC] brook73 at gmail dot com
Please use this test file:

http://dev.cs-cart.com/~brook/test.csv
 [2008-10-20 17:24 UTC] mike at regexia dot com
(This is my first attempt at fixing a bug, so please bear with me. :))

Patch available here: http://www.regexia.com/php/bug46217/bug46217.diff
Test case as well: http://www.regexia.com/php/bug46217/bug46217.phpt

Explanation:
The initial pass over a field tries to skip leading whitespace. If php_mblen() returns -1 or -2 for a character, that character is skipped as if it were whitespace. Regardless of locale, non-ASCII characters were returning -1 (invalid), so they were dropped from the start of the field. My patch treats those characters as regular non-whitespace characters, which seems consistent with how non-ASCII characters are handled in the middle of a CSV field.
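To make the explanation concrete, here is a simplified PHP model of that leading-whitespace pass. It is not PHP's C implementation, and the function name model_skip_leading() is hypothetical. It assumes, as described in the comment, that every non-ASCII byte (>= 0x80) was rejected by php_mblen() with -1 and then skipped like whitespace; for simplicity only space and tab count as whitespace here. With $buggy set to false it models the patched behavior, where such bytes end the skip.

```php
<?php
// Simplified, hypothetical model of the leading-whitespace pass (NOT PHP's
// C code). Assumption: in affected versions php_mblen() rejected every
// non-ASCII byte (-1), and a rejected byte was skipped like whitespace.
function model_skip_leading($field, $buggy)
{
    $i = 0;
    $len = strlen($field);
    while ($i < $len) {
        $byte = ord($field[$i]);
        $is_space = ($byte === 0x20 || $byte === 0x09);
        $is_rejected = $buggy && $byte >= 0x80; // models php_mblen() == -1
        if (!$is_space && !$is_rejected) {
            break; // first "real" character: the field starts here
        }
        $i++;
    }
    // substr() returns false in old PHP when $i === $len; cast to "".
    return (string) substr($field, $i);
}

// "\xC3\xCF\xCC" are the ISO-8859-7 bytes for "ΓΟΜ".
var_dump(model_skip_leading("\xC3\xCF\xCC000112", true));  // string(6) "000112"
var_dump(model_skip_leading("\xC3\xCF\xCC000112", false)); // all 9 bytes kept
```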

Enclosing the field data in the enclosure character (a double quote by default) works around the issue.
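If you control how the CSV file is produced, the quoting workaround can be applied on the writing side. The sketch below uses a hypothetical helper, csv_line_always_quoted(), that encloses every field in double quotes and doubles any embedded quote. It deliberately does not use fputcsv(), which only encloses fields that contain special characters such as the delimiter, a quote or whitespace, so a purely Greek field would still be written unquoted. The helper sticks to syntax that also works on PHP 5.2.

```php
<?php
// Hypothetical helper (not part of PHP): build one CSV line in which every
// field is enclosed, so fgetcsv() never has to parse an unquoted field.
function csv_line_always_quoted(array $fields, $delimiter = ';', $enclosure = '"')
{
    $out = array();
    foreach ($fields as $field) {
        // Inside an enclosed field, a literal enclosure char is written twice.
        $escaped = str_replace($enclosure, $enclosure . $enclosure, $field);
        $out[] = $enclosure . $escaped . $enclosure;
    }
    return implode($delimiter, $out) . "\n";
}

// Usage: "\xC3\xCF\xCC" are the ISO-8859-7 bytes for "ΓΟΜ".
echo csv_line_always_quoted(array("\xC3\xCF\xCC000112", '1.30', 'N'));
```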

I hope this is clear and that this is the correct way to submit a patch. :)

Mike
 [2008-10-26 19:17 UTC] jani@php.net
Please try using this CVS snapshot:

  http://snaps.php.net/php5.2-latest.tar.gz
 
For Windows:

  http://windows.php.net/snapshots/


 [2008-11-03 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 [2012-10-10 11:53 UTC] erwin32 dot 64 at gmail dot com
Try setlocale(LC_ALL, 'el_GR') instead of setlocale(LC_ALL, 'gr_GR'): the language code for Greek is "el", and the country code is "GR".
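A sketch of this suggestion: setlocale() accepts an array of candidate names, uses the first one the system supports, and returns false if none is available. The candidate list below is an assumption; which Greek locales exist depends on what is installed, and on Debian/Ubuntu a locale may need to be generated first (e.g. with locale-gen). The sketch changes only LC_CTYPE, the category that governs multibyte character handling such as mblen(), rather than LC_ALL; that choice is ours, not the commenter's, and avoids side effects such as a locale-specific decimal separator in older PHP versions.

```php
<?php
// Candidate Greek locale names, most specific first (assumed list; the
// names actually available depend on the system's installed locales).
$greek_locales = array('el_GR.ISO-8859-7', 'el_GR.iso88597', 'el_GR');

// Only LC_CTYPE is changed: it governs character classification and mblen().
$old_ctype = setlocale(LC_CTYPE, '0'); // "0" only queries the current value
$result = setlocale(LC_CTYPE, $greek_locales);

if ($result === false) {
    echo "No Greek locale installed; LC_CTYPE is unchanged\n";
} else {
    echo "LC_CTYPE is now: $result\n";
    // ... fopen()/fgetcsv() the ISO-8859-7 file here ...
}

// Restore the previous setting once the file has been parsed.
setlocale(LC_CTYPE, $old_ctype);
```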
 
PHP Copyright © 2001-2025 The PHP Group
All rights reserved.
Last updated: Wed Jan 15 13:01:29 2025 UTC