Doc Bug #31740 fgetcsv skips fields that start with an umlaut character
Submitted: 2005-01-28 12:52 UTC Modified: 2005-02-04 11:26 UTC
Votes:3
Avg. Score:5.0 ± 0.0
Reproduced:3 of 3 (100.0%)
Same Version:2 (66.7%)
Same OS:3 (100.0%)
From: arjan at avoid dot org Assigned:
Status: Closed Package: Documentation problem
PHP Version: 5.0.3 OS: Linux (Suse)
Private report: No CVE-ID: None
 [2005-01-28 12:52 UTC] arjan at avoid dot org
Description:
------------
fgetcsv on PHP5.0.3 has problems with reading CSV-fields that start with an umlaut character (possibly other 'weird' characters as well). It simply skips those characters.

PHP4.3.10 works fine.

Reproduce code:
---------------
csv_test.php:

<?php
   $fp = fopen('csv_test.csv', 'r');
   while($data = fgetcsv($fp, 2000, ';', '"')) {
      var_dump($data);
   }
   fclose($fp);
?>

csv_test.csv:

language_name;country_name
Deutsch;?sterreich
Nederlands;Nederland
Deutsch;Deut?land
?nited Kingdom

Expected result:
----------------
array(2) {
  [0]=>
  string(13) "language_name"
  [1]=>
  string(12) "country_name"
}
array(2) {
  [0]=>
  string(7) "Deutsch"
  [1]=>
  string(9) "?sterreich"
}
array(2) {
  [0]=>
  string(10) "Nederlands"
  [1]=>
  string(9) "Nederland"
}
array(2) {
  [0]=>
  string(7) "Deutsch"
  [1]=>
  string(9) "Deut?land"
}
array(1) {
  [0]=>
  string(13) "?nited Kingdom"
}


Actual result:
--------------
array(2) {
  [0]=>
  string(13) "language_name"
  [1]=>
  string(12) "country_name"
}
array(2) {
  [0]=>
  string(7) "Deutsch"
  [1]=>
  string(9) "sterreich"
}
array(2) {
  [0]=>
  string(10) "Nederlands"
  [1]=>
  string(9) "Nederland"
}
array(2) {
  [0]=>
  string(7) "Deutsch"
  [1]=>
  string(9) "Deut?land"
}
array(1) {
  [0]=>
  string(13) "nited Kingdom"
}


 [2005-01-28 19:37 UTC] arjan at avoid dot org
Tested the same script on another machine (Suse 9.0), with PHP5.0.3 installed: problem does not occur here.
We have two (identically installed) machines on which this bug does occur though.
Can anyone point us in some direction as to what might cause this peculiar behaviour?
 [2005-01-28 20:28 UTC] arjan at avoid dot org
To narrow the problem down as much as I can, I also tried the following script on the systems that have problems with fgetcsv:

<?php
   $fp = fopen('csv_test.csv', 'r');
   while (!feof($fp)) {
      $buffer = fgets($fp, 4096);
      echo $buffer;
   }
   fclose($fp);
?>

In this case, the umlauts do get read and printed.
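
Since fgets returns the bytes unchanged, one possible stopgap is to split each line by hand. The following is only a sketch under strong assumptions: fields are never quoted and never contain the delimiter. The helper name split_plain_line and the temporary sample file are hypothetical and not part of the original report.

```php
<?php
// Hypothetical helper: split one raw line on a single-byte delimiter.
// explode() works on bytes only, so the locale cannot affect the result.
// Assumption: fields are never quoted and never contain the delimiter.
function split_plain_line(string $line, string $delimiter = ';'): array
{
    return explode($delimiter, rtrim($line, "\r\n"));
}

// Build a small ISO-8859-1 sample file (\xD6 is "Ö" in that encoding).
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "language_name;country_name\nDeutsch;\xD6sterreich\n");

$rows = [];
$fp = fopen($path, 'r');
while (($line = fgets($fp, 4096)) !== false) {
    $rows[] = split_plain_line($line);
}
fclose($fp);
unlink($path);
```

This gives up everything fgetcsv does for quoted fields, so it only suits very simple files like the one in this report.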
 [2005-01-29 03:33 UTC] moriyoshi@php.net
Which locale is set in the LANG or LC_CTYPE environment variable?
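
A minimal sketch for answering this question from inside PHP, assuming the script runs in the same environment as the failing one. Passing "0" to setlocale only queries the current setting. The variable names are illustrative.

```php
<?php
// Passing "0" as the locale makes setlocale() return the current
// setting for the category without changing it.
$ctype = setlocale(LC_CTYPE, '0');

// getenv() returns false when the variable is not set.
$lang = getenv('LANG');
$lcCtype = getenv('LC_CTYPE');

printf("LC_CTYPE in effect: %s\n", $ctype);
printf("LANG:     %s\n", $lang === false ? '(not set)' : $lang);
printf("LC_CTYPE: %s\n", $lcCtype === false ? '(not set)' : $lcCtype);
```

The locale in effect and the environment variables can differ, so it may be worth looking at both.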


 [2005-01-29 12:07 UTC] arjan at avoid dot org
The LANG environment variable on the faulty machines was set like this:
   LANG="en_US.UTF-8"

When changed to
   LANG="en_US"

The problem is fixed. Thanks a lot!
However, shouldn't this behaviour be mentioned in the
manual for fgetcsv? I can imagine more people experiencing
this 'bug' that turns out to be not a bug...

Thanks again!
 [2005-01-29 23:22 UTC] derick@php.net
Yeah, we should have some information that tells people that the locale setting has an effect on this. Recategorizing...
 [2005-02-04 11:26 UTC] vrana@php.net
This bug has been fixed in the documentation's XML sources. Since the
online and downloadable versions of the documentation need some time
to get updated, we would like to ask you to be a bit patient.

Thank you for the report, and for helping us make our documentation better.

"Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function."

 [2011-01-30 15:29 UTC] max dot wildgrube at web dot de
Again: the problem of the vanishing first umlaut
(PHP version 5.2.6)
Sorry, but for many non-English web developers this "solution" is not helpful. Those of us who are not system administrators cannot influence the settings of PHP or Apache. In my environment "Safe mode" was switched on, which prevents the use of putenv. And I think it is an illusion to expect the big providers to switch safe mode off (even though this feature is DEPRECATED).
And $_ENV['LANG'] = 'en_US' did not cure the problem (nor did setting de_DE).
Besides, the environment variable LANG is not set at all (according to getenv, $_ENV and phpinfo).

And I think this problem has nothing to do with the encoding: other "inline" umlauts are preserved as expected.
If the data field is enclosed in quotes, the first umlaut after the opening quote (therefore the 2nd character) survives.

So I now live with the ugly workaround of placing a magic sequence ~~ before every leading umlaut in the CSV file with:
preg_replace ("/(\t)([€-ÿ])/", "\t~~$2", $import);
and removing these sequences after fgetcsv, while reading the field array, with:
foreach ( $columns as $col ) {   $col = trim ($col, '~~'); …

Max.
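
The workaround above could be sketched roughly as follows. This is a variant and not Max's exact code: the byte range 0x80-0xFF is an assumption about what the character class was meant to cover, the pattern also matches a field at the start of a line, and the marker is removed only as a prefix so that other tildes survive. The names MARKER, protect_leading_bytes and strip_marker are hypothetical.

```php
<?php
// Hypothetical marker, matching the "~~" used in the comment above.
const MARKER = '~~';

// Put the marker before any high byte (0x80-0xFF) that starts a field,
// i.e. one that follows a tab or begins a line.
function protect_leading_bytes(string $csv): string
{
    return preg_replace('/(^|\t)([\x80-\xFF])/m', '$1' . MARKER . '$2', $csv);
}

// Remove the marker only when it is a prefix, leaving other tildes alone.
function strip_marker(string $field): string
{
    return strncmp($field, MARKER, strlen(MARKER)) === 0
        ? substr($field, strlen(MARKER))
        : $field;
}

// ISO-8859-1 sample data: \xD6 is "Ö", \xDC is "Ü".
$import = "Deutsch\t\xD6sterreich\n\xDCnited Kingdom\tx~";
$protected = protect_leading_bytes($import);

$rows = [];
foreach (explode("\n", $protected) as $line) {
    // Delimiter, enclosure and escape are passed explicitly.
    $rows[] = array_map('strip_marker', str_getcsv($line, "\t", '"', '\\'));
}
```

Stripping only a prefix avoids the side effect of trim($col, '~~'), which would also remove tildes at the end of a field.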
 [2013-09-25 15:19 UTC] webspam at live dot de
In PHP 5.3.2 this problem is still there. No solution?
 [2013-09-25 15:27 UTC] gabri dot ns at gmail dot com
There is a function called setlocale with which you can set
the locale without changing the server configuration.
Have you tried that?

http://www.php.net/setlocale
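
A minimal sketch of that suggestion, assuming the file is in a one-byte encoding and that switching LC_CTYPE to the "C" locale is enough to make fgetcsv treat every byte as one character. The thread does not confirm that this cures the problem on the affected versions. The temporary sample file is hypothetical.

```php
<?php
// Remember the current LC_CTYPE so it can be restored afterwards.
$previous = setlocale(LC_CTYPE, '0');

// Assumption: in the "C" locale every byte counts as one character, so a
// leading byte such as \xD6 is not taken for part of a multibyte sequence.
setlocale(LC_CTYPE, 'C');

// ISO-8859-1 sample data: \xD6 is "Ö", \xDC is "Ü".
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "Deutsch;\xD6sterreich\n\xDCnited Kingdom\n");

$rows = [];
$fp = fopen($path, 'r');
while (($data = fgetcsv($fp, 2000, ';', '"', '\\')) !== false) {
    $rows[] = $data;
}
fclose($fp);
unlink($path);

// Restore the earlier setting for the rest of the script.
if ($previous !== false) {
    setlocale(LC_CTYPE, $previous);
}
```

Changing only LC_CTYPE leaves number and date formatting untouched, and restoring the old value keeps the change local to the CSV import.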
 [2013-10-26 18:32 UTC] max dot wildgrube at web dot de
I had a look at the change log for version 5 (http://www.php.net/ChangeLog-5.php):
there are 7 changes relating to fgetcsv.
I just checked the fgetcsv function on 2 webspaces.
One has PHP version 5.1: the problem is still there.
The other has version 5.5.17: no problem any more. No umlaut vanishes, even if it is the first non-blank character after the (TAB) delimiter.

So I have hope for the future ;-)
But another thing seems very suspect to me: the discussion in the bug threads #31740 and #48507.
Several users stated independently that there is an error, but the developers say NO, CLOSE the report and call it DOCUMENTATION ONLY without any real justification.
And then the bug vanishes through some other (?) corrections?

By the way: Is this
"
Note:
Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function.
"
in the php documentation still current?
 