Bug #75244 fgetcsv reading failure with chinese characters
Submitted: 2017-09-22 07:37 UTC Modified: 2017-11-05 04:22 UTC
From: vitor dot luis98 at gmail dot com Assigned:
Status: No Feedback Package: Filesystem function related
PHP Version: 7.0.23 OS: Windows Server 2012 - Chinese
Private report: No CVE-ID: None
 [2017-09-22 07:37 UTC] vitor dot luis98 at gmail dot com
Description:
------------
When running on the Chinese-language edition of Windows Server 2012, fgetcsv() does not read lines correctly: some delimiters are lost, so the line is split into fewer array elements than expected.

Situation:

The CSV file contains string translations, with columns delimited by ";". Here is an example of the file:

id;en;cs;ja;de;ar;bg;bs;da;es;et;fr;hr;hu;is;it;ko;lt;lv;no;pl;pt;pt-br;ru;sk;sl;sr;tr;zh-cn;zh-tw
FORBIDDEN_ACCESS;Access denied;Přístup zamítnut;アクセスが拒否されました;Zugriff verweigert;الدخول مرفوض;;Pristup nije odobren;Adgang afvist;Acceso denegado;;Accès refusé;Pristup odbijen;Hozzáférés megtagadva;;Accesso negato;액세스 거부됨;;;;Brak dostępu;Acesso negado;;В доступе отказано;Prístup zamietnutý;Dostop zavrnjen;Pristup zabranjen;Erişim Engellendi;访问被拒绝;存取被拒絕

Note: the empty columns are languages that do not have a translation.

When I open this file with fopen() in read-only mode and parse it line by line with fgetcsv(), some columns are not split, giving the following output:

array(27) {
  [0]=>
  string(13) "Access denied"
  [1]=>
  string(19) "P艡铆stup zam铆tnut"
  [2]=>
  string(36) "銈偗銈汇偣銇屾嫆鍚︺仌銈屻伨銇椼仧
  [3]=>
  string(18) "Zugriff verweigert"
  [4]=>
  string(23) "丕賱丿禺賵賱 賲乇賮賵囟"
  [5]=>
  string(0) ""
  [6]=>
  string(20) "Pristup nije odobren"
  [7]=>
  string(13) "Adgang afvist"
  [8]=>
  string(15) "Acceso denegado"
  [9]=>
  string(0) ""
  [10]=>
  string(14) "Acc猫s refus茅"
  [11]=>
  string(15) "Pristup odbijen"
  [12]=>
  string(24) "Hozz谩f茅r茅s megtagadva"
  [13]=>
  string(0) ""
  [14]=>
  string(14) "Accesso negato"
  [15]=>
  string(20) "鞎§劯鞀?瓯半秬霅?"
  [16]=>
  string(0) ""
  [17]=>
  string(0) ""
  [18]=>
  string(13) "Brak dost臋pu"
  [19]=>
  string(13) "Acesso negado"
  [20]=>
  string(0) ""
  [21]=>
  string(34) "袙 写芯褋褌褍锌械 芯褌泻邪蟹邪薪芯"
  [22]=>
  string(20) "Pr铆stup zamietnut媒"
  [23]=>
  string(15) "Dostop zavrnjen"
  [24]=>
  string(17) "Pristup zabranjen"
  [25]=>
  string(18) "Eri艧im Engellendi"
  [26]=>
  string(32) "璁块棶琚嫆缁?瀛樺彇琚嫆绲?"
}


This only happens when the script runs on the Chinese-language edition of Windows; on the English edition, everything works correctly.
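One hypothesis that fits the byte counts in the dump (and the developer's later question about the locale): under a Chinese double-byte locale such as Windows code page 936, a byte in the range 0x81 to 0xFE starts a two-byte character. If the multibyte scan inside fgetcsv() lands on such a byte right before a ";", the delimiter is swallowed into the field. The ko field comes back as string(20), which is its 19 UTF-8 bytes plus one ";", and the merged zh-cn/zh-tw field is string(32), i.e. 15 + 1 + 15 bytes plus one more byte, presumably from the line ending. The sketch below models that rule only; split_as_dbcs() is a hypothetical helper, and "a lead byte always consumes the next byte" is an assumption, not a description of PHP's or the C runtime's actual implementation.

```php
<?php
// Simplified, HYPOTHETICAL model of a double-byte (DBCS) locale such as code page 936:
// any byte in 0x81..0xFE is treated as a lead byte and ALWAYS consumes the next byte,
// even when that byte is the delimiter. Not PHP's or the C runtime's real code.
// $delim is assumed to be a single ASCII byte.
function split_as_dbcs(string $line, string $delim): array
{
    $fields = [];
    $current = '';
    $len = strlen($line);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($line[$i]);
        if ($byte >= 0x81 && $byte <= 0xFE && $i + 1 < $len) {
            // Assumed lead byte: keep it together with the following byte.
            $current .= $line[$i] . $line[$i + 1];
            $i++;
        } elseif ($line[$i] === $delim) {
            // A delimiter seen as a single byte ends the current field.
            $fields[] = $current;
            $current = '';
        } else {
            $current .= $line[$i];
        }
    }
    $fields[] = $current;
    return $fields;
}

// Data row from the example file (UTF-8).
$row = 'FORBIDDEN_ACCESS;Access denied;Přístup zamítnut;アクセスが拒否されました;Zugriff verweigert;الدخول مرفوض;;Pristup nije odobren;Adgang afvist;Acceso denegado;;Accès refusé;Pristup odbijen;Hozzáférés megtagadva;;Accesso negato;액세스 거부됨;;;;Brak dostępu;Acesso negado;;В доступе отказано;Prístup zamietnutý;Dostop zavrnjen;Pristup zabranjen;Erişim Engellendi;访问被拒绝;存取被拒絕';

echo count(explode(';', $row)), "\n";        // 30: plain byte-level split
echo count(split_as_dbcs($row, ';')), "\n";  // 28: ko/lt and zh-cn/zh-tw merge
```

Under this model the last byte of "액세스 거부됨" (0xA8) and of "访问被拒绝" (0x9D) each pair with the following ";", which matches the two merged fields and the 27 elements (after array_shift) in the actual result.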

Notably, reading each line with fgets() and splitting it with explode() works fine.
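The fgets() + explode() approach can be wrapped in a small helper. This is a sketch: read_dictionary() is a hypothetical name, and it assumes no field is quoted or contains ";" (true for the example row), because explode() does not understand CSV quoting. Since every byte of a multibyte UTF-8 sequence is 0x80 or above, the byte ";" (0x3B) can never appear inside a UTF-8 character, so a byte-level split is safe for a UTF-8 file regardless of the locale.

```php
<?php
// HYPOTHETICAL helper: fgets() + explode() workaround.
// Assumes ';' delimiter, no quoted fields, one record per line.
function read_dictionary(string $path, string $delim = ';'): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $rows = [];
    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n");   // drop an LF or CRLF line ending
        if ($line === '') {
            continue;                    // skip blank lines
        }
        // explode() splits on raw bytes and ignores the locale.
        $rows[] = explode($delim, $line);
    }
    fclose($handle);
    return $rows;
}
```

Usage: `$rows = read_dictionary('dictionary.csv'); $langList = array_slice($rows[0], 1);`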

Test script:
---------------
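<?php

// dictionary.csv: ';'-separated, first line is the header (id;en;cs;...)
$file = fopen('dictionary.csv', 'r');
if ($file === false) {
	die("Cannot open dictionary.csv\n");
}

// Header row. Note: the 4th argument of fgetcsv() is the enclosure character (here "\n").
/** @var array $langList */
$langList = fgetcsv($file, 0, ";", "\n");
array_shift($langList); // drop the "id" column

while (($line = fgetcsv($file, 0, ";", "\n")) !== false) {
	$id = array_shift($line);
	echo "Langs = " . count($langList) . ", Line Size: " . count($line) . "\n";

	// Every language column must be present in every row.
	foreach ($langList as $i => $lang) {
		if (isset($line[$i]) === false) {
			var_dump($line);
			throw new Exception(sprintf('Malformed dictionary [%s] for id [lang = %s, index = %s].', $id, $lang, $i));
		}
	}
}

fclose($file);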

Expected result:
----------------
array(30) {
  [0]=>
  string(16) "FORBIDDEN_ACCESS"
  [1]=>
  string(13) "Access denied"
  [2]=>
  string(19) "P艡铆stup zam铆tnut"
  [3]=>
  string(36) "銈偗銈汇偣銇屾嫆鍚︺仌銈屻伨銇椼仧
  [4]=>
  string(18) "Zugriff verweigert"
  [5]=>
  string(23) "丕賱丿禺賵賱 賲乇賮賵囟"
  [6]=>
  string(0) ""
  [7]=>
  string(20) "Pristup nije odobren"
  [8]=>
  string(13) "Adgang afvist"
  [9]=>
  string(15) "Acceso denegado"
  [10]=>
  string(0) ""
  [11]=>
  string(14) "Acc猫s refus茅"
  [12]=>
  string(15) "Pristup odbijen"
  [13]=>
  string(24) "Hozz谩f茅r茅s megtagadva"
  [14]=>
  string(0) ""
  [15]=>
  string(14) "Accesso negato"
  [16]=>
  string(19) "鞎§劯鞀?瓯半秬霅?
  [17]=>
  string(0) ""
  [18]=>
  string(0) ""
  [19]=>
  string(0) ""
  [20]=>
  string(13) "Brak dost臋pu"
  [21]=>
  string(13) "Acesso negado"
  [22]=>
  string(0) ""
  [23]=>
  string(34) "袙 写芯褋褌褍锌械 芯褌泻邪蟹邪薪芯"
  [24]=>
  string(20) "Pr铆stup zamietnut媒"
  [25]=>
  string(15) "Dostop zavrnjen"
  [26]=>
  string(17) "Pristup zabranjen"
  [27]=>
  string(18) "Eri艧im Engellendi"
  [28]=>
  string(15) "璁块棶琚嫆缁?
  [29]=>
  string(15) "瀛樺彇琚嫆绲?
}

Actual result:
--------------
Same as the array(27) dump shown in the description.

History

 [2017-09-25 14:14 UTC] jhdxr@php.net
-Status: Open +Status: Feedback
 [2017-09-25 14:14 UTC] jhdxr@php.net
Not enough information was provided for us to be able
to handle this bug. Please re-read the instructions at
http://bugs.php.net/how-to-report.php

If you can provide more information, feel free to add it
to this bug and change the status back to "Open".

Thank you for your interest in PHP.


What's your locale setting? And what's the encoding of the csv file?
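A sketch for answering both questions and testing the locale hypothesis: print the active LC_CTYPE, check whether the file is valid UTF-8, then parse it with LC_CTYPE temporarily pinned to "C". inspect_and_parse() is a hypothetical helper. It relies on the PHP manual's note that fgetcsv() takes the locale setting into account; whether pinning "C" fixes the reporter's exact setup is an untested assumption. As far as I know, PHP versions before 8.0 inherited LC_CTYPE from the environment at startup, while PHP 8.0 and later start in the "C" locale, which would be consistent with a Chinese Windows install behaving differently from an English one.

```php
<?php
// HYPOTHETICAL helper: report locale/encoding, then parse with LC_CTYPE pinned to "C".
function inspect_and_parse(string $path, string $delim = ';'): array
{
    // Passing "0" only queries the current setting; it does not change it.
    $previous = setlocale(LC_CTYPE, '0');
    echo 'LC_CTYPE: ', $previous, "\n";

    $contents = file_get_contents($path);
    // preg_match() with the /u modifier fails on invalid UTF-8 input.
    echo 'Valid UTF-8: ', preg_match('//u', (string) $contents) === 1 ? 'yes' : 'no', "\n";

    setlocale(LC_CTYPE, 'C');
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        setlocale(LC_CTYPE, $previous);
        throw new RuntimeException("Cannot open $path");
    }
    $rows = [];
    // Enclosure and escape are passed explicitly (default '"' and '\\').
    while (($row = fgetcsv($handle, 0, $delim, '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($handle);
    setlocale(LC_CTYPE, $previous); // restore the caller's locale
    return $rows;
}
```

Restoring the previous value matters because, as the setlocale() manual warns, locale information is maintained per process, not per thread.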
 [2017-11-05 04:22 UTC] php-bugs at lists dot php dot net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Re-Opened". Thank you.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Fri Mar 29 05:01:28 2024 UTC