[2006-11-01 13:55 UTC] herbert dot fischer at gmail dot com
Description:
------------
PHP appears to treat the output of the externally executed svn client as ASCII.
Even when the external program returns an ISO-8859-1 encoded string, PHP ends up with the accented characters expanded into literal escape sequences instead of their original bytes.
For example:
output such as "Acentuação" becomes the literal string "Acentua?\195?\167?\195?\163o/".
Reproduce code:
---------------
Import some files or folders with accented names into a Subversion repository. It is possible to convert the output to UTF-8 using the command below:
# svn list 'file:////home/svn/herbert/' | iconv -tutf-8
But not when the same command is run from PHP:
<?php
$cmd = "svn list 'file:////home/svn/herbert/'";
$out = shell_exec($cmd);
$res = unpack('c*', $out);
var_dump($res);
?>
var_dump reports:
array(29) {
[1]=>
int(65)
[2]=>
int(99)
[3]=>
int(101)
[4]=>
int(110)
[5]=>
int(116)
[6]=>
int(117)
[7]=>
int(97)
[8]=>
int(63)
[9]=>
int(92)
[10]=>
int(49)
[11]=>
int(57)
[12]=>
int(53)
[13]=>
int(63)
[14]=>
int(92)
[15]=>
int(49)
[16]=>
int(54)
[17]=>
int(55)
[18]=>
int(63)
[19]=>
int(92)
[20]=>
int(49)
[21]=>
int(57)
[22]=>
int(53)
[23]=>
int(63)
[24]=>
int(92)
[25]=>
int(49)
[26]=>
int(54)
[27]=>
int(51)
[28]=>
int(111)
[29]=>
int(47)
}
So it is not possible to convert the string to another character set, since the accented bytes have already been replaced by literal escape text.
Expected result:
----------------
PHP is expected to store the string in its original binary form:
array(10) {
[1]=>
int(65)
[2]=>
int(99)
[3]=>
int(101)
[4]=>
int(110)
[5]=>
int(116)
[6]=>
int(117)
[7]=>
int(97)
[8]=>
int(-25)
[9]=>
int(-29)
[10]=>
int(111)
}
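The expected byte values can be checked without svn. The following is only a sketch, assuming the intended name is "Acentuação" encoded as ISO-8859-1 and that the iconv extension is available; the variable names are illustrative. It also shows that the UTF-8 form of the two accented characters is the byte sequence 195 167 195 163, the same numbers that appear in the escaped output above.

```php
<?php
// Assumption: "Acentuação" in ISO-8859-1, where ç is byte 0xE7 and ã is byte 0xE3.
$latin1 = "Acentua\xE7\xE3o";

// 'c*' reads every byte as a signed char, so 0xE7 becomes -25 and 0xE3 becomes -29.
$signed = unpack('c*', $latin1);

// Converting to UTF-8 encodes each accented character as two bytes.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

// 'C*' reads every byte as an unsigned char, which is easier to compare with \195, \167, ...
$unsigned = unpack('C*', $utf8);

var_dump($signed);
var_dump($unsigned);
```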
I solved the problem by exporting the variable LANG=en_US.utf8 (or some other locale with a suitable charset). I did it in each shell command in my PHP script, but it can probably also be done in the shell settings of the user who owns the PHP file (e.g. www-data). Example: shell_exec("LANG=en_US.utf8; svn list file:///path/to/repos"); German characters like ä, ö, ü or ß (which used to come out as some ?\xyz code) are correct now.
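The workaround above could be wrapped in a small helper. This is a minimal sketch and not part of the original comment: the function name run_with_lang is hypothetical, the locale en_US.utf8 is assumed to be installed on the server, and LANG is written as a prefix to the command (rather than followed by a semicolon) so that the shell passes it into the environment of that one command only.

```php
<?php
// Hypothetical helper: run a shell command with LANG set for that command only.
// Assumption: the requested locale (en_US.utf8 by default) exists on the system.
function run_with_lang(string $command, string $lang = 'en_US.utf8'): ?string
{
    // A "NAME=value command" prefix exports the variable to that single command.
    $prefixed = 'LANG=' . escapeshellarg($lang) . ' ' . $command;

    // shell_exec() yields a string on success and null or false otherwise.
    $output = shell_exec($prefixed);

    return is_string($output) ? $output : null;
}
```

For example, run_with_lang("svn list 'file:///path/to/repos'") would run the listing with LANG set. If LC_ALL is already set in the web server's environment it takes precedence over LANG, so that variable may need the same treatment.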