Bug #39332 Wrong character encoding from external program output
Submitted: 2006-11-01 13:55 UTC Modified: 2006-11-16 01:00 UTC
Votes:4
Avg. Score:4.2 ± 0.8
Reproduced:4 of 4 (100.0%)
Same Version:1 (25.0%)
Same OS:2 (50.0%)
From: herbert dot fischer at gmail dot com Assigned:
Status: No Feedback Package: Program Execution
PHP Version: 5.1.6 OS: Red Hat ELAS4 Upd3
Private report: No CVE-ID: None
 [2006-11-01 13:55 UTC] herbert dot fischer at gmail dot com
Description:
------------
PHP assumes that the output of the externally executed svn client is ASCII.

Even when the external program returns an ISO-8859-1 encoded string, PHP "parses" the encoded string as ASCII, so accented characters arrive as literal escape text instead of in their binary form.

For example:
an output like "Acentuação" comes back as the literal string "Acentua?\195?\167?\195?\163o/".

Reproduce code:
---------------
Import some files or folders with accented names into a Subversion repository. It is possible to convert the output to UTF-8 using the command below:

# svn list 'file:////home/svn/herbert/' | iconv -tutf-8

But not when from PHP:

<?php
$cmd = "svn list 'file:////home/svn/herbert/'";
$out = shell_exec($cmd);
$res = unpack('c*', $out);
var_dump($res);
?>

var_dump reports:

array(29) {
  [1]=>
  int(65)
  [2]=>
  int(99)
  [3]=>
  int(101)
  [4]=>
  int(110)
  [5]=>
  int(116)
  [6]=>
  int(117)
  [7]=>
  int(97)
  [8]=>
  int(63)
  [9]=>
  int(92)
  [10]=>
  int(49)
  [11]=>
  int(57)
  [12]=>
  int(53)
  [13]=>
  int(63)
  [14]=>
  int(92)
  [15]=>
  int(49)
  [16]=>
  int(54)
  [17]=>
  int(55)
  [18]=>
  int(63)
  [19]=>
  int(92)
  [20]=>
  int(49)
  [21]=>
  int(57)
  [22]=>
  int(53)
  [23]=>
  int(63)
  [24]=>
  int(92)
  [25]=>
  int(49)
  [26]=>
  int(54)
  [27]=>
  int(51)
  [28]=>
  int(111)
  [29]=>
  int(47)
}

So it is not possible to convert the string to another character set, because the string is invalid.
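
The escaped text in the dump above still carries the original bytes: the decimal values 195 167 and 195 163 are the UTF-8 byte pairs for "ç" and "ã". Below is a minimal sketch of turning the escapes back into bytes. It assumes every `?\NNN` group stands for exactly one byte written as three decimal digits, which is inferred from this dump only and not from svn documentation. The helper name decodeSvnEscapes is hypothetical.

```php
<?php
// Hypothetical helper: turn "?\NNN" groups back into raw bytes.
// Assumption: NNN is a three-digit decimal byte value, as seen in the dump above.
function decodeSvnEscapes(string $escaped): string
{
    return preg_replace_callback(
        '/\?\\\\(\d{3})/',              // a literal "?", a literal backslash, three digits
        function (array $m): string {
            return chr((int) $m[1]);    // decimal value -> single byte
        },
        $escaped
    );
}

// The escaped string reported by var_dump in this report.
$escaped = 'Acentua?\195?\167?\195?\163o/';
$decoded = decodeSvnEscapes($escaped);  // UTF-8 bytes for "Acentuação/"
```

This is a lossy workaround: a file name that really contains the text `?\123` would be decoded by mistake, so fixing the locale of the child process is the cleaner route.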

Expected result:
----------------
PHP is expected to store the string in its original binary form.

array(10) {
  [1]=>
  int(65)
  [2]=>
  int(99)
  [3]=>
  int(101)
  [4]=>
  int(110)
  [5]=>
  int(116)
  [6]=>
  int(117)
  [7]=>
  int(97)
  [8]=>
  int(-25)
  [9]=>
  int(-29)
  [10]=>
  int(111)
}
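
The expected values can be checked without a repository by writing the ISO-8859-1 bytes by hand. This is a small sketch that assumes the file name is "Acentuação" with "ç" as byte 0xE7 and "ã" as byte 0xE3. The `c*` format reads signed bytes, which is why the two accented characters appear as negative numbers. `C*` gives the unsigned values instead.

```php
<?php
// "Acentuação" in ISO-8859-1, with the two accented bytes written as hex escapes.
$latin1 = "Acentua\xE7\xE3o";

// Signed bytes, as in the report: 0xE7 -> -25, 0xE3 -> -29.
$signed = unpack('c*', $latin1);

// Unsigned bytes are often easier to compare with code tables: 231 and 227.
$unsigned = unpack('C*', $latin1);
```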


 [2006-11-08 14:14 UTC] tony2001@php.net
Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves. 

A proper reproducing script starts with <?php and ends with ?>,
is max. 10-20 lines long and does not require any external 
resources such as databases, etc. If the script requires a 
database to demonstrate the issue, please make sure it creates 
all necessary tables, stored procedures etc.

Please avoid embedding huge scripts into the report.


 [2006-11-16 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 [2008-08-14 12:51 UTC] sebastianstenzel at googlemail dot com
I solved the problem by exporting the variable LANG=en_US.utf8 (or some other charset).
I did it in each shell command in my PHP script, but it can probably also be done in the shell settings of the user who owns the PHP file (e.g. www-data).

Example:
shell_exec("LANG=en_US.utf8; svn list file:///path/to/repos");

German characters like ä, ö, ü or ß (which used to come out as some ?\xyz code) are correct now.
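
The same idea can be applied without editing every command string, by passing the locale through the environment of the child process. This is a minimal sketch, not a confirmed fix for this report. It assumes PHP 7.1 or later (for getenv() without arguments) and that the named locale is installed on the server. The helper name runWithLocale is hypothetical.

```php
<?php
// Hypothetical helper: run a command with LANG and LC_ALL set for the child only.
function runWithLocale(string $cmd, string $locale): string
{
    // Keep the current environment (PATH etc.) and override only the locale.
    $env = array_merge(getenv(), ['LANG' => $locale, 'LC_ALL' => $locale]);

    // Capture only the child's standard output.
    $spec = [1 => ['pipe', 'w']];
    $proc = proc_open($cmd, $spec, $pipes, null, $env);
    if (!is_resource($proc)) {
        throw new RuntimeException("could not start: $cmd");
    }

    $out = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($proc);

    return $out === false ? '' : $out;
}
```

Usage would look like `runWithLocale("svn list file:///path/to/repos", "en_US.utf8")`. Writing the assignment as a prefix of the command (`LANG=en_US.utf8 svn list ...`, with no semicolon) also places the variable in the environment of svn for that one command.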
 