Request #77711 CURLFile should support UNICODE filenames
Submitted: 2019-03-08 16:00 UTC Modified: 2020-01-06 14:36 UTC
From: anrdaemon at freemail dot ru Assigned: cmb (profile)
Status: Closed Package: cURL related
PHP Version: 7.3.3 OS: Windows
Private report: No CVE-ID: None

 

 [2019-03-08 16:00 UTC] anrdaemon at freemail dot ru
Description:
------------
Starting with PHP 7.1, CURLFile cannot handle local paths containing Unicode characters.
The workaround is to iconv() the path into the single-byte character set used by the WinAPI non-wide-string ("ANSI") family of functions on the given OS installation. But that limits the usability of the language to laughable levels.
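A minimal sketch of that workaround, assuming PHP 7.2+ (for `PHP_OS_FAMILY`). `ansi_path()` is a hypothetical helper. On Windows it asks `sapi_windows_cp_get('ansi')` for the active ANSI code page and converts the UTF-8 path with `iconv()`. Characters missing from that code page cannot be converted at all, which is the limitation complained about above.

```php
<?php
// Hypothetical helper: convert a UTF-8 path into the ANSI code page that
// libcurl's ANSI WinAPI calls expect (PHP 7.1-7.3 workaround sketch).
function ansi_path(string $utf8Path, ?string $charset = null): string
{
    if ($charset === null) {
        if (PHP_OS_FAMILY !== 'Windows' || !function_exists('sapi_windows_cp_get')) {
            return $utf8Path; // Not Windows: pass the UTF-8 path through unchanged.
        }
        // Active ANSI code page, e.g. 1251 on a Russian-locale installation.
        $charset = 'CP' . sapi_windows_cp_get('ansi');
    }
    // No //TRANSLIT: a character absent from the code page makes iconv() fail
    // instead of silently producing a different (wrong) filename.
    $converted = @iconv('UTF-8', $charset, $utf8Path);
    if ($converted === false) {
        throw new RuntimeException("Path cannot be represented in $charset: $utf8Path");
    }
    return $converted;
}
```

In the test script below, this would replace the commented-out line: `$file = new \CURLFile(ansi_path($name), null, "file");`.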

$ php -i | grep -E "chars|encod"
default_charset => CP866 => CP866
input_encoding => CP866 => CP866
internal_encoding => UTF-8 => UTF-8
output_encoding => CP866 => CP866
zend.script_encoding => no value => no value
exif.encode_jis => no value => no value
exif.encode_unicode => ISO-8859-15 => ISO-8859-15
iconv.input_encoding => no value => no value
iconv.internal_encoding => no value => no value
iconv.output_encoding => no value => no value
mailparse.def_charset => UTF-8 => UTF-8
HTTP input encoding translation => disabled
mbstring.encoding_translation => Off => Off
mbstring.internal_encoding => no value => no value

Test script:
---------------
<?php

// Cyrillic capital letter A
// tempnam() returns path encoded in internal_encoding
$name = tempnam(__DIR__, "\u{0410}");
//print bin2hex(basename($name));
file_put_contents($name, "Test.");

$file = new \CURLFile($name, null, "file");
// $file = new \CURLFile(iconv("UTF-8", "CP1251", $name), null, "file"); // Rather arbitrary workaround

$curl = curl_init($argv[1] ?? "http://example.org/post.php");
curl_setopt_array($curl, [
  CURLOPT_POST => true,
  CURLOPT_POSTFIELDS => [
    "file" => $file,
  ],
  CURLOPT_RETURNTRANSFER => true, // make curl_exec() return the response body
  CURLOPT_VERBOSE => true,
]);

$rc = curl_exec($curl);
if($rc === false)
{
  $err = curl_errno($curl);
  error_log(curl_strerror($err) . " ($err): " . curl_error($curl));
  exit(1);
}

print $rc;


Expected result:
----------------
Successful upload.

Actual result:
--------------
Failed to open/read local data from file/application (26):

History
 [2019-03-11 22:55 UTC] kalle@php.net
-Status: Open +Status: Assigned -Assigned To: +Assigned To: bagder
 [2019-03-11 22:55 UTC] kalle@php.net
From a quick read, it seems the issue is caused by cURL itself, as internally we pass the supplied filename directly to curl_formadd().

I see two ways we can fix this:
1) Upstream fix from cURL
2) Implement a filename normalization fix for Windows; however, I do not know how this would affect cURL

Assigning this to Daniel, perhaps he can share some insights on this.
 [2019-03-11 23:06 UTC] kalle@php.net
-Assigned To: bagder +Assigned To: ab
 [2019-03-11 23:06 UTC] kalle@php.net
@Anatol, can you confirm my theory? Daniel said on Twitter that he did not have time to look into this issue, but told us to report it upstream if it's an actual cURL issue.
 [2019-03-12 00:21 UTC] anrdaemon at freemail dot ru
If you ask me, since PHP does the charset conversion of filenames by itself (to the best of my knowledge, the Win32 API does not support UTF-8), the solution should likely fall to the cURL extension.
 [2019-03-12 11:04 UTC] ab@php.net
-Status: Assigned +Status: Open
 [2019-03-12 11:04 UTC] ab@php.net
The functionality in the description has never been present in PHP in the claimed form. libcurl still uses ANSI APIs, which causes reports like https://github.com/curl/curl/issues/345. A solution needs to be implemented on the cURL side, to

- always handle filenames as UTF-8, use Unicode APIs internally and do conversion, or
- export additional APIs based on wide chars, or
- export handlers for I/O operations, that can be overridden

The first option could be the easiest, but it would be a huge BC break.

The second option might be more compatible, but it might lead to duplicating a lot of APIs.

The third option would concern fopen, stat and several other I/O APIs involving filenames. Any consuming library could then make this part of cURL compatible with its needs. Perhaps that would be the cleanest way.

cURL could also be compiled with UNICODE, but I'm not sure it would really work. For example, see places like https://github.com/curl/curl/blob/f762fec323f36fd7da7ad6eddfbbae940ec3229e/src/tool_filetime.c#L40, where an ANSI function is used directly and so would stay the same even with UNICODE. In any case, PHP can't be compiled with UNICODE, so that's not an option.

@anrdaemon What has changed is merely that you no longer pass the filename in the ANSI code page, so the commented-out iconv fix brings it to the correct state.

Thanks.
 [2019-03-12 20:46 UTC] anrdaemon at freemail dot ru
Understood, thanks for looking into the issue.
I'll bring it up with cURL, if you don't mind.
 [2019-03-13 09:27 UTC] kalle@php.net
-Status: Assigned +Status: Wont fix -Assigned To: ab +Assigned To:
 [2019-03-13 09:27 UTC] kalle@php.net
Thanks a lot, Anatol, for your detailed description. I personally like option 3 (which is similar to parts of libgd), but this is now all in cURL's hands. I don't think we should engineer a fix that could potentially be incompatible; instead, I hope it can and will be fixed in cURL.

@anrdaemon If you report it upstream to cURL, then please link to this ticket. I will mark it as a "Won't Fix".

Thank you!
 [2019-03-13 21:13 UTC] anrdaemon at freemail dot ru
Yes, filed it as https://github.com/curl/curl/issues/3675
 [2019-03-15 13:56 UTC] anrdaemon at freemail dot ru
One of the libcurl developers expressed a concern[1] regarding the API used by the extension.
As I'm not a C/C++ person myself, I'd like to invite interested parties to join the discussion on GitHub.

[1] https://github.com/curl/curl/issues/3675#issuecomment-473191793
 [2019-04-15 17:17 UTC] cmb@php.net
-Summary: CURLFile unable to handle UNICODE filenames +Summary: CURLFile should support UNICODE filenames -Status: Wont fix +Status: Re-Opened -Type: Bug +Type: Feature/Change Request -Assigned To: +Assigned To: cmb
 [2019-04-15 17:17 UTC] cmb@php.net
Re-opening as a feature request, since it is doable with the curl_mime_*() API[1].

[1] <https://github.com/php/php-src/pull/4032>
 [2019-04-16 08:43 UTC] cmb@php.net
The following pull request has been associated:

Patch Name: Extend CURLFile to support streams
On GitHub:  https://github.com/php/php-src/pull/4034
Patch:      https://github.com/php/php-src/pull/4034.patch
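With the stream-based approach, CURLFile opens the file through PHP's own stream layer, which understands UTF-8 paths on Windows. It then hands the data to libcurl through the curl_mime_*() API, so no iconv() step should be needed. Below is a minimal sketch of the reporter's upload under that assumption. It assumes PHP 7.4 or later, with ext/curl built against libcurl 7.56.0+ (the release that introduced curl_mime_*()). `build_upload()` is a hypothetical helper, and the URL is the placeholder from the test script.

```php
<?php
// Sketch, assuming PHP >= 7.4 and ext/curl built against libcurl >= 7.56.0,
// where CURLFile reads the file via PHP streams rather than having libcurl
// open it with the ANSI WinAPI functions.

// Hypothetical helper: builds CURLOPT_POSTFIELDS for a single file upload.
function build_upload(string $path, string $field = 'file'): array
{
    // The UTF-8 path is passed as-is; no conversion to the ANSI code page.
    return [$field => new CURLFile($path, 'application/octet-stream', basename($path))];
}

// Same setup as the test script: a temp file whose name contains U+0410.
$name = tempnam(sys_get_temp_dir(), "\u{0410}");
file_put_contents($name, "Test.");
register_shutdown_function('unlink', $name); // remove the temp file on exit

$fields = build_upload($name);

$curl = curl_init('http://example.org/post.php');
curl_setopt_array($curl, [
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => $fields,
    CURLOPT_RETURNTRANSFER => true,
]);
// curl_exec($curl) would perform the upload; left out so the sketch runs offline.
```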
 [2019-04-29 08:25 UTC] cmb@php.net
-Status: Re-Opened +Status: Closed
 [2020-01-06 14:36 UTC] cmb@php.net
For several reasons, I had to revert the backport to PHP 7.3.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Nov 21 12:01:29 2024 UTC