Bug #75829 cURL and file_get_contents does not work with URL containing special character
Submitted: 2018-01-17 07:52 UTC Modified: 2018-01-17 08:35 UTC
Votes:1
Avg. Score:1.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: marco dot marsala at live dot it Assigned:
Status: Not a bug Package: URL related
PHP Version: 7.2.1 OS: both Ubuntu and Windows
Private report: No CVE-ID: None
 [2018-01-17 07:52 UTC] marco dot marsala at live dot it
Description:
------------
cURL and file_get_contents() do not work with URLs containing special characters (for example: accented letters, quotes).

The URL sent to the remote server is badly encoded; special characters in the path should be percent-encoded, as urlencode() does and as any modern web browser does automatically.

Test script:
---------------
<?php
$url = "https://www.astegiudiziarie.it/vendita-asta-appartamento-genova-via-san-giovanni-d'acri-14-1360824";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it directly
echo curl_exec($ch); // got server error page

echo file_get_contents($url); // got server error page

// my function to fix encoding (needs the pecl_http extension for \http\Url)
function urlencode_parts($url) {
    $parts = parse_url($url);
    // rawurlencode() is the RFC 3986 variant for path segments; urlencode() would turn spaces into "+"
    $parts['path'] = implode('/', array_map('rawurlencode', explode('/', $parts['path'])));
    $url = new \http\Url($parts);
    return $url->toString();
}

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, urlencode_parts($url));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
echo curl_exec($ch); // now ok

echo file_get_contents(urlencode_parts($url)); // now ok

Expected result:
----------------
The script should download the same page you can see with a web browser.

Actual result:
--------------
The server replies with a server error page.
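
The urlencode_parts() helper in the test script relies on \http\Url from the pecl_http extension. As a rough sketch only, the same idea can be written with core functions. The name encode_url_path() is hypothetical, user/password components are dropped for brevity, and each segment is decoded before encoding on the assumption that an already-encoded URL should not be encoded twice:

```php
<?php
// Hypothetical helper (not part of PHP): percent-encode each path segment
// of an absolute URL using only core functions.
function encode_url_path(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $url");
    }

    // Decode first so that an already-encoded segment (e.g. %27) is not
    // encoded a second time into %2527.
    $encodeSegment = static function (string $segment): string {
        return rawurlencode(rawurldecode($segment));
    };
    $path = implode('/', array_map($encodeSegment, explode('/', $parts['path'] ?? '')));

    // Reassemble the URL; user/password are intentionally ignored in this sketch.
    $result = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    $result .= $path;
    if (isset($parts['query'])) {
        $result .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $result .= '#' . $parts['fragment'];
    }
    return $result;
}

// The apostrophe in "d'acri" becomes %27; hyphens and digits are left alone.
echo encode_url_path("https://www.astegiudiziarie.it/vendita-asta-appartamento-genova-via-san-giovanni-d'acri-14-1360824"), "\n";
```

The query string is passed through untouched here, since encoding it correctly depends on how the parameters were built in the first place.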

 [2018-01-17 07:59 UTC] marco dot marsala at live dot it
-Summary: cURL and file_get_contents does not work with URL contanining special character
+Summary: cURL and file_get_contents does not work with URL containing special character
 [2018-01-17 07:59 UTC] marco dot marsala at live dot it
I don't think this is a duplicate of #48550; it is a much broader view of the issue.
 [2018-01-17 08:00 UTC] spam2 at rhsoft dot net
https://en.wikipedia.org/wiki/Garbage_in,_garbage_out

PHP is a programming language, not an artificial intelligence, so you as the programmer are responsible for encoding input as needed. Having such characters in a URL or filename is bad practice anyway.
 [2018-01-17 08:09 UTC] marco dot marsala at live dot it
I agree with you that it is bad practice to have such characters in a URL, but sometimes you have to interact with external websites that use such characters in their URLs (for example when writing a scraper).

Btw, the problem affects DOMDocument->loadHTMLFile() too.
 [2018-01-17 08:15 UTC] spam2 at rhsoft dot net
The problem is NOT in DOMDocument->loadHTMLFile() either. The problem is that you don't escape your input properly, which is your job as a programmer, whether for SQL, URLs or anything else. The idea that this is a bug in a programming language is wrong from the start.

Frankly, you even refer to https://bugs.php.net/bug.php?id=48550

[2009-06-15 17:51 UTC] jani@php.net
Well, browsers do a lot of things automatically that you need to do yourself 
when you do it yourself. This is obviously not any bug. (tested with 
curl on command line with same result..)
 [2018-01-17 08:35 UTC] requinix@php.net
-Status: Open
+Status: Not a bug
 [2018-01-17 08:35 UTC] requinix@php.net
I get an error page if I try visiting that URL in a browser: "Nessuna vendita individuata" / No sale specified

Either your URL is wrong, or you need to think about string encodings (URLs are supposed to be UTF-8), or you need to be using some sort of escaping on the apostrophe (percent-encoding?), or, if you got that page via a form submission, you need to replay the form submission in code. Or it is something else I don't know about, since I can't even find the right page through Google.
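
To illustrate the string-encoding point, here is a small sketch showing that the same accented letter percent-encodes differently depending on the byte encoding of the source string. The helper is_valid_utf8() is a hypothetical name; it relies on PCRE's /u modifier rejecting byte strings that are not valid UTF-8:

```php
<?php
// "à" as a single ISO-8859-1 byte versus the two-byte UTF-8 sequence.
$latin1 = "\xE0";
$utf8   = "\xC3\xA0";

// Hypothetical helper: preg_match() with /u fails (returns false) on
// subjects that are not valid UTF-8, so only valid strings yield 1.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump(is_valid_utf8($latin1)); // bool(false)
var_dump(is_valid_utf8($utf8));   // bool(true)

// The server receives different bytes depending on the source encoding.
echo rawurlencode($latin1), "\n"; // %E0
echo rawurlencode($utf8), "\n";   // %C3%A0
```

If a scraper reads URLs out of a page that is not served as UTF-8, a check along these lines may help decide whether the string needs converting before it is percent-encoded.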
 [2018-05-09 10:02 UTC] marco dot marsala at live dot it
You're right, but it seems to be fixed with cURL 7.52.1. The test script is working now.
 