Bug #75829 cURL and file_get_contents does not work with URL containing special character
Submitted: 2018-01-17 07:52 UTC Modified: 2018-01-17 08:35 UTC
Votes:1
Avg. Score:1.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: marco dot marsala at live dot it Assigned:
Status: Not a bug Package: URL related
PHP Version: 7.2.1 OS: both Ubuntu and Windows
Private report: No CVE-ID: None
 [2018-01-17 07:52 UTC] marco dot marsala at live dot it
Description:
------------
cURL and file_get_contents() do not work with URLs containing special characters (for example: accented letters, quotes).

The URL sent to the remote server is badly encoded; special characters should be percent-encoded, as urlencode() does and as any modern web browser does automatically.

Test script:
---------------
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.astegiudiziarie.it/vendita-asta-appartamento-genova-via-san-giovanni-d'acri-14-1360824");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it directly
echo curl_exec($ch); // got server error page

echo file_get_contents("https://www.astegiudiziarie.it/vendita-asta-appartamento-genova-via-san-giovanni-d'acri-14-1360824"); // got server error page

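// my function to fix encoding (requires the pecl_http extension for \http\Url)
function urlencode_parts($url) {
    $parts = parse_url($url);
    // rawurlencode() rather than urlencode(): in a path, a space must become %20, not "+"
    $parts['path'] = implode('/', array_map('rawurlencode', explode('/', $parts['path'])));
    $url = new \http\Url($parts);
    return $url->toString();
}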

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, urlencode_parts("https://www.astegiudiziarie.it/vendita-asta-appartamento-genova-via-san-giovanni-d'acri-14-1360824"));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it directly
echo curl_exec($ch); // now ok

echo file_get_contents(urlencode_parts("https://www.astegiudiziarie.it/vendita-asta-appartamento-genova-via-san-giovanni-d'acri-14-1360824")); // now ok

Expected result:
----------------
The script should download the same page you can see with a web browser.

Actual result:
--------------
The server replies with a server error page.
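
For readers without the pecl_http extension, the workaround in the test script can be sketched with core functions only. This is a minimal sketch, not an official fix: encode_url_path() is a hypothetical helper name, it ignores userinfo (user:password@), and it assumes that any "%XX" sequence already present in the path is an existing escape rather than literal text.

```php
<?php
// Hypothetical helper (not part of PHP): percent-encode every path segment of a URL.
// Assumptions: the URL has a scheme and host; userinfo is not handled.
function encode_url_path(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Cannot parse URL: $url");
    }

    // Decode first so segments that are already escaped are not double-encoded,
    // then re-encode everything except unreserved characters.
    $segments = array_map(
        function (string $segment): string {
            return rawurlencode(rawurldecode($segment));
        },
        explode('/', $parts['path'] ?? '')
    );

    $out = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $out .= ':' . $parts['port'];
    }
    $out .= implode('/', $segments);
    if (isset($parts['query'])) {
        $out .= '?' . $parts['query']; // query string is passed through untouched
    }
    if (isset($parts['fragment'])) {
        $out .= '#' . $parts['fragment'];
    }
    return $out;
}

echo encode_url_path("https://www.astegiudiziarie.it/vendita-asta-appartamento-genova-via-san-giovanni-d'acri-14-1360824"), "\n";
// prints https://www.astegiudiziarie.it/vendita-asta-appartamento-genova-via-san-giovanni-d%27acri-14-1360824
```

The result can be passed to CURLOPT_URL or file_get_contents() in place of the raw string. Whether the remote server accepts the escaped form is a separate question that this sketch does not answer.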

 [2018-01-17 07:59 UTC] marco dot marsala at live dot it
-Summary: cURL and file_get_contents does not work with URL contanining special character +Summary: cURL and file_get_contents does not work with URL containing special character
 [2018-01-17 07:59 UTC] marco dot marsala at live dot it
I don't think this is a duplicate of #48550, but rather a much broader view of the issue.
 [2018-01-17 08:00 UTC] spam2 at rhsoft dot net
https://en.wikipedia.org/wiki/Garbage_in,_garbage_out

PHP is a programming language, not an artificial intelligence; as the programmer, you are responsible for encoding input as needed, and having such characters in a URL or filename is bad practice anyway.
 [2018-01-17 08:09 UTC] marco dot marsala at live dot it
I agree with you that it is bad practice to have such characters in a URL, but sometimes you have to interact with external websites that use such characters in their URLs (for example when writing a scraper).

Btw, the problem affects DOMDocument->loadHTMLFile() too.
 [2018-01-17 08:15 UTC] spam2 at rhsoft dot net
The problem is NOT in DOMDocument->loadHTMLFile() either. It is that you don't escape your input properly, which is your job as a programmer, be it for SQL, URLs or whatever. The idea that this is a bug in a programming language is wrong from the start.

Frankly, you even refer to https://bugs.php.net/bug.php?id=48550:

[2009-06-15 17:51 UTC] jani@php.net
Well, browsers do a lot of thing automatically you need to do yourself 
when you do it yourself. This is obviously not any bug. (tested with 
curl on command line with same result..)
 [2018-01-17 08:35 UTC] requinix@php.net
-Status: Open +Status: Not a bug
 [2018-01-17 08:35 UTC] requinix@php.net
I get an error page if I try visiting that URL in a browser: "Nessuna vendita individuata" / No sale specified

Either your URL is wrong, or you need to think about string encodings (URLs are supposed to be UTF-8), or you need to be using some sort of escaping on the apostrophe (percent-encoding?), or, if you got that page via a form submission, you need to replay the form submission in code, or it is something else I don't know, since I can't even find the right page through Google.
 [2018-05-09 10:02 UTC] marco dot marsala at live dot it
You're right, but it seems to be fixed with cURL 7.52.1. The test script is working now.
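
The string-encoding point raised in the 08:35 comment can be illustrated with a short sketch. It assumes the mbstring extension is available and that the input arrives as ISO-8859-1; neither is stated in the report. The same accented letter yields different percent-escapes depending on the byte encoding of the string, so a URL built from non-UTF-8 input may not match what the server expects.

```php
<?php
// "à" as a single ISO-8859-1 byte versus its two-byte UTF-8 form.
// Assumption: the source data is ISO-8859-1 (the report does not say).
$latin1 = "\xE0";
$utf8   = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

echo rawurlencode($latin1), "\n"; // prints %E0
echo rawurlencode($utf8), "\n";   // prints %C3%A0
```

Converting to UTF-8 before percent-encoding matches what browsers send for the path component.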
 