Bug #31923 file_get_contents() does not accept URLs containing hyphen-period "-."
Submitted: 2005-02-11 01:59 UTC Modified: 2005-02-15 21:35 UTC
From: mike-bugs dot php dot net at webheat dot co dot uk Assigned:
Status: Not a bug Package: Filesystem function related
PHP Version: 4.3.10 OS: Linux
Private report: No CVE-ID: None

 [2005-02-11 01:59 UTC] mike-bugs dot php dot net at webheat dot co dot uk
Description:
------------
A script I created ( http://webheat.co.uk/forumbuddy.php ) takes a username for the site DeviantArt.com, which forms part of the URL of the user's section of the site. The URL takes the form:
http://USERID.deviantart.com/

The script grabs one or more pages by using the file_get_contents() function.
Regardless of whether I use urlencode()/rawurlencode() and urldecode(), the file_get_contents() function fails with the error detailed below.

This only happens when the userid ends in a hyphen.
After some testing, it seems that this is because the hyphen is then next to a period: "-."

This hyphen period combination causes an error regardless of where it is placed in the user id.


This issue happens on my host's server but not on two other [Windows] machines I've tested this code on.

The hosting company has gone through their php.ini file on my behalf, and whilst the file is not vanilla, they can see no differences that would affect this function.

Reproduce code:
---------------
The relevant code is as follows:
file_get_contents("http://emdeeuk-.deviantart.com"); //Doesn't work

file_get_contents("http://emdee-.uk.deviantart.com"); //Doesn't work

file_get_contents("http://emdeeuk.deviantart.com"); //Does work


The full code can be viewed here:
http://webheat.co.uk/forumbuddytest.php.txt

Expected result:
----------------
<a href="http://www.deviantart.com/deviation/14935843/">Dogs on Kites</a>
<a href="http://www.deviantart.com/deviation/14935312/">Welcome to the Rat Race</a>
<a href="http://www.deviantart.com/deviation/14934730/">Arc de Bishopsgate</a>
<a href="http://www.deviantart.com/deviation/14592356/">PLEASE VOTE: New Webheat.co.uk</a>
<a href="http://www.deviantart.com/deviation/14586143/">Docklands Sunbathing</a>
<a href="http://www.deviantart.com/deviation/14556463/">Slide PSP8 Frame</a>

<a href="http://www.deviantart.com/deviation/14556193/">Whitechapel Horse</a>
<a href="http://www.deviantart.com/deviation/14521593/">Action Squirrel</a>
<a href="http://www.deviantart.com/deviation/14520802/">Beady Eye of the Plottin Drake</a>
<a href="http://www.deviantart.com/deviation/14485015/">Light House II</a>
<a href="http://www.deviantart.com/deviation/14377961/">Polaroid Photo Frames</a>
<a href="http://www.deviantart.com/deviation/14217085/">Now where's the Sphinx?</a>

<a href="http://www.deviantart.com/deviation/14213953/">Something in the Aer</a>
<a href="http://www.deviantart.com/deviation/14125157/">I Sea Gulls</a>
<a href="http://www.deviantart.com/deviation/14124660/">Coots really get my Goose</a>
<a href="http://www.deviantart.com/deviation/13932181/">Spoonful of sugar</a>
<a href="http://www.deviantart.com/deviation/13757002/">Ugly Emotion</a>
<a href="http://www.deviantart.com/deviation/13729717/">Light House</a>

<a href="http://www.deviantart.com/deviation/13729376/">Ilm Tree</a>
<a href="http://www.deviantart.com/deviation/13705596/">Only ugly on the inside</a>
<a href="http://www.deviantart.com/deviation/13639781/">Lava Lamponians</a>
<a href="http://www.deviantart.com/deviation/13597782/">Generations</a>
<a href="http://www.deviantart.com/deviation/9434166/">Flame Mobile Wallpaper</a>
<a href="http://www.deviantart.com/deviation/9421010/">Flame Body Work Wallpaper</a> 

Actual result:
--------------
Warning: file_get_contents(): php_network_getaddresses: getaddrinfo failed: Name or service not known in /home/webheat/public_html/forumbuddytest.php on line 6

Warning: file_get_contents(http://USER-.deviantart.com/gallery/?view=3&order=5&limit=24&offset=0): failed to open stream: Permission denied in /home/webheat/public_html/forumbuddytest.php on line 6

 [2005-02-11 04:44 UTC] sniper@php.net
Those urls of yours don't work even in browser, why should they magically work in PHP?? (is - valid in DNS anyway?)


 [2005-02-11 08:53 UTC] mike-bugs dot php dot net at webheat dot co dot uk
The urls are as an example, http://emdeeuk.deviantart.com/ does work.
The url that identified the problem is: http://mark-.deviantart.com/
A hyphen is valid in DNS.
 [2005-02-11 21:36 UTC] pollita@php.net
Please refer to RFC833 Appendix 1.

Hyphens may not appear at the start or end of a <label>.

gethostbyname() (Used by the network streams code among other parts of PHP) treats this correctly according to RFC specification.  Your browser is simply more forgiving.
 [2005-02-12 00:02 UTC] mike-bugs dot php dot net at webheat dot co dot uk
If this is the case, it does not explain why it works on two machines (installed by me) and not on two others (installed by my hosting company).

Btw, could you post a link to the RFC you are quoting from? I searched on Google and found an RFC entitled "Who talks TCP?"

Thanks
 [2005-02-12 06:21 UTC] magnus@php.net
It should be RFC883 (page 56) not RFC833.
http://www.faqs.org/rfcs/rfc883.html

"The labels must follow the rules for ARPANET host names.  They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen."
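
As an illustration only, the label rule quoted above can be expressed as a small check. The function names is_rfc883_label() and is_rfc883_hostname() are hypothetical, and the sketch covers nothing beyond the quoted sentence (for example, it does not check label length):

```php
<?php
/**
 * Sketch: test one DNS label against the quoted RFC 883 rule.
 * A label must start with a letter, end with a letter or digit,
 * and contain only letters, digits, and hyphens in between.
 */
function is_rfc883_label($label) {
    return preg_match('/\A[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?\z/', $label) === 1;
}

/** Sketch: a hostname passes only if every dot-separated label passes. */
function is_rfc883_hostname($host) {
    foreach (explode('.', $host) as $label) {
        if (!is_rfc883_label($label)) {
            return false;
        }
    }
    return true;
}
```

Under this rule the hostnames emdeeuk-.deviantart.com and mark-.deviantart.com are rejected because one label ends in a hyphen, while emdeeuk.deviantart.com is accepted.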

 [2005-02-12 17:17 UTC] mike-bugs dot php dot net at webheat dot co dot uk
Thanks Magnus.

This does explain possible behaviour but doesn't explain why 2 Windows PHP installs work fine but 2 Linux PHP installs (as installed by my hosting company) do not.

This seems to imply that file_get_contents() doesn't validate the DNS compliance of the argument.

Also, referring to the previous note from Pollita: as forgiving as the browser may be, from what I know it is the DNS servers involved in the lookup of http://mark-.deviantart.com that are dealing with this, more so than the browser.
The deviantart.com DNS server software also seems to be forgiving, not just of lookups of this kind but of creating new records of this kind too, as the DNS entry would have been created when the user mark- signed up for his account with Deviantart.com
 [2005-02-12 17:52 UTC] pollita@php.net
"gethostbyname() (Used by the network streams code among other parts of PHP) treats this correctly according to RFC specification."

I can see how this statement might not be obvious.  I mean the gethostbyname() function defined by the host system's standard C library.  Your assumption that PHP's network code does not attempt to validate the hostname before passing it on to the system for resolution is correct.  It works on some systems because on those systems the value is encoded into a DNS packet and sent to the DNS server unmangled.  On other systems, a validation check is performed first, and if the name doesn't pass, the library doesn't bother waiting for a network round trip and simply returns a failure.
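
To see which of the two behaviours a given machine has, one option is to call PHP's own gethostbyname() and check its failure signal. This is a minimal sketch; the helper name resolve_or_null() is hypothetical, and it assumes a PHP version that provides filter_var():

```php
<?php
/**
 * Sketch: resolve $host, or return null if the lookup fails.
 * IP literals are returned as-is.
 */
function resolve_or_null($host) {
    // IP literals need no lookup; gethostbyname() would hand them back
    // unchanged, which looks exactly like its failure signal.
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        return $host;
    }
    $ip = gethostbyname($host);
    // On failure gethostbyname() returns its argument unmodified.
    return ($ip === $host) ? null : $ip;
}
```

Calling resolve_or_null('mark-.deviantart.com') on each machine should show whether that system's resolver rejects the name before any DNS query is made or passes it through to the DNS server.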
 [2005-02-12 23:24 UTC] mike-bugs dot php dot net at webheat dot co dot uk
Thank you Pollita. That's a lot clearer.
Do you have any suggestions for a work around for this issue?
 [2005-02-13 02:21 UTC] pollita@php.net
I have an ugly solution, it's a very rudimentary HTTP/1.0 client which relies on the fact that deviantart.com uses a wildcard to direct *.deviantart.com to the same IP address.

It also relies on the fact that they're not doing any kind of 3xx redirection (at least as far as I could tell).  I wouldn't be surprised if you need to do some work on this function to make it work *well* for you.

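function http_get_contents($url) {
  $resource = parse_url($url);
  if (empty($resource['host'])) return false;

  /* Any valid label works here: *.deviantart.com is a wildcard record */
  $ip = gethostbyname('dummy.deviantart.com');
  $sock = fsockopen($ip, 80);
  if (!$sock) return false;

  /* parse_url() omits 'path' and 'query' when the URL has none */
  $path = isset($resource['path']) ? $resource['path'] : '/';
  if (isset($resource['query'])) $path .= '?' . $resource['query'];

  fwrite($sock, "GET $path HTTP/1.0\r\n");
  fwrite($sock, "Host: {$resource['host']}\r\n");
  fwrite($sock, "Connection: close\r\n\r\n");

  /* Status line looks like "HTTP/1.0 200 OK"; the code starts at offset 9 */
  $response = fgets($sock, 8192);
  if (substr($response, 9, 3) !== '200') {
    fclose($sock);
    return '';
  }

  /* Skip headers */
  while (trim(fgets($sock, 8192)));

  $ret = '';
  while (!feof($sock)) {
    $ret .= fgets($sock, 8192);
  }
  fclose($sock);

  return $ret;
}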


That's PHP4/5 compliant.  If you're only worried about PHP5 compatibility you can do:

$context = stream_context_create(array('http'=>array('proxy'=>'tcp://dummy.deviantart.com')));

$data = file_get_contents($url, false, $context);

Which does *essentially* the same thing, but since it's a proper http wrapper, it takes care of the 3xx handling and other shortcomings of the userspace version I gave you above.
 [2005-02-13 02:23 UTC] pollita@php.net
Correction:
$context = stream_context_create(array('http'=>array('proxy'=>'tcp://dummy.deviantart.com:80')));

Forgot the port number :)
 [2005-02-15 21:35 UTC] mike-bugs dot php dot net at webheat dot co dot uk
Thank you for all your help, I'll try this at the weekend most likely.
 