[2007-10-24 18:00 UTC] harvie at email dot cz
Description:
------------
I have written a spider/crawler for a web search engine as a school project, and I have a small problem.
I am using file_get_contents() (I have tried fopen() too).
The crawler works fine most of the time, but sometimes it freezes. I traced which function freezes, and it is file_get_contents().
So I googled and found the default_socket_timeout setting. I set it to 1, but the script still sometimes freezes and never recovers.
I have written this example so you can see that it freezes after a few iterations. I have supplied a URL that causes my crawler to freeze (I'm not sure why):
Reproduce code:
---------------
#!/usr/bin/php
<?php
/* Run and wait for a while; this can totally stop the script at a dead point... */
ini_set('default_socket_timeout',1);
set_time_limit(0);
//$url='http://ad.doubleclick.net/click';
$url='http://w.moreover.com/';
while(1) {
@file_get_contents($url, false, null, 0, 10000);
echo "#";
}
?>
Expected result:
----------------
The script downloads the file from the specified URL a few times, and after that it freezes and never recovers.
(The same happens if you use a different URL each time, but it takes longer.)
Actual result:
--------------
harvie-ntb:/home/harvie/Desktop/crawler# ./bugshow.php
#1#2#3#4#5#6#7#8#9#10#11#12#13#14#15#16#17
At this point it freezes forever. (I thought the script would continue after 1 second on failure because of ini_set('default_socket_timeout',1);, but the whole script stops. I waited a really long time.)
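One way to keep a crawler loop from stalling indefinitely is to read the stream manually and enforce an overall wall-clock deadline on top of the per-read socket timeout. The following is only a sketch under assumptions, not a confirmed fix for this bug: the helper name fetch_with_deadline() and its parameters are hypothetical, and the deadline is only checked between reads, so it still relies on each individual read returning once the stream timeout expires.

```php
<?php
/**
 * Hypothetical helper: read up to $maxBytes from $url, stopping when a
 * single read times out or when the overall deadline has passed.
 * Returns the bytes read so far, or false if the stream cannot be opened.
 */
function fetch_with_deadline($url, $maxBytes = 10000, $readTimeout = 1, $deadline = 5.0)
{
    $start = microtime(true);

    // The 'timeout' http option bounds the connect and read phases.
    $context = stream_context_create(array('http' => array('timeout' => $readTimeout)));
    $fp = @fopen($url, 'rb', false, $context);
    if ($fp === false) {
        return false;
    }
    // Per-read timeout on the opened stream (has no effect on plain files).
    stream_set_timeout($fp, (int) $readTimeout);

    $data = '';
    while (strlen($data) < $maxBytes && !feof($fp)) {
        // Give up once the total time budget is spent.
        if (microtime(true) - $start > $deadline) {
            break;
        }
        $chunk = fread($fp, min(8192, $maxBytes - strlen($data)));
        if ($chunk !== false) {
            $data .= $chunk;
        }
        // Stop on a read error or when the last read hit the stream timeout.
        $meta = stream_get_meta_data($fp);
        if ($chunk === false || !empty($meta['timed_out'])) {
            break;
        }
    }
    fclose($fp);
    return $data;
}
```

In the reproduce script, the call @file_get_contents($url, false, null, 0, 10000); could then be swapped for fetch_with_deadline($url, 10000, 1, 5.0); to see whether the loop keeps printing "#".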
I have run the script under the strace debugger (this traces the interpreter's system calls, not the PHP code, of course), if you are interested:

# strace ./bugshow.php
execve("./emails.php", ["./emails.php"], [/* 29 vars */]) = 0
uname({sys="Linux", node="harvie-ntb", ...}) = 0
brk(0) = 0x854c000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fa8000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=67381, ...}) = 0
mmap2(NULL, 67381, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f97000
close(3) = 0
...Lot of irrelevant stuff...
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.2.1")}, 28) = 0
fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
gettimeofday({1193256681, 718238}, NULL) = 0
poll([{fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
send(3, "\6?\1\0\0\1\0\0\0\0\0\0\1w\10moreover\3com\0\0\1\0\1", 32, 0) = 32
poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
ioctl(3, FIONREAD, [56]) = 0
recvfrom(3, "\6?\201\200\0\1\0\1\0\0\0\0\1w\10moreover\3com\0\0\1\0"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.2.1")}, [16]) = 56
close(3) = 0
gettimeofday({1193256681, 768001}, NULL) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("170.224.8.50")}, 16) = -1 EINPROGRESS (Operation now in progress)
poll([{fd=3, events=POLLIN|POLLOUT|POLLERR|POLLHUP, revents=POLLOUT}], 1, 1000) = 1
getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
fcntl64(3, F_SETFL, O_RDWR) = 0
send(3, "GET / HTTP/1.0\r\n", 16, 0) = 16
send(3, "Host: w.moreover.com\r\n", 22, 0) = 22
send(3, "\r\n", 2, 0) = 2
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP, revents=POLLIN}], 1, 1000) = 1
recv(3, "HTTP/1.1 200 OK\r\nDate: Wed, 24 O"..., 8192, 0) = 524
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP, revents=POLLIN}], 1, 1000) = 1
recv(3, "s, online news, current awarenes"..., 8192, 0) = 524
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP}], 1, 1000) = 0
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP}], 1, 1000) = 0
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP}], 1, 1000) = 0
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP}], 1, 1000) = 0
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP}], 1, 1000) = 0
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0
...This repeats a few times a second...

It still happens. PHP 5.3.10 here. I have a script that keeps polling data from an online API every 10 seconds. After some days of normal operation, it freezes at file_get_contents() even with the timeout parameter set to 10 in the http options. It often happens after/during connection problems with the server.

protected function retrieveJSON($URL) {
    $opts = array('http' =>
        array(
            'method' => 'GET',
            'timeout' => 10,
        )
    );
    $context = stream_context_create($opts);
    $feed = file_get_contents($URL, false, $context); // <--- FREEZE
    $json = json_decode($feed, true);
    return $json;
}

I am facing the same bug under Linux 5.2.0-2-amd64 #1 SMP Debian 5.2.9-2 (2019-08-21) x86_64 GNU/Linux. I am using PHP 7.3.9 installed from my operating system repository (apt). My script hangs (CLI) and reaches the max execution time (FPM) when I execute file_get_contents("http://example.example/");.
The thing is that local files can be accessed and, more importantly, localhost can be accessed (accessed means it does not hang). But because I use curl for Internet requests, I didn't notice this until I upgraded from 7.2 to 7.3. I was unable to use Composer, because it uses file_get_contents. After a minute of querying it responded with "failed to open stream: Connection timed out". This sounds like a non-PHP-related bug, but since curl works, I am questioning this...
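
Since curl is reported above to work where file_get_contents() hangs, a request with explicit connect and total timeouts can be sketched through the curl extension. This is an illustration only, assuming the curl extension is loaded; the helper name curl_get_bounded() and its default values are hypothetical, and it does not explain why the stream wrapper behaves differently.

```php
<?php
/**
 * Hypothetical helper: GET $url through the curl extension with separate
 * limits for the connect phase and for the whole transfer.
 * Returns the response body, or false on any failure (including a timeout).
 */
function curl_get_bounded($url, $connectTimeout = 5, $totalTimeout = 10)
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $connectTimeout, // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $totalTimeout,   // seconds allowed for the whole request
    ));
    $body = curl_exec($ch);
    if ($body === false) {
        // Record the reason so a timeout can be told apart from other failures.
        error_log('curl error: ' . curl_error($ch));
    }
    return $body;
}
```

A caller would check the result with === false before passing it to json_decode() or similar, so that a timed-out request is not mistaken for an empty response.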