Request #69213 Consecutively issued fseek() calls don't work properly when together they exceed PHP_INT_MAX
Submitted: 2015-03-10 13:57 UTC Modified: 2015-03-12 10:28 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: johnregphpbug at stefanie dot de Assigned:
Status: Not a bug Package: Filesystem function related
PHP Version: Irrelevant OS: Win 7 Home Premium SP1
Private report: No CVE-ID: None
 [2015-03-10 13:57 UTC] johnregphpbug at stefanie dot de
Description:
------------
(This is my first ever bug report on php.)
(PHP 5.6.3 doesn't show up in the drop-down list on the bug report form, so I chose "irrelevant".)

I have Win 7, 64 bit, with the latest XAMPP version, which ships a 32 bit PHP 5.6.3.

While testing http://php.net/manual/en/function.fseek.php#112647 on a very big file (bigger than PHP_INT_MAX, 2147483647 bytes), I became pretty sure that consecutive fseek calls are summed up before they are actually executed on the file pointer.
The result is that those fseeks only seek up to PHP_INT_MAX ahead of the current file pointer position.

I found a workaround: putting an fread between two otherwise consecutive fseeks seems to stop the aggregation. (That workaround needs a second small workaround, though: fread can't be called with a read length of 0 bytes, so I have to fread one byte and account for it when breaking the big seek up into smaller seeks.)

(Another bug: once the file pointer is past PHP_INT_MAX, ftell returns a negative number.)
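
A negative ftell() result would be consistent with the offset being held in a signed 32 bit integer, which is what ab@php.net describes later in this thread. The sketch below only simulates that wraparound with plain arithmetic. `wrap_to_int32` is a hypothetical helper, not a PHP function, and it does not touch any file; whether the C runtime wraps in exactly this way is an assumption.

```php
<?php
// Hypothetical helper: interpret a byte offset the way a signed 32-bit
// integer would hold it after overflow. Floats are used so the arithmetic
// also works on builds whose integers are only 32 bits wide.
function wrap_to_int32($offset)
{
    $low = fmod((float) $offset, 4294967296.0); // keep the low 32 bits
    if ($low < 0) {
        $low += 4294967296.0;
    }
    if ($low >= 2147483648.0) {
        $low -= 4294967296.0;                   // high bit set: negative value
    }
    return (int) $low;
}

var_dump(wrap_to_int32(2147483647.0)); // int(2147483647), still in range
var_dump(wrap_to_int32(2147483648.0)); // int(-2147483648), one byte further
```

One byte past 2147483647 the simulated value already turns negative, which is the kind of result the report describes for ftell.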



 [2015-03-10 14:40 UTC] ab@php.net
-Status: Open +Status: Not a bug
 [2015-03-10 14:40 UTC] ab@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

Thanks for the report.

PHP before 7 on Windows doesn't support large file operations. All the APIs used are 32 bit APIs, and the same goes for integers. And it is supported even less in a 32 bit build, no matter what bitness the OS has.

Instead, it would be much appreciated if you took a recent snapshot from http://windows.php.net/downloads/snaps/master/ to test the functionality you've described. It is high time for that now :) Note, though, that you'll need a 64 bit build for LFS (large file support).
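
A quick way to see which kind of build is running is to look at the integer width. This is a minimal sketch, assuming that 8-byte integers are a precondition for file offsets beyond 2147483647; it is a necessary condition, not proof that every file function handles large files. `has_64bit_integers` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: true when PHP integers are 64 bits wide.
// A 32-bit build reports PHP_INT_SIZE as 4, whatever the OS bitness is.
function has_64bit_integers()
{
    return PHP_INT_SIZE >= 8;
}

printf("PHP %s, PHP_INT_SIZE=%d, PHP_INT_MAX=%s\n", PHP_VERSION, PHP_INT_SIZE, PHP_INT_MAX);
echo has_64bit_integers()
    ? "Integers are 64 bit; offsets beyond 2147483647 can be represented.\n"
    : "Integers are 32 bit; offsets beyond 2147483647 cannot be represented.\n";
```

Running this on the console with the binary in question shows the integer width that binary was built with.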

Thanks.
 [2015-03-10 15:39 UTC] johnregphpbug at stefanie dot de
I will try to install a 64 bit version. 

The manual is just not really clear on this. 
1. The answers I got on stackoverflow can't be entirely correct, and other people have come across the restriction, too: 
2. Reading past the PHP_INT_MAX threshold is possible. But the workaround found in at least one user comment on the fseek manual page, and another old bug report that shows a workaround for the workaround, don't work as expected.

I'm very sure there is a bug in that PHP version. Of course it might really just be undocumented behaviour outside the definitions. (But then it should throw an error, which it doesn't.)

(Can I just download a 64 bit build from the snapshots and put it in my XAMPP installation in place of the old PHP version?)

If you find time, you might want to test the workaround I tried:
<?php
$testdata = '';
// C:\xampp\todrivei\germany-latest.osm    42.651.795.559 , 42651795559 file size from last year
$filenameandpath = 'C:\xampp\todrivei\germany-latest.osm';
$seekto = 42651795559 - 2000;
// PHP_INT_MAX: 2147483647;
function my_fseek($fp,$pos,$first=0) {
	if($first) fseek($fp,0,SEEK_SET);
	$pos=floatval($pos);
	// within limits, use normal fseek
	if($pos<=PHP_INT_MAX) {
		fseek($fp,$pos,SEEK_CUR);
	}
	// out of limits, use recursive fseek
	else {
		fseek($fp,PHP_INT_MAX,SEEK_CUR);
		$pos -= PHP_INT_MAX;
		$tempchar = fread($fp,1); // ."\n\r";
		$pos -= 1;
		my_fseek($fp,$pos);
	}
}
$testdata = 'PHP_INT_MAX: '.PHP_INT_MAX."\n\r";
$testdata .= 'seekto: ' . $seekto."\n\r";
$handle = fopen($filenameandpath,'r');
my_fseek($handle, floatval($seekto),1);
$testdata .= fread($handle,62000);
fclose($handle);
echo $testdata;
?>

I can even try to seek past the end of the file (for example by setting seekto to 400 GByte) and fread still returns data from the file.
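
The bookkeeping that my_fseek() above is meant to perform can be checked without any file. The sketch below reproduces only the arithmetic of the loop; `plan_chunked_seek` is a hypothetical helper, and 2147483647 stands in for PHP_INT_MAX on a 32 bit build. It says nothing about where the real fseek calls end up, which is the open question in this report.

```php
<?php
// Hypothetical helper: counts how many PHP_INT_MAX-sized seeks (each
// followed by a one-byte fread) my_fseek() would issue for $target, and
// how large the last ordinary seek would be. No file is opened.
function plan_chunked_seek($target, $max = 2147483647.0)
{
    $remaining = (float) $target;
    $chunks = 0;
    while ($remaining > $max) {
        $remaining -= $max; // the fseek($fp, PHP_INT_MAX, SEEK_CUR) step
        $remaining -= 1;    // the one byte consumed by fread($fp, 1)
        $chunks++;
    }
    return array('chunks' => $chunks, 'final_seek' => $remaining);
}

$plan = plan_chunked_seek(42651795559 - 2000);
printf(
    "%d chunks of PHP_INT_MAX + 1 bytes, then a final seek of %.0f bytes\n",
    $plan['chunks'],
    $plan['final_seek']
);
// Prints: 19 chunks of PHP_INT_MAX + 1 bytes, then a final seek of 1849604247 bytes
```

So the counting itself adds up to the intended position; if the data read afterwards is wrong, the problem lies in what the individual seeks do rather than in the subtraction.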
 [2015-03-11 07:58 UTC] ab@php.net
Hi, thanks for following up. Just looking at your PHP code: it's horrible :) Even if it seems to work, I'd rather call it a bug in itself. In the C run time, the 32 bit offset is an unsigned long.

The manual clearly states that offset is expected to be an integer, not a double. Are you sure the data you seek() to is correct? Some unpleasant surprises are to be expected; I'd strongly discourage you from using this in production.

About how to try a snapshot: the easiest way is just to get a zipball and try it on the console. If you wish to use it with a webserver, IIS and Apache will surely work. For Apache, if your XAMPP is a vc11-x64 build, maybe yes; otherwise get an appropriate build from apachelounge.com. In both cases use the php7_module directive in httpd.conf.

Thanks.
 [2015-03-11 07:59 UTC] ab@php.net
Typo: in the 32 bit runtime, the offset is a "signed long".
 [2015-03-12 02:41 UTC] johnregphpbug at stefanie dot de
Hi!
When I seek to the end (42+ GByte) using Perl, it gives me the right end tag </osm>.
I just tried the PHP code again. I have two more counter variables in my code here (the code looks even worse now :) ), so I can be sure that the my_fseek function counts correctly, but it doesn't give me that end tag.

When I seek to PHP_INT_MAX (2147483647) less 1000 and then read 2000 bytes, I get the same result as with Perl.

When I seek to twice PHP_INT_MAX, the my_fseek version already returns different text than the Perl script. There are node tags with increasing id numbers in the big file; the ones returned are higher than those at the PHP_INT_MAX position in the file, but already lower than those in the (correct) Perl output.

The output I see when I seek to the 40 GByte position indicates that it comes from a position below PHP_INT_MAX: the id numbers are lower than at PHP_INT_MAX.

As I need to read such a big file only once a month or less, I can preprocess it a bit and then put it in a database, so I'm using Perl now for that specific task. Switching everything (XAMPP, Apache, PHP, MySQL interface) to 64 bit or to Perl just for this would be too much.

If I were better at programming, I could map which position in the file PHP actually returns when seeking far past PHP_INT_MAX. I'm not that good. But if I happen to find out, I will post the results here.

Thank you very much for the feedback. I appreciate that very much.
Greetings John
 [2015-03-12 10:28 UTC] ab@php.net
Your experience with Perl seems to be OK, so you've found a practical solution. Perl, though, seems to be using 64 bit APIs. Note that with 32 bit APIs there's no chance to process files bigger than INT_MAX bytes.

Btw., I wasn't asking you to switch your environment to master, but only to check whether the master x64 snapshot works correctly with your snippet. The PHP code can easily be modified to use plain fseek() instead of my_fseek(), and running it on the console would be good enough. If you have time, of course :)
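
For reference, a plain fseek() version of the snippet might look like the sketch below. It is not taken from the report: `read_near_end` is a hypothetical helper, and for a file of the size discussed here it assumes a 64 bit build, where an offset beyond 2147483647 is an ordinary integer.

```php
<?php
// Hypothetical helper: read $length bytes starting $fromEnd bytes before
// the end of the file, using one fseek() call instead of chained seeks.
function read_near_end($path, $fromEnd, $length)
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    // fseek() returns 0 on success and -1 on failure.
    if (fseek($handle, -$fromEnd, SEEK_END) !== 0) {
        fclose($handle);
        throw new RuntimeException('fseek() failed');
    }
    $position = ftell($handle); // where the read starts, for cross-checking
    $data = fread($handle, $length);
    fclose($handle);
    return array('position' => $position, 'data' => $data);
}

// Usage with the file from the report (path as given there):
// $tail = read_near_end('C:\xampp\todrivei\germany-latest.osm', 2000, 2000);
// echo $tail['position'], "\n", $tail['data'];
```

Printing the position next to the data makes it easy to compare the result with the Perl output for the same offset.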

Thanks.
 [2015-03-12 13:09 UTC] johnregphpbug at stefanie dot de
I will try it when I get time. The reasons I thought the function from the user comment http://php.net/manual/en/function.fseek.php#112647 would work are the upvotes on it and that others have posted similar workarounds.

One test I just made: it works up to PHP_INT_MAX + 8192 (2^13). 
(That is how I got the first false positive. From then on it always gives the same result, up to a bit over 2*PHP_INT_MAX. 
The second false impression came when I tried offsets much bigger than 2*PHP_INT_MAX, which gave different results again, but not the correct ones. So I thought I had made a mistake somewhere.
The third wrong impression came because when I seek to PHP_INT_MAX, fread still works, for example when reading 100,000 bytes.)

So it really only works reliably up to PHP_INT_MAX.
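
Given that conclusion, one defensive option is to refuse any offset that is not a genuine integer, so that a float such as 42651795559.0 on a 32 bit build fails loudly instead of seeking somewhere unexpected. This is only a sketch; `checked_fseek` is a hypothetical wrapper and is not part of PHP.

```php
<?php
// Hypothetical wrapper: accepts only non-negative integer offsets, which
// by definition cannot exceed PHP_INT_MAX, and seeks from the file start.
function checked_fseek($handle, $offset)
{
    if (!is_int($offset) || $offset < 0) {
        throw new InvalidArgumentException(
            'Offset must be a non-negative integer no larger than PHP_INT_MAX'
        );
    }
    return fseek($handle, $offset, SEEK_SET) === 0;
}

$handle = tmpfile();
fwrite($handle, 'hello');
var_dump(checked_fseek($handle, 2)); // bool(true)
var_dump(fread($handle, 3));         // string(3) "llo"
fclose($handle);
```

On a 64 bit build the large offsets from this report are integers and pass the check; on a 32 bit build they are floats and are rejected before fseek is called.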

Thank you, too!
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Dec 26 15:01:32 2024 UTC