Bug #13598 Arrays taking up too much memory
Submitted: 2001-10-08 10:15 UTC Modified: 2001-10-10 03:58 UTC
From: renze at datalink dot nl Assigned:
Status: Not a bug Package: Arrays related
PHP Version: 4.0CVS-2001-10-08 OS: Debian 2.2.17
Private report: No CVE-ID: None
 [2001-10-08 10:15 UTC] renze at datalink dot nl
I'm trying to process a tab-separated file. The file is about 1.5M large and contains 9040 lines that each contain 85 "columns".
Very happy with the fgetcsv() function, I started working on this. But when I finished, I didn't get the output I wanted, only errors in my logfile: "memory exhausted".
So I started trying all kinds of stuff. The piece of script I use for the processing is like:

if (!($fd = fopen ($file, "r"))) {
  // Some error and exit
}
$data = array();
$total = array();
while ($data = fgetcsv ($fd, 580, $delimiter)) {
  array_push ($total, $data);
}
if (!fclose ($fd)) {
  // Some error and exit
}
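
(For reference, the usage can also be measured from within the script itself; a minimal sketch, assuming memory_get_usage() is available, i.e. PHP built with --enable-memory-limit as in the configure line further down:)

$before = memory_get_usage();
$total = array();
if (!($fd = fopen ($file, "r"))) {
  // Some error and exit
}
while ($data = fgetcsv ($fd, 580, $delimiter)) {
  array_push ($total, $data);
}
fclose ($fd);
// The difference is roughly what the 2D array costs by itself.
echo (memory_get_usage() - $before), " bytes used by the array\n";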

Now I've configured Apache to log the peak memory usage of the script, and guess what?! 28M!!! It takes up 28M to read a file of hardly 1.5M!
Really weird. So I tried something else.

$contents = file ($file);

No problem! Memory usage: 2M.
No problem there. So, I thought, maybe there's some bug in fgetcsv(). Let me try it myself:

$contents = file($file);
$total = array();
$record = array();
foreach ($contents as $line) {
  $record = split ("\t", $line);
  array_push ($total, $record);
}

Memory usage: 28M.
So that's not the solution. So... it might just be that array_push() doesn't work correctly. Next try:

if (!($fd = fopen ($file, "r"))) {
  // Some error and exit
}
$data = array();
$total = array();
$counter = 0;
while ($data = fgetcsv ($fd, 580, $delimiter)) {
  $total[$counter++] = $data;
}
if (!fclose ($fd)) {
  // Some error and exit
}

Memory usage: 24.7M
So that's no solution either. Then I tried the following change to the while-loop.

while ($total[] = fgetcsv ($fd, 580, $delimiter)) { }

No luck! Same memory usage. Okay... well... that one was predictable. So... then I came up with the following very nasty solution:

if (!($fd = fopen ($file, "r"))) {
  // Some error and exit
}
$data = array();
$counter = 0;
while ($data = fgetcsv ($fd, 580, $delimiter)) {
  $var_name = "myVar_$counter";
  $$var_name = $data;
  $counter++;
}
if (!fclose ($fd)) {
  // Some error and exit
}

Nasty, ain't it?! Didn't work either. Memory usage: 24.8M. So I quickly threw that one out again :)

I've tried several combinations of the above constructions. None worked correctly. Memory usage way too high!

Some other experiment:

if (!($fd = fopen ($file, "r"))) {
  // Some error and exit
}
$data = array();
$total = array();
while ($data = fgetcsv ($fd, 580, $delimiter)) {
  array_push ($total, implode ("\t", $data));
}
if (!fclose ($fd)) {
  // Some error and exit
}

Yep. That worked. Memory usage: 2M. But, well... it's the same result as just: $total = file ($file).
That's not the way!

So... well... I've tried about everything, but everything that produces the correct result (a 2D array with all the 'records' in it) also produces a memory usage that's way too high. Does anyone know where this bug comes from?


*R&zE:


Btw... PHP version 4.0.8-dev:

--prefix=/usr/local/php
--with-config-file-path=/usr/local/php/etc
--with-exec-dir=/usr/local/php/safe
--with-apxs=/usr/local/Apache/bin/apxs
--without-mysql
--with-solid=/home/solid
--with-pgsql=/usr/local/pgsql
--with-pdflib=/usr/local
--with-db3
--enable-ftp
--with-mm
--with-zlib
--with-bz2
--with-openssl
--with-gd
--enable-gd-native-ttf
--with-jpeg-dir
--with-png-dir=/usr
--with-zlib-dir=/usr
--with-xpm-dir=/usr/X11R6
--with-ttf
--with-t1lib
--with-pcre-regex
--enable-sysvsem
--enable-sysvshm
--enable-memory-limit
--enable-inline-optimization
--enable-versioning

Apache version 1.3.14

History
 [2001-10-08 16:12 UTC] jeroen@php.net
9040 lines * 85 columns = 768400 fields, and 9040 * 84 = 759360 tabs. So you've got 759360 bytes of tabs, and about the same amount of data.

So each field in your file is probably about 1 byte large. PHP doesn't support true 2D arrays, as you can read in the manual; rather, it is possible to have an array as a value of another array, thus faking multidimensional arrays.

So, when you want a 2D array, you'll need 9040 arrays, each containing 85 one-byte values. In any case, you need at least 768400 PHP variables for the strings themselves.

Dividing 22M by 768400, it turns out that about 28 bytes are used for each variable, and that's not extremely much (12 bytes for the variable itself, 8 bytes for the reference from the array, and 2 bytes for the data alone already make 22 bytes...).

PHP is a language meant to support a lot of different uses, which has its drawback on memory. Usually this isn't noticeable, but it is when you try to read three quarters of a million single bytes into memory...

I don't know what the layout of your file is, or what you want to do with it, but I suggest using a smarter approach to keep the data in memory. Ask php-general for hints.
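
(As a back-of-the-envelope check of that estimate; the 28-byte figure is the approximation derived above, not an exact internal value:)

$fields = 9040 * 85;   // 768400 string variables in the 2D array
$perVar = 28;          // approximate per-variable cost estimated above
printf ("%.1fM\n", $fields * $perVar / (1024 * 1024));   // prints 20.5M, in line with the observed usage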
 [2001-10-09 03:53 UTC] renze at datalink dot nl
Okay... Jeroen, thanks for the time.
Just telling me that PHP can't handle (multidimensional) arrays would have been enough, though. Then you wouldn't have had to make suggestions that aren't correct, like that I want to store 3/4 of a million single bytes. Ever heard of empty records? :)
Also... when processing a tab-separated file into an array, the tabs aren't stored!!
And I guess the smarter approach is meant to be a "smarter approach for PHP". Processing the file in memory is smart enough. It's a lot faster than processing it while reading and writing from/to disk. The records are a lot easier to address, too. So I have to use a _different_ approach, not a smarter approach. But of course, when PHP needs about 20M too much memory for a (small) file like this, I'll have to process it one record after the other.
Btw: This bug isn't bogus... it's closed. PHP still uses 20M too much memory, so it's not all that bogus ;)

But anyway... thanks for the time you've put into it, telling me PHP can't handle the (multidim) arrays.
 [2001-10-09 15:48 UTC] jeroen@php.net
> Just telling that PHP can't handle (multidimensional)
> arrays would have been enough
> though.

That was not the problem here; it was the huge number of records (an empty string takes 21 bytes, plus 1 byte per character). Storing all records in one single array doesn't solve the problem.
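
(That fixed cost is easy to verify; a minimal sketch, again assuming memory_get_usage() is available:)

$before = memory_get_usage();
$a = array();
for ($i = 0; $i < 10000; $i++) {
  $a[] = "";   // 10000 empty strings
}
// Roughly the fixed per-variable cost claimed above.
echo (memory_get_usage() - $before) / 10000, " bytes per empty string\n";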

With "smarter" I'm not referring to PHP, but to a way of handling this. Suggestion: store each line as such (so, as a string) in one array, and only extract a requested record on the fly by means of a wrapper function when needed (see the sketch below). Again, what's smart in your case highly depends on the application (there's nothing wrong with reading the whole file into memory, just with reading each record separately into memory...).

And this is _not_ a bug; the semantics of PHP require such a high fixed cost per variable. In Java, for example, this approach would require about the same amount of memory (I didn't test, but counted the number of members in String...). No language can read your mind; the programmer should be aware of the pros and cons of certain constructs, and choose the ones that suit the application.
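
(A minimal sketch of that wrapper idea; the function name get_record() is illustrative, not part of any API:)

$contents = file ($file);   // one string per line: ~2M, as measured above

// Split a single line into its 85 fields only when it is actually needed,
// instead of keeping 768400 separate string variables alive at once.
function get_record (&$contents, $row)
{
  return split ("\t", rtrim ($contents[$row]));
}

$record = get_record ($contents, 42);   // fields of line 42 only
echo $record[0];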
 [2001-10-10 03:42 UTC] renze at datalink dot nl
Whatever! The fact that these kinds of things aren't possible in some language doesn't mean that, if they were possible, it would be the best way.

And let me make one thing real clear here:
I'm definitely not attacking you or PHP. If I'd think PHP wasn't any good, I wouldn't be using it in the first place, would I?! PHP rules big time! Really, it's excellent.

The fact that this construction uses so much memory is indeed not a bug (sorry if it looked like I said that... it wasn't what I meant). It's a choice that's been made. For one language one chooses to support this, and that makes it almost impossible to do some other stuff. For PHP the choice was made not to support this in detail. So for this construction it uses too much memory, but it opens opportunities to do a whole lot of other things. It's not a mistake, it's not a bug, it's a _choice_. And a good one, as I see it.

What I meant in my previous mail wasn't that this is a bug, but that it isn't bogus either. The thing I brought up is still there, isn't it?!? So you can't call this a bogus report. A bogus report is due only to wrong programming; the bogus status is there to show that the report doesn't have any (good) meaning. In this case, 'closed' would be better. Anyone else who comes across this problem can then just read this and know they'll have to use a different construction. When its status is 'bogus', people will skip over this one and still not know where their problem comes from. Chances are you'll keep on getting questions like this, and people will have to spend way too much time trying to solve their problem. 'Bogus' is meant to say not to pay any attention to the report.
 [2001-10-10 03:58 UTC] jmoore@php.net
A note in the manual mentioning this might be more appropriate. I feel this bug report should remain bogus: there was no problem with PHP and nothing was changed, so bogus is the right classification. Closed implies it was fixed.

- James
 