Bug #37005 munged JPEG + imagecreatefromstring segfault
Submitted: 2006-04-07 05:28 UTC Modified: 2006-04-11 20:11 UTC
From: ceo at l-i-e dot com Assigned: pajoye
Status: Not a bug Package: GD related
PHP Version: 5.1.2 OS: FreeBSD o11.hostbaby.com 5.3-REL
Private report: No CVE-ID: None

 [2006-04-07 05:28 UTC] ceo at l-i-e dot com
Description:
------------
This *MAY* be the same as the bug fixed by the new GD functions, what with GD freeing RAM that PHP had allocated, but I suspect it is not...

Suppose a user is naive enough to not use CURLOPT_BINARYTRANSFER when using curl to get an image.

Suppose they then pass that image string into imagecreatefromstring()

Then that user will get a segfault, most of the time.

Though not always.

Granted, this is a pretty dumb thing to do once you understand what CURLOPT_BINARYTRANSFER is for in the first place.

But, before you grok that, it's a pretty common mistake.

Or, even if you understood it but mis-coded, or forgot it the next time you wrote similar code, you end up with segfaults.

And common mistakes, in an ideal world, should not segfault, but should produce an E_ERROR (or similar).


Reproduce code:
---------------
The original code was your basic:
[Untested, really, but...]
<?php
  $curl = curl_init();
  curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($curl, CURLOPT_URL, 'http://bugs.php.net/gifs/logo-bug.gif');
  $image_string = curl_exec($curl);
  $image = imagecreatefromstring($image_string);
?>

It may be JPEG-only, so you may need to try different images instead of the self-referential GIF from this page.

However, you may find it easier to just snag this jpeg:
http://acousticdemo.com/info.com/overture/jpeg_crashed/088b1cc1662339d5008fd3d67ec7cf01.jpg
which I saved from the above, and work with it.

If you simply snag that file and do:
<?php imagecreatefromstring(file_get_contents('088b1cc1662339d5008fd3d67ec7cf01.jpg')); ?>
it will segfault.

I can do it from the command line every time with that file.

Yes, it *IS* a corrupt JPEG, almost for sure.

But I'm hoping it's corrupt in a detectable way, if you know what I mean, and we can change the 'segfault' behaviour into E_ERROR behaviour.

Mozilla seems quite content to display a rendering which "looks right" even for that corrupt image, which gives me hope that it is a detectable error -- but also makes debugging quite difficult, since you have an image that "looks right" but that PHP segfaults on, every time.

Here are some more sample (corrupt) images, for your convenience for testing:
http://acousticdemo.com/info.com/overture/jpeg_crashed/


Expected result:
----------------
I expected E_ERROR for an invalid JPEG.

I don't expect it to "fix" the image the way the browsers do, though.

The scripter should be told, via E_ERROR, to fix their code. PHP should not quietly repair, say, JPEGs, only to have the same mistake fail later on custom proprietary binary data.
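A minimal userland sketch of the "refuse bad data instead of crashing" behaviour, assuming the goal is only to reject obviously damaged input before it reaches GD. The names `looks_like_jpeg()` and `jpeg_from_string()` are hypothetical, not PHP functions. The check looks for the JPEG start-of-image marker (`FF D8`) at offset 0 and an end-of-image marker (`FF D9`) somewhere after it. That catches non-JPEG or truncated data, but bytes altered in the middle of the stream pass it, so the `false` return of `imagecreatefromstring()` still has to be checked.

```php
<?php
// Hypothetical helper: a cheap structural check, not a full JPEG validator.
function looks_like_jpeg($data)
{
    if (!is_string($data) || strlen($data) < 4) {
        return false;
    }
    // The SOI (start of image) marker must be the first two bytes.
    if (substr($data, 0, 2) !== "\xFF\xD8") {
        return false;
    }
    // An EOI (end of image) marker should appear somewhere after the SOI.
    return strpos($data, "\xFF\xD9", 2) !== false;
}

// Hypothetical wrapper: returns a GD image, or false if the data is
// rejected by the check above or by GD itself.
function jpeg_from_string($data)
{
    if (!looks_like_jpeg($data)) {
        return false;
    }
    // imagecreatefromstring() returns false (and emits a warning) for data
    // it does not recognize; pass that result on to the caller.
    return imagecreatefromstring($data);
}
```

The caller decides how loud the failure should be; note that `trigger_error()` only accepts the `E_USER_*` levels, so userland code can raise `E_USER_ERROR` but not `E_ERROR` itself.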


Actual result:
--------------
segfault


 [2006-04-07 08:34 UTC] tony2001@php.net
Please try using this CVS snapshot:

  http://snaps.php.net/php5.1-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php5.1-win32-latest.zip


 [2006-04-07 12:41 UTC] pajoye@php.net
If the segfault still occurs using the CVS snapshot, please provide a script with one or more images that reproduces the segfault. cURL and anything else that is not GD-related will not be considered in this bug.

However, I tried your images and they are processed correctly:
Processing: 7512e7d01fc3894e106ca66db2d36064.jpg

Warning: imagecreatefromstring(): Data is not in a recognized format in /home/pierre/projects/php/37005/37005.php on line ..


 [2006-04-07 18:22 UTC] ceo at l-i-e dot com
Progress.

Using the snapshot and its bundled GD.

I can now create the JPEG from the corrupted file, which is way more than I expected, honestly.

I can also do imagejpeg($image, 'cvs1.jpg') and get it out.

But, then, as soon as I ^D to get out of the interpreter, it core-dumped on me...

This was repeatable, twice in a row.

You can find the images, the cvs-generated images, and the core dumps here:
http://www.acousticdemo.com/info.com/overture/jpeg_crashed/

I dunno if core dumps can be used across machines, but I could maybe stumble my way through a backtrace thingie, if it would help...
 [2006-04-07 18:27 UTC] ceo at l-i-e dot com
I was using this file as my test input:
088b1cc1662339d5008fd3d67ec7cf01.jpg

Note that it DOES output the image okay now, or, at least, it "looks" okay in a browser, but a core dump happens when you try to exit PHP with EOF (^D).

Sample code:
<?php $image = imagecreatefromjpeg('088b1cc1662339d5008fd3d67ec7cf01.jpg');?>
<?php imagejpeg($image, 'cvs2.jpg');?>

Using 5.1.2 (not snapshot) on Windows, I got messages similar to what you see.
C: php -v
PHP 5.1.2 (cli) (built: Jan 11 2006 16:40:00)

This MAY be specific to FreeBSD, then...
 [2006-04-07 18:35 UTC] ceo at l-i-e dot com
Yes, in fact, I *CAN* stumble my way through a backtrace.

This GDB was configured as "i386-marcel-freebsd"...
Core was generated by `php'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libcrypt.so.2...done.
Loaded symbols for /lib/libcrypt.so.2
Reading symbols from /usr/local/lib/libpng.so.5...done.
Loaded symbols for /usr/local/lib/libpng.so.5
Reading symbols from /lib/libz.so.2...done.
Loaded symbols for /lib/libz.so.2
Reading symbols from /usr/local/lib/libjpeg.so.9...done.
Loaded symbols for /usr/local/lib/libjpeg.so.9
Reading symbols from /lib/libm.so.3...done.
Loaded symbols for /lib/libm.so.3
Reading symbols from /usr/local/lib/libxml2.so.5...done.
Loaded symbols for /usr/local/lib/libxml2.so.5
Reading symbols from /usr/local/lib/libiconv.so.3...done.
Loaded symbols for /usr/local/lib/libiconv.so.3
Reading symbols from /lib/libc.so.5...done.
Loaded symbols for /lib/libc.so.5
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()
#1  0x08164cd1 in execute (op_array=0x832700c) at zend_vm_execute.h:92
#2  0x0814d652 in zend_execute_scripts (type=8, retval=0x0, file_count=3) at /www/acousticdemo.com/php_cvs/php5.1-200604071630/Zend/zend.c:1109
#3  0x081199c4 in php_execute_script (primary_file=0xbfbfeb80) at /www/acousticdemo.com/php_cvs/php5.1-200604071630/main/main.c:1728
#4  0x081c621b in main (argc=2, argv=0xbfbfec10) at /www/acousticdemo.com/php_cvs/php5.1-200604071630/sapi/cgi/cgi_main.c:1603
 [2006-04-07 18:44 UTC] pajoye@php.net
Sorry, but it is confusing.

Given this snippet:
$image = imagecreatefromjpeg('088b1cc1662339d5008fd3d67ec7cf01.jpg');
imagejpeg($image);

Did you try it on FreeBSD using the PHP sources and not the BSD ports?

The backtrace has nothing to do with GD or the image functions. Please confirm that you use the PHP sources and the bundled GD and not the BSD ports.

 [2006-04-08 21:52 UTC] ceo at l-i-e dot com
Yes, I am using the PHP CGI I personally compiled.

/www/acousticdemo.com/php_cvs/php5.1-200604071630/usr/local/bin/php -a
is what I did to test

/www/acousticdemo.com/php_cvs/php5.1-200604071630/usr/local/bin/php  -v yields:
PHP 5.1.3RC3 (cgi) (built: Apr  7 2006 19:23:50) (DEBUG)

-m shows:
curl
date
gd
libxml
pcre
Reflection
SPL
standard
zlib

-i | grep GD yields:
GD Support enabled
GD Version bundled (2.0.28 compatible)

I can call imagecreatefromjpeg() and imagejpeg() but it segfaults when I hit ^D to exit the script.

I have also tested 5.1.2 (from php.net/downloads.php) and it has the same issues.

My webhost has 5.0.4 installed from the ports, so there's no way I'm running his when -v says 5.1.3RC3.
 [2006-04-11 20:07 UTC] ceo at l-i-e dot com
On Tue, April 11, 2006 7:45 am, Pierre wrote:
> On Mon, 10 Apr 2006 20:20:04 -0500 (CDT)
> "Richard Lynch" <ceo@l-i-e.com> wrote:
> 
>> Just FYI:
>>
>> Some progress...
>>
>> A snap from today (php5.1-200604102030) doesn't segfault, per se...
>>
>> But it ends up doing this sometimes, in my monster script:
>> /www/acousticdemo.com/php_cvs/php5.1-200604102030/Zend/zend_hash.c(752) : ht=0x83e20a4 is being cleaned
>>
>> I also get a lot of "string not zero-terminated" warnings, but that may be expected when one calls stristr or file_put_contents on a binary string (a JPEG string suitable as input for imagecreatefromstring)
>>
>> If I'm reading PHP source code correctly, that "ht %p is being cleaned" is followed immediately by a call to zend_bailout()
>>
>> If that means what I think it does, it might explain why my script is just sort of ending without much else in the way of error messages...
>>
>> I know we're probably now long out of the GD section of PHP, so I'm
>> trying through PHP-Dev for now...
> 
> Keep it in a bug report is actually the best way.

I'll be cc-ing this into the bug report, then.

>> But wanted to let you know things are not quite copacetic, in case it really was something in the guts of GD scrambling memory?
>>
>> The acousticdemo.com username/password remain available indefinitely, if it's useful.
>>
>> But I can only generate this problem now with a rather large script :-(
> 
> Which one? The problem in overture.php?

Yes.

It now reaches a point where it (usually) prints out:
ht=0x012345678 is being cleaned

and then it just sort of ends, with no error message at all.

>> My previous example with php -a and the sample (corrupt) image no
>> longer crashes at script end.
> 
> So this problem is solved. Or d

Truthfully, it seems more like it's been "moved" to a different section of code, at least to me...

>> But my real script doesn't work any better, really.
>>
>> cd web/info.com/overture/
>> ~/php_cvs/php5.1-200604102030/sapi/cli/php overture.php > out.txt &
>>
>> Data used in the script is stored in pages/ for review if that helps any.
>>
>> Attempts so far to reproduce the bug now with small scripts have failed miserably. :-(
> 
> What does not work?

The code simply ends without running to completion, either with no error messages or with the "ht %p is being cleaned" error message, which, in the PHP CVS source, is immediately followed by a call to 'zend_bailout()'

That looks bad to this naive reader. :-)

> What is this exec call at the top of your script?

That's just a hack to avoid running the script twice at once, in case it runs for a very long time and cron calls it again.

Unless you are testing 2 processes of the script in parallel (don't do that) you can just ignore it.  Or I'll rip it out, since I should use a lock file, really.
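A lock file in place of the exec hack could look roughly like this minimal sketch, assuming a POSIX system where `flock()` is advisory and the lock is held for as long as the handle stays open. `acquire_run_lock()` and `release_run_lock()` are hypothetical names, and the `'c'` fopen mode requires PHP 5.2.6 or later.

```php
<?php
// Hypothetical helper: take an exclusive, non-blocking lock on $path.
// Returns the open handle on success, or false if another process holds it.
function acquire_run_lock($path)
{
    // 'c': open for writing, create the file if missing, never truncate it.
    $fp = fopen($path, 'c');
    if ($fp === false) {
        return false;
    }
    // LOCK_NB makes flock() fail immediately instead of waiting.
    if (!flock($fp, LOCK_EX | LOCK_NB)) {
        fclose($fp);
        return false;
    }
    return $fp;
}

// Hypothetical helper: release the lock and close the handle.
function release_run_lock($fp)
{
    flock($fp, LOCK_UN);
    fclose($fp);
}
```

At the top of the cron script, something like `$lock = acquire_run_lock('/tmp/overture.lock'); if ($lock === false) { exit; }` (path hypothetical) keeps a second copy from starting. If the process dies, the OS releases the lock when the file handle is closed, so no stale lock is left behind.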

> What exactly does this script do? A short explanation should help to catch the problem.

curl reads a page that has a CAPTCHA on it
curl reads the JPEG with CURLOPT_BINARYTRANSFER (right?)
OCR the CAPTCHA:
  imagecreatefromstring()
  de-noise
  down-sample to black/white
  convert to ASCII art pixel-by-pixel imagecolorat
  compute min distance from known characters in 'dictionary'
  (known characters are also ASCII art)
POST back the guess of the CAPTCHA

It's pretty much an anti-CAPTCHA hack...
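The "compute min distance from known characters" step is plain nearest-template matching, which can be sketched as follows. This is a minimal illustration under assumptions not in the original: each glyph is already an ASCII-art string of '#' (black) and '.' (white) cells with its rows concatenated, and "distance" means Hamming distance, i.e. the number of differing cells. `ascii_distance()` and `closest_glyph()` are hypothetical names.

```php
<?php
// Hypothetical helper: Hamming distance between two ASCII-art glyphs.
// A shorter string is padded with white cells ('.') so lengths match.
function ascii_distance($a, $b)
{
    $len = max(strlen($a), strlen($b));
    $a = str_pad($a, $len, '.');
    $b = str_pad($b, $len, '.');
    $diff = 0;
    for ($i = 0; $i < $len; $i++) {
        if ($a[$i] !== $b[$i]) {
            $diff++;
        }
    }
    return $diff;
}

// Hypothetical helper: return the dictionary key whose ASCII art is closest
// to $glyph, or null if the dictionary is empty.
function closest_glyph($glyph, array $dictionary)
{
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($dictionary as $char => $art) {
        $d = ascii_distance($glyph, $art);
        if ($d < $bestDistance) {
            $bestDistance = $d;
            // Cast back to string: PHP turns numeric-string keys like '1' into ints.
            $best = (string) $char;
        }
    }
    return $best;
}
```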
 [2006-04-11 20:11 UTC] pajoye@php.net
This is not a GD bug. Keep support questions in other places.

Other bugs should be reported separately.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sat Jun 01 11:01:33 2024 UTC