Bug #78280 _emalloc causes segfaults
Submitted: 2019-07-12 15:43 UTC Modified: 2020-02-26 19:31 UTC
Votes:2
Avg. Score:5.0 ± 0.0
Reproduced:2 of 2 (100.0%)
Same Version:2 (100.0%)
Same OS:2 (100.0%)
From: grzegorz129 at gmail dot com Assigned: cmb
Status: Duplicate Package: Reproducible crash
PHP Version: 7.3.7 OS: Linux
Private report: No CVE-ID: None
 [2019-07-12 15:43 UTC] grzegorz129 at gmail dot com
Description:
------------
Up front, I have to say that I don't have a reproducer I can share, since the problem only shows up in a large codebase. However, I can reproduce it consistently, and I was able to gather a couple of core dumps:

- https://pastebin.com/AD7Gtv4k
- https://pastebin.com/iG9djuPi
- https://pastebin.com/yMNHsgF2
- https://pastebin.com/PEyBwdry
- https://pastebin.com/bBjPaAEy

The only consistent element I see is the top of the call chain: _emalloc -> zend_mm_alloc_heap -> zend_mm_alloc_small

Test script:
---------------
Not available

Expected result:
----------------
-

Actual result:
--------------
Segfault

History

 [2019-07-12 15:47 UTC] nikic@php.net
Where did you get this 7.3.7 release? It was retagged due to a critical opcache bug, and I suspect your build does not include https://github.com/php/php-src/commit/21465ec0e1c1401751b35a21f45f1d57255d5be9.
 [2019-07-12 15:58 UTC] grzegorz129 at gmail dot com
And another one: https://pastebin.com/06fsMevR

I got my 7.3.7 yesterday from the GitHub release tarball. I double-checked, and the patch is in the source I compiled. I can try compiling the latest master if that would help.
 [2019-07-12 16:04 UTC] nikic@php.net
Thanks for checking; it must be a different problem, then. Is this a regression from a recent release? Does this occur under opcache?

If this is code callable from CLI, can you run PHP using "USE_ZEND_ALLOC=0 valgrind php script.php" and provide the resulting log?
 [2019-07-12 16:08 UTC] grzegorz129 at gmail dot com
Some clarifications:
 - I ruled out a hardware problem: this is reproducible on 3 AWS instances as well as 2 physical machines (running Ubuntu on VMware with a Windows host)
 - The system is not under OOM pressure (~40/60 GB of physical memory free). The PHP process has a memory_limit of 2 GB, of which it uses ~1.2-1.3 GB
 - The zbacktrace is different on almost every run. Originally the error showed up while throwing an exception; after a couple of code changes it now usually happens during Doctrine hydration, but also in other places
 - I tried both with GC enabled and with gc_disable() plus gc_collect_cycles() between big loops; no change
 [2019-07-12 16:48 UTC] grzegorz129 at gmail dot com
The code runs only in CLI, via parallel. There are usually 5-8 copies running over different files, and one of them always crashes within 10-15 minutes.

I checked the CLI and it looks like opcache is enabled. I will try disabling it, since it serves no purpose in CLI (correct me if I'm wrong). I'm not sure whether the environment variable was picked up properly, but after a longer-than-usual run I got this valgrind dump: https://pastebin.com/FA4vVxzz
 [2019-07-12 16:51 UTC] grzegorz129 at gmail dot com
I asked around, and the error is not a regression from a recent release. The code is fairly new, so the earliest logged crash was on 7.3.3; it also happens on 7.3.6 from the Ubuntu PPA.
 [2019-07-12 17:01 UTC] nikic@php.net
-Status: Open +Status: Assigned -Assigned To: +Assigned To: krakjoe
 [2019-07-12 17:01 UTC] nikic@php.net
Assigning to Joe as it seems likely that this is related to parallel.

The valgrind log is cut off due to too many spurious zend_string_equal_val errors. Could you please rerun with the suppressions file at https://gist.github.com/nikic/8d404c6799a1532b0c10280f5e57a888 and

    USE_ZEND_ALLOC=0 ZEND_DONT_UNLOAD_MODULES=1 valgrind --suppressions=php73.supp php script.php
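A minimal sketch of the same invocation wrapped in a reusable shell function, assuming valgrind and php are on PATH and php73.supp from the gist is in the current directory. The function name php_valgrind and the SUPP_FILE, LOG_FILE and DRY_RUN variables are hypothetical; --log-file (where valgrind expands %p to the process ID), --num-callers and --track-origins are standard valgrind options added here so the log stays separate from the script's output and the stack traces are deeper.

```sh
#!/bin/sh
# Hypothetical wrapper around the valgrind command suggested above.
# Assumes valgrind and php are on PATH and the suppressions file exists.
# Set DRY_RUN=1 to print the command instead of running it.
php_valgrind() {
    script="$1"
    supp="${SUPP_FILE:-php73.supp}"
    # valgrind expands %p to the process ID, so concurrent runs
    # do not overwrite each other's logs.
    log="${LOG_FILE:-valgrind-%p.log}"

    # USE_ZEND_ALLOC=0: emalloc() falls back to the system malloc(), so
    #   valgrind sees every allocation individually.
    # ZEND_DONT_UNLOAD_MODULES=1: shared extensions stay loaded at shutdown,
    #   so frames inside them can still be symbolized.
    set -- env USE_ZEND_ALLOC=0 ZEND_DONT_UNLOAD_MODULES=1 \
        valgrind --suppressions="$supp" --log-file="$log" \
        --num-callers=50 --track-origins=yes \
        php "$script"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Usage: php_valgrind script.php, or DRY_RUN=1 php_valgrind script.php to only print the command that would be run.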
 [2019-07-12 17:13 UTC] grzegorz129 at gmail dot com
Re-running with the options you provided. After disabling opcache the result is the same; in fact, I got one crash after just a few seconds of running, which is unusually short: https://pastebin.com/zxefix13

Another one took longer but crashed again: https://pastebin.com/hsC6nqGn

So I think opcache is not a factor here. To be precise about parallel: the code is not parallelized at the PHP level, but rather run with GNU Parallel.
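When the copies are started by GNU Parallel, each job can run under valgrind with its own log file, which makes it easier to find the log of the copy that crashed. A minimal sketch, assuming GNU parallel, valgrind and php are on PATH and php73.supp is in the working directory; the function name parallel_valgrind, the input file names and the default job count are placeholders. In GNU Parallel, {} is replaced with the current input and {#} with the job sequence number.

```sh
#!/bin/sh
# Hypothetical helper: one copy of a PHP CLI script per input file,
# started by GNU Parallel, each under valgrind with its own log.
# Set DRY_RUN=1 to print the command instead of running it.
parallel_valgrind() {
    script="$1"
    shift                       # remaining arguments are the input files
    jobs="${JOBS:-8}"
    # GNU Parallel runs this string through a shell, replacing {} with the
    # input file and {#} with the job number (1, 2, 3, ...).
    cmd="USE_ZEND_ALLOC=0 ZEND_DONT_UNLOAD_MODULES=1 valgrind --suppressions=php73.supp --log-file=valgrind-{#}.log php $script {}"

    # "$@" (the input files) is expanded before set replaces it.
    set -- parallel -j "$jobs" "$cmd" ::: "$@"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Usage: parallel_valgrind script.php a.csv b.csv; after a crash, the job number in GNU Parallel's output points at the matching valgrind-N.log.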
 [2019-07-12 17:45 UTC] grzegorz129 at gmail dot com
As expected, it crashed. The valgrind output still has a lot of "??" frames, but I'm not sure whether that's expected: https://pastebin.com/ygvC5aws
 [2019-07-12 18:27 UTC] grzegorz129 at gmail dot com
And another valgrind output: https://pastebin.com/KVBjEeq5
 [2019-07-12 18:45 UTC] grzegorz129 at gmail dot com
I just got a very interesting crash on a non-debug build of PHP with no parallelism or anything, just a simple command, and it produced this trace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007fe63d389801 in __GI_abort () at abort.c:79
#2  0x00007fe63d3d2897 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fe63d4ffb9a "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007fe63d3d990a in malloc_printerr (str=str@entry=0x7fe63d5013f0 "malloc_consolidate(): invalid chunk size") at malloc.c:5350
#4  0x00007fe63d3d9bae in malloc_consolidate (av=av@entry=0x7fe63d734c40 <main_arena>) at malloc.c:4441
#5  0x00007fe63d3e103b in _int_free (have_lock=0, p=<optimized out>, av=0x7fe63d734c40 <main_arena>) at malloc.c:4362
#6  __GI___libc_free (mem=0x563df37a0d10) at malloc.c:3124
#7  0x0000563df15b8e41 in zend_hash_destroy (ht=ht@entry=0x563df3762438) at ./Zend/zend_hash.c:1461
#8  0x0000563df159e1de in destroy_zend_class (zv=zv@entry=0x563df3341cc0) at ./Zend/zend_opcode.c:247
#9  0x0000563df159a730 in shutdown_executor () at ./Zend/zend_execute_API.c:345
#10 0x0000563df15a8efb in zend_deactivate () at ./Zend/zend.c:1104
#11 0x0000563df1547cbf in php_request_shutdown (dummy=<optimized out>) at ./main/main.c:1926
#12 0x0000563df1639ae4 in do_cli (argc=6, argv=0x563df22563b0) at ./sapi/cli/php_cli.c:1164
#13 0x0000563df140090b in main (argc=6, argv=0x563df22563b0) at ./sapi/cli/php_cli.c:1389
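A trace like the one above can be pulled from a core dump non-interactively. A minimal sketch, assuming core dumps are enabled (for example with ulimit -c unlimited in the shell that starts PHP), gdb is installed, and PHP_SRC points at a php-src checkout matching the crashing binary; that checkout's .gdbinit defines the zbacktrace command, which prints the PHP-level stack and works best on a build with debug symbols. The function name php_core_bt and the PHP_BIN, PHP_SRC and DRY_RUN variables are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print the native (bt) and PHP-level (zbacktrace)
# stacks from a core dump. Assumes gdb is installed and PHP_SRC points at a
# php-src checkout whose .gdbinit defines zbacktrace.
# Set DRY_RUN=1 to print the command instead of running it.
php_core_bt() {
    core="$1"
    php_bin="${PHP_BIN:-$(command -v php)}"
    src="${PHP_SRC:-.}"

    # -batch makes gdb exit after running the -ex commands in order.
    set -- gdb -batch \
        -ex "source $src/.gdbinit" \
        -ex "bt" \
        -ex "zbacktrace" \
        "$php_bin" "$core"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Usage: PHP_SRC=~/php-src php_core_bt core > trace.txt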
 [2019-07-12 18:50 UTC] nikic@php.net
-Status: Assigned +Status: Open -Assigned To: krakjoe +Assigned To:
 [2019-07-12 18:50 UTC] nikic@php.net
Unassigning Joe as this is GNU parallel, not ext/parallel ^^

The valgrind traces look very similar to bug #78010, which has a small repro. Maybe fixing that will fix your case as well.
 [2019-07-15 11:53 UTC] nikic@php.net
Can you please check whether the current PHP 7.3 head (that includes the fix for bug #78010) resolves your issue?
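Testing a branch head means building PHP from git rather than from a release tarball. A minimal sketch, assuming the usual build dependencies are installed (git, autoconf, bison, re2c, a C toolchain and the libraries required by the default extensions) and that the branch is named PHP-7.3 in the php-src repository; the function names, target directory and JOBS/DRY_RUN variables are hypothetical, and --enable-debug is optional (it gives readable backtraces at the cost of speed).

```sh
#!/bin/sh
# Hypothetical helper: build the tip of the PHP-7.3 branch from git so a
# crash can be retested against it. Set DRY_RUN=1 to print the steps only.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$@"; else "$@"; fi
}

# Like run, but executes the command inside directory $1 (in a subshell,
# so the caller's working directory is unchanged).
run_in() {
    d="$1"
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "[in $d] $*"; else (cd "$d" && "$@"); fi
}

build_php73_head() {
    dir="${1:-php-src-7.3}"
    jobs="${JOBS:-4}"

    run git clone --depth 1 --branch PHP-7.3 \
        https://github.com/php/php-src.git "$dir" || return 1
    # Git checkouts have no generated configure script; buildconf creates it.
    run_in "$dir" ./buildconf || return 1
    run_in "$dir" ./configure --enable-debug || return 1
    run_in "$dir" make -j"$jobs" || return 1
    # Sanity check of the freshly built CLI binary.
    run_in "$dir" sapi/cli/php -v
}
```

Usage: build_php73_head, then rerun the failing workload with php-src-7.3/sapi/cli/php in place of the system php.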
 [2019-07-15 18:10 UTC] grzegorz129 at gmail dot com
Yes, 7.3.7 compiled with the patch applied fixes my issue: the code processed a couple of million records and no crash was recorded :)

Big thanks to you, Nikita, as well as Dmitry, for getting involved. You're amazing guys. I'm happy I was able to at least help with debugging and with providing a simpler reproducer.
 [2020-02-26 19:31 UTC] cmb@php.net
-Status: Open +Status: Duplicate -Assigned To: +Assigned To: cmb
 [2020-02-26 19:31 UTC] cmb@php.net
Fine, closing as a duplicate of bug #78010.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Fri Mar 29 11:01:29 2024 UTC