From the start I have to say I don't have a reproducer I can share, since the problem only shows up in a large codebase. However, I can reproduce it consistently, and I was able to gather a couple of core dumps.
The only consistent element I see is the top frames of the call chain: _emalloc -> zend_mm_alloc_heap -> zend_mm_alloc_small
Where did you get this 7.3.7 release? It was retagged due to a critical opcache bug, and I suspect your build does not include https://github.com/php/php-src/commit/21465ec0e1c1401751b35a21f45f1d57255d5be9.
And another one: https://pastebin.com/06fsMevR
I got my 7.3.7 yesterday from the GitHub tgz release. I double-checked, and the patch is in the source I compiled. I can try compiling the latest master if that helps?
Thanks for checking, must be a different problem then. Is this a regression from a recent release? Does this occur under opcache?
If this is code callable from CLI, can you run PHP using "USE_ZEND_ALLOC=0 valgrind php script.php" and provide the resulting log?
- I ruled out a hardware problem - this is reproducible on 3 AWS instances as well as 2 physical machines (running Ubuntu on VMware with a Windows host)
- The system is not pressured into OOM (~40/60GB of physical memory free). The PHP process has a memory_limit of 2GB, of which it uses ~1.2-1.3GB
- The zbacktrace is almost always different between runs. Originally the error presented itself while throwing an exception; now, after a couple of code changes, it usually happens during Doctrine hydration, but also in other places
- I tried both with gc enabled and with gc_disable() + gc_collect_cycles() between big loops - no change
The code runs in CLI only, via parallel. There are usually 5-8 copies running over different files, and one of them always crashes within 10-15 minutes.
I checked the CLI and it looks like opcache is enabled. I will try to disable it since it serves no purpose in CLI (correct me if I'm wrong). I'm not sure whether the environment variable was picked up properly, but after a longer than usual run I got this valgrind dump: https://pastebin.com/FA4vVxzz
I asked around and the error is not a regression from a recent release. The code is pretty new, so the last logged crash was on 7.3.3; it also happens on 7.3.6 from the Ubuntu PPA.
Assigning to Joe as it seems likely that this is related to parallel.
The valgrind log is cut off due to too many spurious zend_string_equal_val errors. Could you please rerun with the suppressions file at https://gist.github.com/nikic/8d404c6799a1532b0c10280f5e57a888 and the following command:
USE_ZEND_ALLOC=0 ZEND_DONT_UNLOAD_MODULES=1 valgrind --suppressions=php73.supp php script.php
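Since several copies of the script run at the same time, it may help to give each process its own valgrind log. The following is only a minimal sketch of the command above wrapped in a shell function. The function name run_under_valgrind, the SUPP_FILE and DRY_RUN variables, and the log file name are assumptions made for illustration and are not part of PHP or valgrind; --suppressions and --log-file (with %p expanding to the process ID) are standard valgrind options.

```shell
# Hypothetical helper: run one PHP script under valgrind with the
# environment variables from the command above, one log file per process.
run_under_valgrind() {
    script="$1"
    # Assumed default: the suppressions file saved locally as php73.supp.
    supp="${SUPP_FILE:-php73.supp}"

    # Build the command line; valgrind replaces %p with the process ID.
    set -- env USE_ZEND_ALLOC=0 ZEND_DONT_UNLOAD_MODULES=1 \
        valgrind --suppressions="$supp" --log-file="valgrind.%p.log" \
        php "$script"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command that would be run.
        echo "$*"
    else
        "$@"
    fi
}
```

Calling it with DRY_RUN=1 prints the command without running it, which is a cheap way to check that the environment variables and the suppressions path are what you expect before starting a long run.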
Re-running with the options provided. After disabling opcache the result is the same. In fact, I got one crash after just seconds of running, which is unusually quick: https://pastebin.com/zxefix13
Another one took longer but crashed again: https://pastebin.com/hsC6nqGn
So I think opcache is not a factor here. To be precise about parallel: it's not parallelized at the PHP level, but rather run with GNU Parallel.
As expected, it crashed. The valgrind output still has a lot of "??"s, but I'm not sure if that's expected: https://pastebin.com/ygvC5aws
And another valgrind output: https://pastebin.com/KVBjEeq5
I just got a very interesting crash on a non-debug build of PHP with no parallelism or anything, just a simple command, and it produced this trace:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007fe63d389801 in __GI_abort () at abort.c:79
#2 0x00007fe63d3d2897 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fe63d4ffb9a "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x00007fe63d3d990a in malloc_printerr (str=str@entry=0x7fe63d5013f0 "malloc_consolidate(): invalid chunk size") at malloc.c:5350
#4 0x00007fe63d3d9bae in malloc_consolidate (av=av@entry=0x7fe63d734c40 <main_arena>) at malloc.c:4441
#5 0x00007fe63d3e103b in _int_free (have_lock=0, p=<optimized out>, av=0x7fe63d734c40 <main_arena>) at malloc.c:4362
#6 __GI___libc_free (mem=0x563df37a0d10) at malloc.c:3124
#7 0x0000563df15b8e41 in zend_hash_destroy (ht=ht@entry=0x563df3762438) at ./Zend/zend_hash.c:1461
#8 0x0000563df159e1de in destroy_zend_class (zv=zv@entry=0x563df3341cc0) at ./Zend/zend_opcode.c:247
#9 0x0000563df159a730 in shutdown_executor () at ./Zend/zend_execute_API.c:345
#10 0x0000563df15a8efb in zend_deactivate () at ./Zend/zend.c:1104
#11 0x0000563df1547cbf in php_request_shutdown (dummy=<optimized out>) at ./main/main.c:1926
#12 0x0000563df1639ae4 in do_cli (argc=6, argv=0x563df22563b0) at ./sapi/cli/php_cli.c:1164
#13 0x0000563df140090b in main (argc=6, argv=0x563df22563b0) at ./sapi/cli/php_cli.c:1389
Unassigning Joe as this is GNU parallel, not ext/parallel ^^
The valgrind traces look very similar to bug #78010, which has a small repro. Maybe fixing that will fix your case as well.
Can you please check whether the current PHP 7.3 head (that includes the fix for bug #78010) resolves your issue?
Yes, 7.3.7 compiled with the patch applied fixes my issue - the code processed a couple million records and no crash was recorded :)
Big thanks to you, Nikita, as well as Dmitry, for getting involved. You're amazing, guys. I'm happy I was able to at least help with debugging and providing a simpler reproducer.