Bug #80814 threaded mod_php won't load: No space available for static Thread Local Storage
Submitted: 2021-02-28 19:41 UTC Modified: 2021-03-09 07:44 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: theultramage at gmail dot com Assigned: dmitry (profile)
Status: Closed Package: Dynamic loading
PHP Version: 8.0.2 OS: FreeBSD 12.2
Private report: No CVE-ID: None
 [2021-02-28 19:41 UTC] theultramage at gmail dot com
Description:
------------
Apache 2.4 with the Event MPM and mod_php 8.0.2 built with ZTS currently does not work on FreeBSD 12.0-12.2: while trying to load libphp.so, the dynamic loader reports "No space available for static Thread Local Storage".

PHP 7.3.20 on the same system does not exhibit this problem, so it seems to be caused by code changes or perhaps build system changes.

I found one related php bugreport - https://bugs.php.net/bug.php?id=71189 - but it was talking about php 7.0 on FreeBSD 8.1, a super outdated OS from 2009, which makes me wonder if that would even build, so maybe that part was entered wrong. There is no developer feedback in that thread and the only suggestion is to fall back to single-threaded prefork mode.

This issue was brought up on the freebsd bugtracker for mod_php80 last year - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=250652 - but the maintainer dismissed it as not a php bug and no further attention was given to it. Followup comments imply that php-fpm is also affected so it's not just a mod_php thing. Another comment suggests recompiling kernel+world with a bigger RTLD_STATIC_TLS_EXTRA constant, based on an old issue in one bsd fork. There are very few results when searching for that error message.

The issue is somehow related to thread local storage and dynamically loaded modules. I tried a small C++ test case that involved dynamically loading a shared library with threading and large arrays declared as thread_local, but it worked fine. Whatever the issue is, it's not that simple.


 [2021-03-03 05:01 UTC] theultramage at gmail dot com
Through further reading and testing, I have identified the cause and prepared a small code sample. It is indeed tied to dynamic loading and thread local storage, specifically the use of the initial-exec tls model by TSRM and modules that rely on it.

TSRM provides a 'cache' in the form of a TLS void* (8 bytes) marked as __attribute__((tls_model("initial-exec"))), and a collection of macros to work with it. The initial-exec model is meant for shared libraries that load along with the main executable, and not for dynamic loading, but FreeBSD does reserve a bit of extra space (128 bytes) if for some obscure reason a shared library flagged as DF_STATIC_TLS needs to be loaded dynamically.
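
For readers unfamiliar with the pattern, the per-module cache described above looks roughly like the following minimal sketch. The demo_ names, the globals struct and the offset-based accessor are illustrative assumptions, not the actual TSRM macros; only the tls_model("initial-exec") attribute is taken from the description above, and a GCC- or clang-compatible compiler is assumed.

```c
#include <stddef.h>

/* Hypothetical stand-in for the TSRM attribute macro; assumes GCC or clang. */
#define DEMO_TLS_MODEL_ATTR __attribute__((tls_model("initial-exec")))

/* One pointer-sized thread-local slot per module (8 bytes on 64-bit targets). */
static __thread void *demo_ls_cache DEMO_TLS_MODEL_ATTR = NULL;

/* Hypothetical per-thread globals block that the cache points at. */
struct demo_globals {
    int request_count;
    int error_level;
};

/* Store the calling thread's globals block in the cache. */
void demo_cache_update(void *thread_globals)
{
    demo_ls_cache = thread_globals;
}

/* Access a field at a byte offset from the cached block. */
#define DEMO_G_FAST(offset, type) \
    (*(type *)((char *)demo_ls_cache + (offset)))
```

Each thread would call demo_cache_update() once and then reach its globals through DEMO_G_FAST() without a further lookup, which is the point of the optimization. The cost is that every module built this way claims a slot in the static TLS area.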

This value is a compile-time constant in the dynamic linker, and applies to the total static tls size across all loaded modules. That is not a lot of room to work with. In my opinion, it feels like it's not meant for general-purpose use, and using a dynamically loaded library like this may mean wandering into undefined behavior territory. I'd study the specs real hard before choosing this approach.

In my custom installation, using 'readelf' I counted 11 ext modules flagged as STATIC_TLS and having an 8-byte TLS section. Adding 16 bytes used by mod_php itself, that's 104. So if 4 more modules like this are added, or the existing ones start using TSRMLS_CACHE, mod_php will fail to load.
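
The arithmetic above can be written down as a small sketch. The 128-byte figure is the one quoted in this report rather than a value read from the system, the function names are made up for illustration, and alignment padding between TLS blocks is ignored, so treat the result as an estimate.

```c
#include <stddef.h>

/* Extra static TLS space reserved by the FreeBSD loader, per the figure
 * quoted in this report (an assumption, not queried from the system). */
#define DEMO_STATIC_TLS_EXTRA 128

/* Total static TLS needed by a base module plus n equally sized extensions.
 * Simplification: ignores alignment padding between TLS blocks. */
size_t demo_static_tls_needed(size_t base_bytes, size_t n_modules,
                              size_t per_module_bytes)
{
    return base_bytes + n_modules * per_module_bytes;
}

/* Returns 1 if the request fits in the reserved space, 0 otherwise. */
int demo_static_tls_fits(size_t needed_bytes)
{
    return needed_bytes <= DEMO_STATIC_TLS_EXTRA;
}
```

With 16 bytes for mod_php and 11 extensions at 8 bytes each, this gives 104 bytes; 14 extensions would use exactly 128 bytes, and a 15th would push the total to 136 and over the limit.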

The initial-exec shenanigans were added in https://github.com/php/php-src/commit/2aefd112114bd150f5dbba0be6d0f8601561da4e with minimal information provided in the commit. So at first glance it seems like dubious premature optimization that may also be touching on undefined behavior.
The whole tls pointer cache comes from edits 7 years ago and is also where that one overlong out-of-place define that litters all the makefiles came from. See 
https://github.com/php/php-src/commit/8aeffdd74c826b5012c9b052848cfa8e593776ed
https://github.com/php/php-src/commit/b3aebda9eaf55706af2e21178f229a171725a168
At this point in time, I would question if this is still providing any performance improvement over what clang/gcc can do, or if it's hindering performance. It's definitely making the code harder to read.
And then there are ignored bugreports like https://bugs.php.net/bug.php?id=50238 which seems to imply that the array juggling macros in TSRM.h alone might have been causing a measurable performance drop.
I also happened to find this 2009 page https://wiki.php.net/rfc/tls which may be the origin of all this.

The more immediate issue is this:

libphp.so contains a 328-byte TLS section and the module is marked as STATIC_TLS. This file is thus guaranteed to fail to load dynamically on current FreeBSD. And now I can't start my webserver anymore.

Through test code, I have determined that if any thread-local variable is flagged as initial-exec, all of the other thread-local variables become like that as well - I'm guessing the ELF format doesn't support multi-mode TLS. This means that all the other otherwise innocent globals, marked as ZEND_TLS, get counted toward the static tls limit. The .symtab section of libphp.so is not present even in debug builds, so I can't check to make sure, but the number sounds about right based on what I've seen in the code.
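
A test along the lines described above might look like the following sketch. The file and function names are hypothetical and this is not the exact code that was used; it only mixes one initial-exec variable with one ordinary thread-local array so that the resulting shared object can be inspected with readelf and loaded with a dlopen() test program.

```c
/* mixed.c: cc -shared -fpic mixed.c -o mixed.so   (hypothetical file name) */

/* Small variable that explicitly requests the initial-exec model. */
static __thread void *cache __attribute__((tls_model("initial-exec")));

/* Larger variable with no explicit TLS model. */
static __thread char scratch[256];

/* Reference both variables so that neither is optimized away. */
void mixed_touch(void *p) { cache = p; scratch[0] = 1; }
void *mixed_cache(void) { return cache; }
char mixed_scratch0(void) { return scratch[0]; }
```

If the observation above holds, the whole TLS segment of mixed.so, not just the 8-byte pointer, counts against the static TLS reserve, so the library would be expected to fail to load via dlopen() on FreeBSD even though only one variable asked for initial-exec.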

I was not able to get rid of the STATIC_TLS flag on libphp.so just by patching TSRM.h, even though that's the only place that uses this attribute, and none of the makefiles pass -ftls-model. Turns out there is hand-crafted assembly code that directly references GOTTPOFF in one helper function in TSRM.c that is used in ext/opcache jit code. It was added by the same person who did the initial-exec stuff, in https://github.com/php/php-src/commit/9a06876072b9ccb023d4a14426ccb587f10882f3
Once that was dealt with, libphp.so became usable again for me.


Test script:
---------------
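// test.c: cc test.c -o test
#include <dlfcn.h>
#include <stdio.h>
int main(void)
{
    // Load the extension at run time, the same way libphp.so gets loaded.
    void* h = dlopen("./ext.so", RTLD_LAZY);
    if( h == NULL ) { puts(dlerror()); return 1; } // prints the static TLS error on FreeBSD
    dlclose(h);
    return 0;
}

// ext.c: cc -shared -fpic ext.c -o ext.so
// 129 bytes is one byte more than the 128 bytes of extra static TLS space.
static __thread char v[129] __attribute__((tls_model("initial-exec")));
void dummy(void) { v[0] = 0; } // force reference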
 [2021-03-03 08:24 UTC] nikic@php.net
-Assigned To: +Assigned To: dmitry
 [2021-03-05 09:28 UTC] dmitry@php.net
https://github.com/php/php-src/commit/2aefd112114bd150f5dbba0be6d0f8601561da4e was provided as an optimization (the information was taken from https://www.uclibc.org/docs/tls.pdf), and it seemed to work fine. If it doesn't work on FreeBSD, we may disable it using #ifdef(s).

From your explanation, I didn't completely understand whether STATIC_TLS can't work at all, or only fails when the TLS sections exceed some limit. In the first case, even CLI PHP wouldn't be able to load opcache.so.

I don't have access to FreeBSD. Can you check whether the patch https://gist.github.com/dstogov/c7c01283766376478b5f7073c1a432a9 fixes the problem (and maybe correct it, if necessary)?
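
As an illustration of the #ifdef approach mentioned above, a guard could look roughly like this. It is a sketch under assumptions, not the contents of the linked patch: the GUARDED_ and demo_ names are hypothetical, and a GCC- or clang-compatible compiler is assumed on non-FreeBSD platforms.

```c
#include <stddef.h>

/* Hypothetical guard: use the compiler's default TLS model on FreeBSD,
 * and keep the initial-exec optimization elsewhere. */
#if defined(__FreeBSD__)
# define GUARDED_TLS_MODEL_ATTR
# define GUARDED_TLS_MODEL_NAME "default"
#else
# define GUARDED_TLS_MODEL_ATTR __attribute__((tls_model("initial-exec")))
# define GUARDED_TLS_MODEL_NAME "initial-exec"
#endif

/* The cache pointer is declared the same way on every platform. */
static __thread void *demo_guarded_cache GUARDED_TLS_MODEL_ATTR = NULL;

/* Report which model this build selected. */
const char *demo_tls_model_name(void) { return GUARDED_TLS_MODEL_NAME; }

void demo_guarded_set(void *p) { demo_guarded_cache = p; }
void *demo_guarded_get(void) { return demo_guarded_cache; }
```

Code that uses the cache does not change; only the attribute differs between platforms, which keeps a guard like this confined to one header.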
 [2021-03-05 12:59 UTC] theultramage at gmail dot com
Ah. I see. Your comment showed that the situation is trickier than I originally considered. I will try to clarify things based on what I've learned so far.

tls_model("initial-exec") is meant to be used in code whose memory layout can be resolved during program initialization: ordinary executables, static .a libraries, or dynamic .so modules that are hard-bound to the main executable via the import table. It's not meant for true dynamic libraries loaded via dlopen(). Or so I've read. OSes do seem to have a way of making this work anyway, in a limited manner. FreeBSD prepares 128 bytes of space in every process. I also tested Ubuntu 20, and found that the limit was 1720 bytes. I do not know if this is just an improvised compat mechanism, or if it is something documented and fully endorsed by the ABI specification. I have not yet raised this with core devs on the mailing list. The opinion from #clang is that this should not be used.

The limit is global for the whole process. If each module needs 8 bytes, then at most 16 such modules can be loaded. Currently I am counting 33 PHP extensions that actively use ZEND_TSRMLS_CACHE_DEFINE() (and 18 that #include TSRM.h but don't use it?), so I don't believe it's currently possible to have them all dynamically loaded on FreeBSD. That alone is not a good enough reason to disable the optimization entirely, though, if it indeed leads to an observable performance improvement. I believe that for cli/cgi/fpm, a workaround would be to have a bunch of the extensions compiled in. But then again, I'm realizing that these three appear to be single-threaded static executables that do not need ZTS or thread-local variables, so none of this matters for them.

The other problem that I described (libphp.so requiring 328 bytes of static tls storage) thus turns out to be limited to mod_php, which is a dynamically loaded shared library. The reason is that if any piece of code requires the "initial-exec" static tls model, then the whole shared library is flagged as such. Here, not only does zend.c's use of TSRMLS_CACHE do it, but so will any threaded extension that gets compiled in. Furthermore, the mere presence of 'gottpoff' (and perhaps 'ntpoff') in inline assembly code will force static tls mode.

Regarding your patch, it looks very close to what I arrived at on my end. For opcache I also patched out the i386 branch just in case. It makes me wonder though, why does tsrm_get_ls_cache_tcb_offset() in TSRM.c even exist, when opcache completely replicates it?

An alternative way, that keeps the extensions as STATIC_TLS, would be to remove that assembly code from TSRM.c, and disable that TSRM_TLS_MODEL_ATTR on libphp.so itself and any extensions that get compiled in, while still applying it on shared extensions somehow.
 [2021-03-09 07:44 UTC] dmitry@php.net
The code in TSRM.c makes it possible to use the most efficient "local-exec" access model from JIT-ed code. If tsrm_get_ls_cache_tcb_offset() returns 0, zend_jit_setup() switches the JIT code generator to a less efficient access method. This works fine on Linux with both GCC and Clang.
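
The fallback described here can be sketched as a simple selection on the returned offset. The enum and function below are hypothetical and do not reproduce the logic of zend_jit_setup(); they only illustrate the rule that a zero offset means "use the generic access path".

```c
#include <stddef.h>

/* Hypothetical labels for the two ways JIT-ed code could reach the cache. */
typedef enum {
    DEMO_TLS_ACCESS_FIXED_OFFSET, /* known offset from the thread pointer */
    DEMO_TLS_ACCESS_GENERIC       /* slower, always-available lookup */
} demo_tls_access;

/* Pick the access method from the offset reported by the runtime.
 * An offset of 0 is treated as "not available", as described above. */
demo_tls_access demo_pick_tls_access(size_t tcb_offset)
{
    return tcb_offset != 0 ? DEMO_TLS_ACCESS_FIXED_OFFSET
                           : DEMO_TLS_ACCESS_GENERIC;
}
```

Under this scheme, a platform where the static TLS model is disabled only has to report an offset of 0 for the slower path to be used.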

I didn't understand whether you tested the patch, and whether it's good enough to fix the problem on FreeBSD. If it misses something (e.g. i386 support), please extend it.
 [2021-03-10 13:04 UTC] dmitry@php.net
Automatic comment on behalf of dmitry@zend.com
Revision: http://git.php.net/?p=php-src.git;a=commit;h=3b377b51a22681f4594f8eb55e6de25ea01204c1
Log: Fixed bug #80814 (threaded mod_php won't load on FreeBSD: No space available for static Thread Local Storage)
 [2021-03-10 13:04 UTC] dmitry@php.net
-Status: Assigned +Status: Closed
 
Last updated: Tue May 11 09:01:24 2021 UTC