php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #77536 finfo considerably slower on 7.3 vs. 7.2
Submitted: 2019-01-28 23:57 UTC Modified: 2019-03-31 04:22 UTC
Votes:3
Avg. Score:4.0 ± 0.8
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: jscaltreto at gmail dot com Assigned:
Status: No Feedback Package: Filesystem function related
PHP Version: 7.3.1 OS:
Private report: No CVE-ID: None
View Add Comment Developer Edit
Anyone can comment on a bug. Have a simpler test case? Does it work for you on a different platform? Let us know!
Just going to say 'Me too!'? Don't clutter the database with that please — but make sure to vote on the bug!
Your email address:
MUST BE VALID
Solve the problem:
9 + 11 = ?
Subscribe to this entry?

 
 [2019-01-28 23:57 UTC] jscaltreto at gmail dot com
Description:
------------
Calls via finfo to lookup mimetype of files take 30%+ longer to run on php 7.3+ than on 7.2. In an extreme case on a cpu-loaded machine I've observed finfo->file() calls taking upwards of 500ms in 7.3, where on the same hardware with the same load the calls typically take <1ms for the same file in 7.2. 

The included test script can be executed in both 7.2 and 7.3 to illustrate the problem. I've also created a simple test using official docker images to reproduce the issue: https://gist.github.com/jscaltreto/8f6adf38e7abb08a831ef37b84c14ace

Test script:
---------------
<?php
$iters = 1000;

$start = microtime(true);
for ($i = 1; $i <= $iters; $i++){
  $finfo = new finfo(FILEINFO_MIME);
  $type = $finfo->file("/bin/sh");
  unset($finfo);
}
$ttl = microtime(true) - $start;
echo "Took: $ttl\n";
?>


Expected result:
----------------
7.2 and 7.3 should execute the script in roughly the same amount of time.

Actual result:
--------------
7.3 takes 30%+ more time to execute.

Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2019-01-29 02:33 UTC] danack@php.net
Please could you run the scripts through strace to identify the system call that is causing the difference: https://linux.die.net/man/1/strace

Possibly also, although it might be clear to you what the official dockers images are, please could you link to them or name them as they are on docker hub. Or even better provide a Docker compose to run the two things through.
 [2019-01-29 02:37 UTC] danack@php.net
ah - Just read the actual run script. nvm about the docker compose.
 [2019-01-29 15:25 UTC] jscaltreto at gmail dot com
Here is an strace (strace -ttT) from just before I call finfo->file() in 7.2:

15:13:08.781506 write(1, "MARKSTART", 9) = 9 <0.000127>
15:13:08.781814 lstat("/bin/sh", {st_mode=S_IFLNK|0777, st_size=4, ...}) = 0 <0.000010>
15:13:08.781986 readlink("/bin/sh", "dash", 4096) = 4 <0.000012>
15:13:08.782172 lstat("/bin/dash", {st_mode=S_IFREG|0755, st_size=117208, ...}) = 0 <0.000075>
15:13:08.782393 lstat("/bin", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 <0.000027>
15:13:08.782614 open("/bin/dash", O_RDONLY) = 3 <0.000085>
15:13:08.782866 fstat(3, {st_mode=S_IFREG|0755, st_size=117208, ...}) = 0 <0.000068>
15:13:08.783073 lseek(3, 0, SEEK_CUR)   = 0 <0.000009>
15:13:08.783272 fstat(3, {st_mode=S_IFREG|0755, st_size=117208, ...}) = 0 <0.000068>
15:13:08.783463 fstat(3, {st_mode=S_IFREG|0755, st_size=117208, ...}) = 0 <0.000065>
15:13:08.783657 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\2407\0\0\0\0\0\0"..., 8192) = 8192 <0.000074>
15:13:08.783886 read(3, "\10\0\0\0\0\0\0\0\31`\1\0\0\0\0\0@\267!\0\0\0\0\0\10\0\0\0\0\0\0\0"..., 8192) = 8192 <0.000058>
15:13:08.784123 read(3, "\367\333I\211E\0\350\305\4\0\0D\211\347A\211\304\351\347\376\377\377f.\17\37\204\0\0\0\0\0"..., 8192) = 8192 <0.000055>
15:13:08.784336 read(3, "\6\0\0L\211\353E1\344\17\37\200\0\0\0\0\272\3\0\0\0L\211\356L\211\367\350@9\0\0"..., 8192) = 8192 <0.000077>
15:13:08.784545 read(3, "C!\0H\215=3\316\0\0H9\305t\24H\215P\1H9\325\17\204\204\1\0\0\306E\377\0"..., 8192) = 8192 <0.000061>
15:13:08.784744 read(3, "\5\327)!\0\235\377\377\377\307\5QJ!\0\235\377\377\377\270~\377\377\377\351h\377\377\377\17\37\0"..., 8192) = 8192 <0.000029>
15:13:08.784971 read(3, "I\213G H\211\2H\213\5\272\3!\0H\205\300\17\204R\1\0\0\200x\36\1t\21\351G\1"..., 8192) = 8192 <0.000010>
15:13:08.785190 read(3, "\203\351 \0\1u\n\213\5\213\351 \0\205\300u\0171\300H\203\304\10\303\17\37\204\0\0\0\0\0"..., 8192) = 8192 <0.000054>
15:13:08.785413 read(3, "H\213\35A\300 \0\351\347\367\377\377\213\5\22\355 \0\203\5\v\300 \0\1\205\300\17\204p\374\377"..., 8192) = 8192 <0.000072>
15:13:08.785647 read(3, "$8H\215l$@I\203\305\10L\211-\266\246 \0H\211D$\10H\215E\4H\211D$\30"..., 8192) = 8192 <0.000077>
15:13:08.785873 read(3, "H\205\377u\34\3534f\17\37\204\0\0\0\0\0H\203\303\10\350W:\377\377H\213;H\205\377t"..., 8192) = 8192 <0.000078>
15:13:08.786095 read(3, "MAX-12\0RTMAX-11\0RTMAX-10\0RTMAX-9"..., 8192) = 8192 <0.000057>
15:13:08.786316 read(3, "B\16(\214\5D\0160\206\6A\0168\203\7P\16\300\2\3!\1\n\0168A\0160A\16(B"..., 8192) = 8192 <0.000030>
15:13:08.786542 read(3, "\16\10\0\0\0\0\0\0\24\0\0\0\f2\0\0\260\246\377\377\35\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192 <0.000079>
15:13:08.786767 read(3, "\0\0\0\0\0\0\0\0\10\300!\0\0\0\0\0\240\354!\0\0\0\0\0\240\354!\0\0\0\0\0"..., 8192) = 2520 <0.000028>
15:13:08.786989 read(3, "", 8192)       = 0 <0.000057>
15:13:08.787207 mmap(NULL, 937984, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f163d189000 <0.000030>
15:13:08.787562 lseek(3, 117208, SEEK_SET) = 117208 <0.000078>
15:13:08.787882 munmap(0x7f163d189000, 937984) = 0 <0.000087>
15:13:08.788118 close(3)                = 0 <0.000010>
15:13:08.788354 write(1, "MARKEND", 7)  = 7 <0.000093>
15:13:08.788638 write(1, "Took: 0.0071139335632324\n", 25) = 25 <0.000042>

And here's the same trace for php-7.3:

15:13:09.822814 write(1, "MARKSTART", 9) = 9 <0.000082>
15:13:09.822984 lstat("/bin/sh", {st_mode=S_IFLNK|0777, st_size=4, ...}) = 0 <0.000089>
15:13:09.823168 readlink("/bin/sh", "dash", 4096) = 4 <0.000011>
15:13:09.823296 lstat("/bin/dash", {st_mode=S_IFREG|0755, st_size=117208, ...}) = 0 <0.000015>
15:13:09.823435 lstat("/bin", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 <0.000011>
15:13:09.823573 open("/bin/dash", O_RDONLY) = 3 <0.000013>
15:13:09.823676 fstat(3, {st_mode=S_IFREG|0755, st_size=117208, ...}) = 0 <0.000009>
15:13:09.823826 lseek(3, 0, SEEK_CUR)   = 0 <0.000008>
15:13:09.823968 fstat(3, {st_mode=S_IFREG|0755, st_size=117208, ...}) = 0 <0.000302>
15:13:09.824354 fstat(3, {st_mode=S_IFREG|0755, st_size=117208, ...}) = 0 <0.000069>
15:13:09.824518 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\2407\0\0\0\0\0\0"..., 8192) = 8192 <0.000035>
15:13:09.824658 read(3, "\10\0\0\0\0\0\0\0\31`\1\0\0\0\0\0@\267!\0\0\0\0\0\10\0\0\0\0\0\0\0"..., 8192) = 8192 <0.000011>
15:13:09.824794 read(3, "\367\333I\211E\0\350\305\4\0\0D\211\347A\211\304\351\347\376\377\377f.\17\37\204\0\0\0\0\0"..., 8192) = 8192 <0.000009>
15:13:09.824893 read(3, "\6\0\0L\211\353E1\344\17\37\200\0\0\0\0\272\3\0\0\0L\211\356L\211\367\350@9\0\0"..., 8192) = 8192 <0.000010>
15:13:09.825039 read(3, "C!\0H\215=3\316\0\0H9\305t\24H\215P\1H9\325\17\204\204\1\0\0\306E\377\0"..., 8192) = 8192 <0.000011>
15:13:09.825182 read(3, "\5\327)!\0\235\377\377\377\307\5QJ!\0\235\377\377\377\270~\377\377\377\351h\377\377\377\17\37\0"..., 8192) = 8192 <0.000011>
15:13:09.825330 read(3, "I\213G H\211\2H\213\5\272\3!\0H\205\300\17\204R\1\0\0\200x\36\1t\21\351G\1"..., 8192) = 8192 <0.000011>
15:13:09.825483 read(3, "\203\351 \0\1u\n\213\5\213\351 \0\205\300u\0171\300H\203\304\10\303\17\37\204\0\0\0\0\0"..., 8192) = 8192 <0.000021>
15:13:09.825622 read(3, "H\213\35A\300 \0\351\347\367\377\377\213\5\22\355 \0\203\5\v\300 \0\1\205\300\17\204p\374\377"..., 8192) = 8192 <0.000021>
15:13:09.825736 read(3, "$8H\215l$@I\203\305\10L\211-\266\246 \0H\211D$\10H\215E\4H\211D$\30"..., 8192) = 8192 <0.000018>
15:13:09.825879 read(3, "H\205\377u\34\3534f\17\37\204\0\0\0\0\0H\203\303\10\350W:\377\377H\213;H\205\377t"..., 8192) = 8192 <0.000010>
15:13:09.826013 read(3, "MAX-12\0RTMAX-11\0RTMAX-10\0RTMAX-9"..., 8192) = 8192 <0.000020>
15:13:09.826153 read(3, "B\16(\214\5D\0160\206\6A\0168\203\7P\16\300\2\3!\1\n\0168A\0160A\16(B"..., 8192) = 8192 <0.000027>
15:13:09.826287 read(3, "\16\10\0\0\0\0\0\0\24\0\0\0\f2\0\0\260\246\377\377\35\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192 <0.000010>
15:13:09.826425 read(3, "\0\0\0\0\0\0\0\0\10\300!\0\0\0\0\0\240\354!\0\0\0\0\0\240\354!\0\0\0\0\0"..., 8192) = 2520 <0.000009>
15:13:09.826559 read(3, "", 8192)       = 0 <0.000009>
15:13:09.826679 mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5368189000 <0.000013>
15:13:09.826776 munmap(0x7f5368189000, 2097152) = 0 <0.000021>
15:13:09.826947 mmap(NULL, 4190208, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5367f8a000 <0.000010>
15:13:09.827063 munmap(0x7f5367f8a000, 483328) = 0 <0.000028>
15:13:09.827186 munmap(0x7f5368200000, 1609728) = 0 <0.000009>
15:13:09.827267 madvise(0x7f5368000000, 2097152, MADV_HUGEPAGE) = 0 <0.000009>
15:13:10.101436 lseek(3, 117208, SEEK_SET) = 117208 <0.000014>
15:13:10.101930 close(3)                = 0 <0.000021>
15:13:10.102099 write(1, "MARKEND", 7)  = 7 <0.000066>
15:13:10.102309 write(1, "Took: 0.27948999404907\n", 23) = 23 <0.000085>

There are a handful of additional memory calls in 7.3, but the calls themselves are very quick. The problem area actually appears to be between syscalls here:

15:13:09.827267 madvise(0x7f5368000000, 2097152, MADV_HUGEPAGE) = 0 <0.000009>
15:13:10.101436 lseek(3, 117208, SEEK_SET) = 117208 <0.000014>

The madvise only takes 9us, but the timestamp of the next call is nearly 200ms later.
 [2019-01-29 15:35 UTC] spam2 at rhsoft dot net
looks like again an issue with tramsparent huge pages which feels more and more like a nice idea but completly unusabe in real life!

the aim to be faster *but* tlb misses are much more expensive and memory fragmentation comes into play which leads to unpredictable results because it takes ages to recover / cleanup

when you endforce the use instead of madvise just wait a few hours and complete machines are freezing to death

no idea why this is still that unuseable after that many years and not got either fixed or thrown away entirely in the linux kernel
 [2019-01-29 16:01 UTC] ab@php.net
Thanks for the report. I don't seem to reproduce it, compiled both versions on the same host with same configure args. 

- 7.3 took 0.67183995246887
- 7.2 took 0.67918705940247

The diff is almost negligible and can vary on both versions within +/- 3% on the given snippet. There must be something Docker or environment specific causing this.

Btw. there is a certain diff in the libmagic versions still. It might make sense you to check, whether it's related to bug #77095. Newer libmagic versions consume more input and use more comprehensive regex in some cases, which can especially affect files with a lot of spaces. However /bin/dash definitely doesn't fall under that.

Thanks.
 [2019-01-29 16:45 UTC] jscaltreto at gmail dot com
I believe the commenter above may be on to something. The results in my environment with thp defrag set to defer is much better:

jake@new-vps ~/phptest % cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
jake@new-vps ~/phptest % cat /sys/kernel/mm/transparent_hugepage/defrag 
always defer defer+madvise [madvise] never
jake@new-vps ~/phptest % ./test2.sh 100                                 
Testing finfo over 100 iterations
---
Testing PHP 7.2
Took: 0.11759901046753
---
Testing PHP 7.3
Took: 30.368657827377
jake@new-vps ~/phptest % echo defer | sudo tee /sys/kernel/mm/transparent_hugepage/defrag   
defer
jake@new-vps ~/phptest % ./test2.sh 100                                                  
Testing finfo over 100 iterations
---
Testing PHP 7.2
Took: 0.096560001373291
---
Testing PHP 7.3
Took: 0.44092488288879


So it appears the major issue I'm seeing is with thp and memory fragmentation. Which I suppose would be considered environmental, but considering many distros enable thp out of the box there may be something to evaluate.
 [2019-02-18 13:54 UTC] nikic@php.net
I've disabled the use of MADV_HUGEPAGE by default in https://github.com/php/php-src/commit/928c42211f737640e4dc3c9702ba833c3059bddf.
 [2019-03-18 10:59 UTC] nikic@php.net
-Status: Open +Status: Feedback
 [2019-03-18 10:59 UTC] nikic@php.net
The aforementioned commit is in 7.3.3. Can you confirm that this performance issue is resolved now?
 [2019-03-31 04:22 UTC] php-bugs at lists dot php dot net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Re-Opened". Thank you.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Apr 25 01:01:30 2024 UTC