php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #53458 Non-linear performance degradation on certain prepared SELECT queries
Submitted: 2010-12-02 23:29 UTC Modified: 2020-08-28 14:13 UTC
Votes:10
Avg. Score:4.3 ± 1.3
Reproduced:10 of 10 (100.0%)
Same Version:1 (10.0%)
Same OS:1 (10.0%)
From: don at smugmug dot com Assigned: cmb (profile)
Status: Duplicate Package: PDO MySQL
PHP Version: 5.3.3 OS: CentOS 5.5
Private report: No CVE-ID: None
View Add Comment Developer Edit
Welcome! If you don't have a Git account, you can't do anything here.
You can add a comment by following this link or if you reported this bug, you can edit this bug over here.
(description)
Block user comment
Status: Assign to:
Package:
Bug Type:
Summary:
From: don at smugmug dot com
New email:
PHP Version: OS:

 

 [2010-12-02 23:29 UTC] don at smugmug dot com
Description:
------------
When retrieving results from prepared PDO statements against MySQL, we get 
performance that diverges from expected by 10X or more on results as low as 10000 
rows.  This only occurs for 'SELECT ... WHERE Id IN ( .. )' queries.  

The attached script provides two PDO prepared approaches ('row-prepared', default, 
and 'all-prepared') as well as a variety of control methods that use non-prepared 
PDO queries, straight MySQL, and prepared queries using MySQLi.  Only PDO with 
prepared queries exhibits the problem.

If the query is broken up into chunks that return 1000 rows or less prior to 
execution, then combined afterwards in PHP, performance is as expected.

Test script:
---------------
You can get the sample script from: http://www.smugmug.com/test/pdo-problem.php.gz

Expected result:
----------------
pdo-problem.php?type=row-prepared and pdo-problem.php?type=all-prepared should 
return in ~0.5s, like the other types (row, all, chunk, mysql, mysqli).

Actual result:
--------------
pdo-problem.php?type=row-prepared and pdo-problem.php?type=all-prepared return in 
~6s instead of ~0.5s

Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2010-12-02 23:39 UTC] don at smugmug dot com
Here's my PHP build configuration:

'./configure' '--enable-exif' '--with-gd' '--enable-gd-native-ttf' '--with-jpeg-
dir=/usr' '--with-png-dir=/usr' '--with-freetype-dir=/usr' '--with-zlib' '--
enable-inline-optimization' '--with-bz2' '--with-apxs2' '--with-xmlrpc' '--with-
curl' '--with-libdir=lib64' '--with-pdo-mysql=mysqlnd' '--with-mcrypt' '--enable-
bcmath' '--with-gmp' '--enable-mbstring' '--with-mysql=mysqlnd' '--with-openssl' 
'--with-mysqli=mysqlnd'
 [2010-12-03 03:36 UTC] don at smugmug dot com
FYI, setting PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to true/false didn't have a large impact.  Still ~10X slower than any of the other methods either way.  (Buffered was slightly faster)
 [2010-12-03 04:27 UTC] wez@php.net
from a twitter conversation with Don, the heart of this issue is that the there 
appears to be a difference in the performance of fetch() or fetchAll() depending 
on whether query() or prepare() + execute() were used.

Given that query() is really just prepare() + execute() under the covers, it is 
difficult to explain this difference.
 [2013-03-19 16:34 UTC] mike@php.net
The test script is not accessible anymore, so I assume it is about 

$stm->execute($a10kArray)

If that is the case, then the issue might be described as follows: 

Each element in the array to stm->execute() is registered as a bound parameter, for which the param_hook callback is called twice (normalize and alloc) and once when everything is done (free).

So for a 10k array, this are 30k callbacks into the pdo driver.

When fetching a row, the param_hook callback is called twice for each param (fetch_pre and fetch_post), i.e. 20k callbacks.

So for a 10k rowset add 200M calls.

In numbers: 200,030,000 indirect calls to the param_hook callback of the underlying pdo driver.

Iterating over the 10k params twice for each of the 10k rows is nearly (twice in a debug build) as expensive as calling the param_hook itself.

Add this two things up, and you easily wait a second for your result set.
 [2013-03-19 16:34 UTC] mike@php.net
-Status: Open +Status: Analyzed
 [2013-03-19 17:44 UTC] don at smugmug dot com
Sorry for the script getting lost, been awhile.  :)  Here's a gist with it:  
https://gist.github.com/onethumb/5198291

The thing that still leaves me wondering after your analysis is why doing a 
prepared SELECT on 1K rows, ten times, results in a >10X faster return than a 
single SELECT on 10K rows? Presumably, we'd have the same # of calls both ways?

I just validated against 5.4.10, too.  Same problem.
 [2013-03-19 21:43 UTC] mike@php.net
Nope, it's not obvious at the first glance, but if you fetch 10k rows with 10k 
params you have N * 1e4 * 1e4 (Ne8) iterations, where N stands for the number of 
callbacks per fetched row.

While, if you fetch 1k rows with 1k params for 10 times, it's N * 10 * 1e3 * 1e3 
(Ne7) (90% less).
 [2014-01-01 12:46 UTC] felipe@php.net
-Package: PDO related +Package: PDO MySQL
 [2020-08-28 14:13 UTC] cmb@php.net
-Status: Analyzed +Status: Duplicate -Assigned To: +Assigned To: cmb
 [2020-08-28 14:13 UTC] cmb@php.net
Closing as duplicate of bug #80027, because the other ticket is
currently being worked on.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Mar 19 07:01:29 2024 UTC