Request #50894 no-op cast triggers copy-on-write (ternary operator triggers copy too)
Submitted: 2010-02-01 06:13 UTC Modified: 2015-04-06 15:42 UTC
Avg. Score:4.4 ± 0.7
Reproduced:10 of 10 (100.0%)
Same Version:8 (80.0%)
Same OS:7 (70.0%)
From: lee at projectmastermind dot com Assigned: dmitry
Status: Closed Package: Performance problem
PHP Version: 5.*, 6 OS: *
Private report: No CVE-ID: None
 [2010-02-01 06:13 UTC] lee at projectmastermind dot com
Given a value of a particular type, casting it to that same type 
should essentially be a no-op: once it is determined that the 
operand already has the correct type, no further action needs to be 
taken. For example:

  $a = array();
  $b = (array)$a;

In this example, $a is already an array, so this should be a simple 
assignment operation.  $b should get a "lazy" copy of $a via PHP's 
copy-on-write policy.  Instead, the cast operation seems to force an 
immediate (non-lazy) full copy.

This creates a huge potential for hidden performance problems, as it 
causes code that *looks* like it would run in constant time [O(1)] to 
actually require linear time [O(n)] (where n represents the size of 
the data being copied).

I have verified that this issue does exist for string types as well.  
I assume that it applies to all PHP types.

Of course it becomes a significant performance issue primarily for 
types that can hold large amounts of data, where the data is 
duplicated whenever the zval is duplicated (AFAIK, this is only string 
and array).

I have verified this on the following versions of PHP:
  6.0.0-dev (php6.0-201001312130)

Reproduce code:

<?php
for( $z=1; $z<5; ++$z ) {
   $a = array_fill(0, 100*$z, '0');

   $t_start = microtime(true);
   for($i=0;$i<100000;++$i) {

      // O(n) [should be constant time, but isn't]
      // cast triggers non-lazy copy
      $b = (array)$a;

      // O(1) [constant time, as expected]
      // (comment above, and uncomment here for comparison)
      //$b = $a;
   }
   $t_elapsed = (microtime(true)*1000)-($t_start*1000);
   printf(
      "(%d elements * %d copies): %f ms\n\n", 
      100*$z, $i, $t_elapsed
   );
}

Expected result:
(100 elements * 100000 loops): 11.264160 ms

(200 elements * 100000 loops): 11.363037 ms

(300 elements * 100000 loops): 11.208984 ms

(400 elements * 100000 loops): 11.809082 ms

NOTE: the time stays roughly constant as the number of elements 
increases -- the assignments are copy-on-write, so no significant 
performance hit is incurred.

Actual result:
(100 elements * 100000 copies): 736.453613 ms

(200 elements * 100000 copies): 1448.991211 ms

(300 elements * 100000 copies): 2130.541016 ms

(400 elements * 100000 copies): 2823.362793 ms

NOTE: the time increases roughly linearly with the size of the array.  
(This happens with large strings too.)  This is a good indicator that 
a copy is being made [non-lazily] when the cast is applied.

 [2010-02-02 09:40 UTC]
It's definitely not a bug, but a request for optimization.

In the current implementation, the result of "$b = (array)$a;" can't be a simple assignment operation, because in general the result of a type-cast operation is not a variable but a temporary value. Such a temporary value can't be created using copy-on-write, so the variable has to be copied. The following assignment opcode then uses this value as is (without copying it again).

It's possible to optimize this situation by creating an additional opcode that combines ZEND_CAST and ZEND_ASSIGN, but I don't think it makes a lot of sense.
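Until the engine handles this, one user-level workaround is to perform the cast only when a conversion is actually needed, so that the common case (the value is already an array) stays a plain variable-to-variable assignment, which PHP shares lazily via copy-on-write. The following is a minimal sketch assuming the PHP 5 behavior described above; the variable names are illustrative only.

```php
<?php
// Workaround sketch: avoid "$b = (array)$a;" when $a is already an array.
// A plain variable-to-variable assignment is shared lazily (copy-on-write),
// whereas a cast result is a temporary that PHP 5 fills by copying.
$a = array_fill(0, 1000, '0');

$b = $a;                 // plain assignment: array data is not copied yet
if (!is_array($b)) {
    $b = (array)$b;      // cast only when a real conversion is needed
}

// A non-array input is still converted exactly as (array) would convert it.
$s = 'hello';
$c = $s;
if (!is_array($c)) {
    $c = (array)$c;      // becomes array(0 => 'hello')
}
```

The trade-off is a few extra lines in exchange for keeping the already-correct-type case O(1).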
 [2010-02-04 06:00 UTC] lee at projectmastermind dot com
Thanks for your time. I appreciate all the work you guys do - I know 
it's not easy.

I'm amending this report to include the ternary operator too.  Turns 
out that it has exactly the same issue as the cast (presumably for the 
same reason that Dmitry explained).

  $a = array();
  $b = true ? $a : $a;  // forced copy happens here.

Whether we call these bugs or optimization requests, the fact is that 
they appear to the developer as "highly unexpected behavior", and 
they're out there right now causing real performance issues in real 
PHP-based systems. 

Do people really use casting and ternary in ways that will cause 
problems?  Yes they do...

The following types of statements are actually very common...
   $a = (array)$a;
   $a = is_array($a) ? $a : array();
   $a = (somecondition) ? $z1 : $z2;
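The same idea can be applied to the three patterns above. Rewriting them with if/else keeps each assignment a copy of a plain variable rather than of a cast or ternary temporary. This is a sketch assuming PHP 5 behavior as reported here; $z1, $z2 and $somecondition are placeholder inputs, and the third result is stored in $c only so that $a stays inspectable.

```php
<?php
// Copy-avoiding rewrites of the three common patterns.
// Each assignment copies a plain variable (shared lazily by PHP)
// instead of a cast/ternary temporary (duplicated eagerly in PHP 5).
// $z1, $z2 and $somecondition are placeholder inputs for illustration.
$a  = array_fill(0, 1000, 'x');
$z1 = str_repeat('a', 10000);
$z2 = str_repeat('b', 10000);
$somecondition = true;

// Pattern 1: $a = (array)$a;
if (!is_array($a)) {
    $a = (array)$a;
}

// Pattern 2: $a = is_array($a) ? $a : array();
if (!is_array($a)) {
    $a = array();
}

// Pattern 3: $a = (somecondition) ? $z1 : $z2;
// (result stored in $c here so that $a above stays unchanged)
if ($somecondition) {
    $c = $z1;
} else {
    $c = $z2;
}
```

When $a is already an array, patterns 1 and 2 now do no assignment at all, instead of rebuilding the array through a temporary.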

A quick grep over the Zend Framework and PEAR source trees (for 
example) finds extensive use of casts and the ternary operator in 
assignments involving large primitive types (like array and string), 
and in plenty of those cases it's clear that the author never intended 
a copy to happen unless it was needed for type conversion.

I agree that adding new operations to address this (e.g., ASSIGN+CAST) 
wouldn't make much sense.

I haven't had time to dig through the Zend code in detail to see how 
feasible this would be... but consider the following...

Would it be possible to change the two operation handlers in question 
to return a zval pointer (or a handle) instead of forcing them to 
always return a temporary?  If that's possible, then the handler could 
decide internally whether to allocate a temporary, or to just return 
the operand directly.

This sort of thing is already done with, for example, the assignment 
operator on chained assignments:

  /* works as expected - only lazy copies are made */
  $a = array();
  $c = $b = $a;
 [2010-12-14 12:34 UTC]
-Package: Feature/Change Request +Package: Performance problem
 [2011-08-02 19:01 UTC] koubel at volny dot cz
An optimization for the ternary operator would be very helpful. It was also discussed on internals, with a suggested patch. I think always copying isn't necessary for the ternary operator.
 [2015-04-06 15:42 UTC]
-Status: Assigned +Status: Closed
 [2015-04-06 15:42 UTC]
Implemented in PHP-7.
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sat Nov 30 01:01:33 2024 UTC