php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #66052 Serialized value ids are shared between nested serialization operations
Submitted: 2013-11-07 20:09 UTC Modified: 2021-04-27 15:41 UTC
Votes:16
Avg. Score:4.9 ± 0.3
Reproduced:15 of 15 (100.0%)
Same Version:1 (6.7%)
Same OS:2 (13.3%)
From: crog at gustavus dot edu Assigned:
Status: Wont fix Package: *General Issues
PHP Version: 5.4.21 OS:
Private report: No CVE-ID: None
 [2013-11-07 20:09 UTC] crog at gustavus dot edu
Description:
------------
The current serialization format does not specify value ids and instead relies on a count from the beginning of the data. While this works fine for "standard" serialization, it breaks horribly when custom serialization (via Serializable) is thrown into the mix.

In the case in which we first discovered this issue, we have an object which saves and loads its data to a file, database or whatever (depending on the current implementation of the save() method). It can then be saved/loaded via the save()/load() methods respectively, and this works fine.

To ensure that we weren't wasting time/resources, we also implemented the Serializable interface and a serialize method which simply calls save() and returns a single string which can be used to identify the location of our data (a filename if stored on the file system, row id if in the database, etc.). Conversely, unserialize sets the source and calls load(). This, again, works fine if used in isolation.

Now, where the problems arise is when the two are mixed. Our current file system implementation of save() does its own serialization. If we call save() to save our data, but then load through it through the unserialize operation, the serialized data's reference counts are all off by one (as the original serialization didn't go through serialize). The same off-by-one bug shows up if this is done in reverse (saved through serialize(), loaded through load()).

The effects of this bug are one of two things:
- The values simply point to the wrong variables. Very hard to debug and incredibly confusing to anyone unfamiliar with the serialization format.
- The values point to non-existent values (-1 or max +1). This causes an error which reports the byte offset into the serialized data where the error occurred.




The test below is rather convoluted for a test case, but demonstrates the problem as it relates to the code in which we discovered it. I can, if necessary, attempt to provide further explanation or additional details.

Test script:
---------------
  class ObjectWithReferences {
    protected $var1;
    protected $var2;
    protected $var3;
    
    public function __construct() {
      $this->var1 = new StdClass();
      $this->var2 = null;
      $this->var3 = $this->var1;
    }
  }


  class WrapperObject implements Serializable
  {
    static $tempstorage;
  
    protected $obj;
 
    public function setObject($obj) {
      $this->obj = $obj;
    }
    
    public function getObject() {
      if (is_string($this->obj)) {
        $this->obj = unserialize($this->obj);
      }
    
      return $this->obj;
    }
 
    public function save() {
      $archive = (object) [
        'object' => is_string($this->obj) ? $this->obj : serialize($this->obj),
      ];
    
      static::$tempstorage = json_encode($archive);
      var_dump("Serialized:", static::$tempstorage);
    }
    
    public function load() {
      $archive = json_decode(static::$tempstorage);
      
      $this->obj = $archive->object;
    }
 
    public function serialize() {
      $this->save(false);
      return "wrapper!"; // Doesn't actually matter here.
    }
    
    public function unserialize($serialized) {
      $this->load();
    }
  }
  
  
  $owr = new ObjectWithReferences();
  $wrapper = new WrapperObject();
  
  $wrapper->setObject($owr);
  $wrapper->save();
  $wrapper->load();
  var_dump($wrapper->getObject());
  
  $serialized = serialize($wrapper);
  $wrapper = unserialize($serialized);
  
  var_dump($wrapper->getObject());

Expected result:
----------------
string 'Serialized:' (length=11)
string '{"object":"O:20:\"ObjectWithReferences\":3:{s:7:\"\u0000*\u0000var1\";O:8:\"stdClass\":0:{}s:7:\"\u0000*\u0000var2\";N;s:7:\"\u0000*\u0000var3\";r:2;}"}' (length=152)
object(ObjectWithReferences)[9]
  protected 'var1' => 
    object(stdClass)[10]
  protected 'var2' => null
  protected 'var3' => 
    object(stdClass)[10]
string 'Serialized:' (length=11)
string '{"object":"O:20:\"ObjectWithReferences\":3:{s:7:\"\u0000*\u0000var1\";O:8:\"stdClass\":0:{}s:7:\"\u0000*\u0000var2\";N;s:7:\"\u0000*\u0000var3\";r:3;}"}' (length=152)
object(ObjectWithReferences)[8]
  protected 'var1' => 
    object(stdClass)[9]
  protected 'var2' => null
  protected 'var3' => 
    object(stdClass)[9]

Actual result:
--------------
string 'Serialized:' (length=11)
string '{"object":"O:20:\"ObjectWithReferences\":3:{s:7:\"\u0000*\u0000var1\";O:8:\"stdClass\":0:{}s:7:\"\u0000*\u0000var2\";N;s:7:\"\u0000*\u0000var3\";r:2;}"}' (length=152)
object(ObjectWithReferences)[9]
  protected 'var1' => 
    object(stdClass)[10]
  protected 'var2' => null
  protected 'var3' => 
    object(stdClass)[10]
string 'Serialized:' (length=11)
string '{"object":"O:20:\"ObjectWithReferences\":3:{s:7:\"\u0000*\u0000var1\";O:8:\"stdClass\":0:{}s:7:\"\u0000*\u0000var2\";N;s:7:\"\u0000*\u0000var3\";r:3;}"}' (length=152)
object(ObjectWithReferences)[8]
  protected 'var1' => 
    object(stdClass)[9]
  protected 'var2' => null
  protected 'var3' => null

Patches

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2013-11-19 19:40 UTC] adrien dot brault at gmail dot com
Modified the test case and found another issue

http://3v4l.org/UCNin
 [2013-11-19 19:46 UTC] adrien dot brault at gmail dot com
Another failing test case http://3v4l.org/JbqmH
 [2014-02-28 17:59 UTC] crog at gustavus dot edu
Here's another example showing just how screwy serialization is once you throw hooks into the mix:

http://3v4l.org/XaTAq#v5422

==================================================

Code:
<?php
class ObjectWithReferences {
  protected $var1;
  protected $var2;

  public function __construct() {
    $this->var1 = new StdClass();
    $this->var2 = $this->var1;
  }
}


class WrapperObject implements Serializable
{
  private $obj;

  public function __construct($obj) {
    $this->obj = $obj;
  }

  public function getObject() {
    return $this->obj;
  }

  public function serialize() {
    for ($i = 0; $i < 15; ++$i) {
      var_dump(serialize(new \stdClass()));
    }

    return serialize($this->obj);
  }

  public function unserialize($serialized) {
    $this->obj = unserialize($serialized);
  }
}


$wrapper = new WrapperObject(new ObjectWithReferences());

var_dump($wrapper->getObject());

$serialized = serialize($wrapper);
var_dump($serialized);

$wrapper = unserialize($serialized);

=============================================

Output:
string(19) "O:8:"stdClass":0:{}"
string(19) "O:8:"stdClass":0:{}"
string(19) "O:8:"stdClass":0:{}"
string(4) "r:2;"
string(4) "r:3;"
string(4) "r:4;"
string(4) "r:2;"
string(4) "r:3;"
string(4) "r:4;"
string(4) "r:2;"
string(4) "r:3;"
string(4) "r:4;"
string(4) "r:2;"
string(4) "r:3;"
string(4) "r:4;"
string(110) "C:13:"WrapperObject":84:{O:20:"ObjectWithReferences":2:{s:7:"*var1";O:8:"stdClass":0:{}s:7:"*var2";r:18;}}"

Notice: unserialize(): Error at offset 83 of 84 bytes in /in/XaTAq on line 34

==================================================

The first three iterations make new stdClass instances, and so far we're fine. After that, it looks like the garbage collection gets involved, starts wiping the unused instances and freeing the memory for the new instances. As serialization continues, the new instances get put in blocks once allocated to the now-garbage-collected instances which causes the serialization function to think it's getting a reference to something it's already serialized (!). This repeats in blocks of three as the garbage collection is invoked and starts blasting more dead objects.

Then we get to the actual object we're serializing. At this point, PHP thinks it has serialized 15 more objects (!) (should only be three new references, but all as part of a separate operation), so the reference counts are completely hosed. As soon as this is unserialized, things blow up (as evident by the error at the end).

The solution here requires improving this lazy reference counting/checking code. The design appears to have three major issues which are going to continue causing problems for anyone using any of the serialization hooks (Serializable interface or the __sleep/__wakeup magic methods):

(1) Blindly using memory locations to determine whether or not an object has already been serialized is going to lead to false positives as long as garbage collection is enabled during the operation.

(2) Not storing an object ID in the serialized object format forces relative referencing. Having absolute referencing may not fix all of the problems here, but it'd certainly be able fail more gracefully (malformed input error vs silently rebuilding the object with a corrupted state).

(3) In some cases, a nested serialize calls are are not given a unique state, which makes them almost-recursive calls except their output is not guaranteed to be part of the parent call's output.


So long as these issues are left unresolved, any classes using the serialization hooks are going to be incredibly brittle. I shudder when I think of what problems subclassing adds to this mess (nevermind delegation, event handling and other callback-oriented patterns).
 [2017-01-01 12:30 UTC] nikic@php.net
Note that the issue in the last comment is due to bug #66085, which is resolved in PHP 7. Of course this still leaves the overall problem.
 [2017-01-07 23:07 UTC] php at laszlokorte dot de
I guess I came across the same issue.
The following gist reproduces it even in php7.0.14:

https://gist.github.com/laszlokorte/3948f40873346cc1fd9b8c11ab06ae04
 [2017-02-17 14:03 UTC] hwold at hwold dot net
Still present in PHP 7.1.2. Another test case : https://3v4l.org/BXuv8

Code:
-----
    <?php
    class User {
        public $name = "admin";
    }

    class UserCouple implements Serializable {
        public $user1;
        public $user2;

        public function serialize() {
            return serialize(array(42, serialize(array($this->user1, $this->user2))));
        }

        public function unserialize($serialized) {
            list($_, $subSerialized) = unserialize($serialized);
            list($this->user1, $this->user2) = unserialize($subSerialized);
        }
    }

    $user = new User();
    $couple = new UserCouple();
    $couple->user1 = $user;
    $couple->user2 = $user;

    var_dump(unserialize(serialize($couple)));

Expected result
---------------
    object(UserCouple)#3 (2) {
      ["user1"]=>
      object(User)#4 (1) {
        ["name"]=>
        string(5) "admin"
      }
      ["user2"]=>
      object(User)#4 (1) {
        ["name"]=>
        string(5) "admin"
      }
    }

Actual result
-------------

    object(UserCouple)#3 (2) {
      ["user1"]=>
      object(User)#4 (1) {
        ["name"]=>
        string(5) "admin"
      }
      ["user2"]=>
      int(42)
    }
 [2021-04-27 15:36 UTC] neclimdul at gmail dot com
Is safe to say Serializable is just broken in this respect and replaced by the new custom object serialization methods in 7.4+? Maybe with a follow up to deprecating and adding warnings around this breakage?
 [2021-04-27 15:41 UTC] nikic@php.net
-Status: Open +Status: Wont fix
 [2021-04-27 15:41 UTC] nikic@php.net
Yes, that's correct. The Serializable interface is not salvageable and has been replaced by __serialize() and __unserialize(). It will be (partially) deprecated in PHP 8.1 (https://wiki.php.net/rfc/phase_out_serializable).
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Nov 21 18:01:29 2024 UTC