Request #48100 Make serialize()d objects shorter by omitting default values
Submitted: 2009-04-28 13:51 UTC Modified: 2016-01-15 14:49 UTC
Votes:5
Avg. Score:4.8 ± 0.4
Reproduced:4 of 4 (100.0%)
Same Version:4 (100.0%)
Same OS:2 (50.0%)
From: php at prog dot hu Assigned:
Status: Not a bug Package: *General Issues
PHP Version: 5.3.0RC1 OS:
Private report: No CVE-ID: None
 [2009-04-28 13:51 UTC] php at prog dot hu
Description:
------------
This entry is a feature request to extend the serialize() function so that applications can optionally store their persistent objects in a shorter, more efficient form than is possible now.

Currently serialize() stores all member fields of an object in the output string, even if most of those member fields still have the default values defined in their class definition. This wastes storage space, as well as processing power at re-parse time (deserialization).
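To make the current behaviour concrete, here is a minimal sketch. The class `Profile` and its fields are hypothetical, chosen only for illustration; the point is that every field appears in the output even though only one of them differs from its default.

```php
<?php
// Hypothetical class, for illustration only.
class Profile
{
    public $name = '';
    public $language = 'en';
    public $itemsPerPage = 25;
}

$p = new Profile();
$p->name = 'Anna'; // the only field that differs from its class default

// serialize() writes all three fields, defaults included.
$full = serialize($p);
echo $full, "\n";
// O:7:"Profile":3:{s:4:"name";s:4:"Anna";s:8:"language";s:2:"en";s:12:"itemsPerPage";i:25;}
```

Two of the three stored fields carry no information that the class definition does not already contain.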

Applications therefore should have the option to tell PHP to store only those member fields in serialized object descriptions which have non-default values. This could be achieved by adding a second, optional parameter to serialize(), which (if set to true, for example) would invoke this "optimized" behaviour. Reloading (deserialization) of objects stored in this "optimized" format should occur the same way as now (with a regular unserialize()).

The modification I'm proposing has practically no compatibility impact: serialize() currently has no second parameter, so old code would still emit full object definitions (with all member fields included, even those with default values) and produce byte-for-byte the same output as in older versions. Unserialization would require no code change either, since even in current PHP implementations, if a member field is not included in the string passed to unserialize(), the field in the returned object takes the default value set in the class definition (if any). That means the same unserialize() call could take both the old "full" and the new "optimized" strings and would construct exactly the same object from them, provided the default values of the member fields in the class definition have not changed in the meantime. That is why the change needs to be optional: application developers can make their own choice whether they prefer speed/storage space benefits over strict compatibility, especially as the latter can be taken care of from application code, too.
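The requested mode can be approximated in userland today, which also shows the unserialize() behaviour the proposal relies on. The sketch below is not the proposed second parameter (no such parameter exists); it uses the existing `__sleep()` hook instead, and the class `Settings` is hypothetical. Unlike the proposal, `__sleep()` applies to every serialize() call on the class, so it is not optional per call.

```php
<?php
// Hypothetical class, for illustration only.
class Settings
{
    public $theme = 'light';
    public $pageSize = 20;
    public $nickname = null;

    // Userland approximation of the "optimized" mode: report only the
    // properties whose current value differs from the class default.
    public function __sleep()
    {
        $defaults = get_class_vars(get_class($this));
        $names = array();
        foreach (get_object_vars($this) as $name => $value) {
            if (!array_key_exists($name, $defaults) || $defaults[$name] !== $value) {
                $names[] = $name;
            }
        }
        return $names;
    }
}

$s = new Settings();
$s->pageSize = 50;

$compact = serialize($s);
echo $compact, "\n"; // O:8:"Settings":1:{s:8:"pageSize";i:50;}

// A regular unserialize() fills the omitted fields from the class definition.
$restored = unserialize($compact);
var_dump($restored->theme);  // string(5) "light"
var_dump($restored == $s);   // bool(true)
```

The round trip yields an equal object only as long as the defaults in the class definition at load time match those at save time.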


 [2009-04-28 14:01 UTC] php at prog dot hu
I forgot to mention it, but of course member fields with no default value set in the class definition, but an actual value of NULL, could be handled the same way (i.e. omitted from the output) in the "optimized" serialization mode, as member fields default to that value anyway after construction, so unserialize() would still produce the same object from the "full" and "optimized" strings.
 [2015-06-29 07:22 UTC] lucas at threeamdesign dot com dot au
This would create data corruption issues because one cannot see into the future. A code change will break compatibility with objects serialized before the change. Your proposal admits the choice to omit these defaults is done at the time of serialization, so there's no way to 'un-choose' having incorrect data in the future.
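The scenario described here can be sketched as follows. Since a class cannot be redefined within one script, two hypothetical classes, `ConfigV1` and `ConfigV2`, stand in for the same class before and after a change to its default value, and the class name inside the serialized string is swapped to simulate loading old data with new code. The compact string is written by hand, as an assumption about what an "optimized" mode would emit.

```php
<?php
// Hypothetical "before" and "after" versions of one class;
// only the default value of $timeout differs.
class ConfigV1 { public $timeout = 30; }
class ConfigV2 { public $timeout = 60; }

// Saved while the default was 30 and the object still had that value.
$full    = serialize(new ConfigV1());  // stores timeout = 30 explicitly
$compact = 'O:8:"ConfigV1":0:{}';      // assumed "optimized" form: default omitted

// Simulate loading the old data after the class definition changed.
$fullLater    = unserialize(str_replace('ConfigV1', 'ConfigV2', $full));
$compactLater = unserialize(str_replace('ConfigV1', 'ConfigV2', $compact));

var_dump($fullLater->timeout);    // int(30): the saved value survives
var_dump($compactLater->timeout); // int(60): silently picks up the new default
```

Whether the second result counts as corruption or as the expected consequence of changing a default is exactly what the rest of this thread disputes.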
 [2016-01-15 14:49 UTC] danack@php.net
-Status: Open
+Status: Not a bug
-Package: Feature/Change Request
+Package: *General Issues
 [2016-01-15 14:49 UTC] danack@php.net
This is correct:

"This would create data corruption issues because one cannot see into the future. A code change will break compatibility with objects serialized before the change. [...] so there's no way to 'un-choose' having incorrect data in the future."

There is no way this request could be made to work safely.
 [2016-01-16 01:37 UTC] php at prog dot hu
It would not create any data corruption issues. 

First, because the compact version would be emitted only if the developer explicitly chooses to do so by specifying the second, optional parameter. Now, if the developer decides explicitly to do so, then he obviously knows what he's doing and knows what the implications of his actions are.

Second, because if one changes the default value of a property, that already implies that any code that depended on that property having a specific value gets "broken", regardless of whether any serialization of the object takes place or not. So obviously, if the object gets serialized and then deserialized, that does not change that fact, nor does it make it any worse.

Third thing: saving all property values doesn't shield your code from breaking changes any more than omitting default values does. Why? Because even though you have all properties saved in a serialized object, you don't have the code manipulating, using and querying those properties (i.e. the code in the class methods, and the code called by the latter, and so on) saved in it. So when you load back (deserialize) an object, it might very well have a different code base (methods, functions, etc.) attached to it than it had when it was saved. And if those changes have not been backward compatible, or have not been tested thoroughly, or are not capable of handling unserialized property data sets that were serialized with previous versions of the same object/class (with a different code set behind them), then they will break your program, too.

So, all in all: no, the proposed feature request does not introduce any kind of "data corruption issue" that could not be introduced by the features already present in the language. It would, however, enable some serious savings in data storage requirements and run-time speed advantages when handling large data sets with many properties left at their default values.
 