Hi everyone!
In this issue I want to share with the community some modifications I've made to the existing Cereal code base (the develop branch, to be precise) that I think some of you may find useful.
First of all, the link to the fork: https://github.com/uentity/cereal (branch develop-fix).
Second, why it has not been contributed to upstream Cereal. The reason is twofold, plus one minor point.
- It seems that Cereal isn't actively developed at the moment. I have two pull requests; one (a trivial one-liner) was accepted. The other one, "Remove template parameter Archive from InputBindingsMap and OutputBindingsMap" (#521), makes slightly deeper changes and is floating around with an unknown future. BTW, this PR is included in my fork.
- Some changes I made are not directly related to the main aim of the Cereal project: representing a C++ object as a 'stream' with strictly sequential write and read access. Instead, these changes help to access parts of such a stream independently (with serialization of polymorphic types in mind).
- A minor reason is that I used C++17 features (Cereal targets only C++11). But you can take the ideas and downgrade the code to C++11.
Now I'll briefly describe what my changes are about.
Out-of-order serialization of polymorphic types via shared_ptr.
Actually, there is already some support for out-of-order serialization in text archives (described in the documentation). But it will most probably fail if you try to use it with polymorphic types. The reason is that Cereal enumerates polymorphic type names, class versions and objects addressed by shared pointers. By 'enumerating' I mean that the information (type name, class version) is emitted only when the first instance of a given type is serialized. For all other instances the type name and class version are omitted. This helps reduce archive size, but if you later try to read an arbitrary shared_ptr from the archive, you will fail for three reasons: you don't know the type (only a serial number that depends on the order in which objects were serialized), you don't know the class version (if used), and the object's content can be missing because it was serialized earlier in the stream.
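To make the failure mode concrete, here is a toy model of the enumeration described above. It is not Cereal's code: Record, ToyWriter and ToyReader are hypothetical names invented for this illustration, and the real archives store this information differently. It only shows why a reader that jumps straight to a later record cannot recover the type name.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Toy model only: NOT Cereal's implementation, just the bookkeeping idea.
struct Record {
    std::uint32_t type_id;            // serial number of the polymorphic type
    std::optional<std::string> name;  // present only when the id is first used
};

// Writer side: emit the type name the first time a type is seen,
// and only the numeric id afterwards.
class ToyWriter {
public:
    Record write(const std::string& type_name) {
        auto it = ids_.find(type_name);
        if (it != ids_.end())
            return Record{it->second, std::nullopt};  // name omitted
        const auto id = static_cast<std::uint32_t>(ids_.size()) + 1;
        ids_.emplace(type_name, id);
        return Record{id, type_name};                 // first occurrence
    }

private:
    std::map<std::string, std::uint32_t> ids_;
};

// Reader side: an id can be resolved only if the record that introduced
// it has already been read by this reader.
class ToyReader {
public:
    std::optional<std::string> read(const Record& r) {
        assert(r.type_id != 0 && "ids start at 1 in this toy model");
        if (r.name) names_[r.type_id] = *r.name;
        auto it = names_.find(r.type_id);
        if (it == names_.end()) return std::nullopt;  // type unknown
        return it->second;
    }

private:
    std::map<std::uint32_t, std::string> names_;
};
```

A reader that processes the records in order resolves every id, while a fresh ToyReader that is handed only a later record of an already-seen type gets an empty result.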
In my fork of Cereal you can remedy the first two issues by specifying the following.
a) constexpr auto always_emit_polymorphic_name = true in the output archive does what the name says and forces Cereal to emit the type name for every serialized shared_ptr. This change is backward compatible, so the produced archive can be read with upstream Cereal without any modifications.
b) constexpr auto always_emit_class_version = true in both input and output archives forces Cereal to emit and read the class version for every serialized shared_ptr. This is a format-breaking change: the produced archives will fail to load with upstream Cereal.
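How the fork actually looks up these flags is not described here, so the following is only a guess at one possible C++17 approach: a detection trait that treats the flag as opt-in and defaults to false for archives that do not declare it. The names emits_name_always, should_write_name, PlainArchive and VerboseArchive are hypothetical; only always_emit_polymorphic_name comes from the text above.

```cpp
#include <type_traits>

// Primary template: archives that do not declare the flag behave as before.
template <class Archive, class = void>
struct emits_name_always : std::false_type {};

// Chosen only when Archive has a static member named
// always_emit_polymorphic_name; its value becomes the trait's value.
template <class Archive>
struct emits_name_always<
    Archive, std::void_t<decltype(Archive::always_emit_polymorphic_name)>>
    : std::bool_constant<Archive::always_emit_polymorphic_name> {};

// Hypothetical archives used only for this illustration.
struct PlainArchive {};
struct VerboseArchive {
    static constexpr auto always_emit_polymorphic_name = true;
};

// Decide whether the type name has to be written for the current pointer.
template <class Archive>
constexpr bool should_write_name(bool first_occurrence) {
    if constexpr (emits_name_always<Archive>::value)
        return true;              // flag set: always emit the name
    else
        return first_occurrence;  // default: emit only on first use
}

static_assert(!emits_name_always<PlainArchive>::value, "flag is opt-in");
static_assert(emits_name_always<VerboseArchive>::value, "flag detected");
```

With this kind of opt-in detection, existing archive classes keep their current output unless they explicitly declare the flag.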
Next, a lot of work went into correctly reading, out of order, shared pointers whose pointees were serialized earlier. Cereal keeps track of shared pointers, and if the pointee was already written, it produces only a serial number instead of the object's content (and this is right). An out-of-order reader can therefore encounter a serial number for an object it has not read yet.
To solve this issue, an undocumented Cereal feature, 'deferred' object serialization, is used. The idea is that we can keep track of all unknown serial numbers until the full archive is read and these numbers become known (i.e. address valid objects). After that, one just invokes the archive::serializeDeferments() function, which properly processes all deferred shared pointers.
In order to implement this feature I've added a new Functor data type to Cereal. When serialized, it just calls the saved callback. Also, prologues and epilogues are explicitly omitted for Functor with any Archive type (so no modifications to archive implementations are needed).
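The idea of a value whose "serialization" is just a callback invocation can be sketched with a toy archive. Everything below is an assumption made for illustration: this Functor struct and ToyArchive are hypothetical stand-ins, not the types from the fork, and the bracket characters merely play the role of a prologue and epilogue.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <type_traits>

// Hypothetical stand-in for the Functor type described above.
struct Functor {
    std::function<void()> callback;
};

// Toy archive that records what it does in a string; not a Cereal archive.
class ToyArchive {
public:
    template <class T>
    ToyArchive& operator()(T&& value) {
        using V = std::decay_t<T>;
        if constexpr (std::is_same_v<V, Functor>) {
            // No prologue/epilogue: "serializing" a Functor only runs it.
            assert(value.callback && "Functor needs a callback");
            value.callback();
        } else {
            log_ += "[";                    // prologue
            log_ += std::to_string(value);  // payload (numbers only here)
            log_ += "]";                    // epilogue
        }
        return *this;
    }

    const std::string& log() const { return log_; }

private:
    std::string log_;
};
```

Passing a Functor between two numbers runs the callback once and leaves no trace in the recorded output, which is the property that lets it work with any archive format.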
To deserialize a shared_ptr in deferred mode, change the conventional ar(my_ptr) to ar(defer_failed(my_ptr, initializer_cb, invoke_on = PtrInitTrigger::Retry)). Here initializer_cb is an arbitrary callable (function, capturing lambda, etc.) that is invoked with the valid shared pointer once it becomes available. If invoke_on == PtrInitTrigger::SuccessAndRetry, the initializer is called both when the shared_ptr is read immediately and on deferred load (retry). If invoke_on == PtrInitTrigger::Retry, it is called only on deferred load (an immediate load initializes the given shared pointer directly).
The described mechanism works for all kinds of archives, including those bundled with Cereal. Note that you have to call serializeDeferments() on the archive after its content has been read.
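The deferral logic can be modelled without Cereal at all. The sketch below is a toy under stated assumptions: ToyInput, register_object, load_or_defer and resolve_deferred are hypothetical names, the pointee is a plain int, and the initializer runs only on the deferred path, which mirrors the PtrInitTrigger::Retry behaviour described above. resolve_deferred plays the role of the serializeDeferments() call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Toy model only, not the fork's implementation.
class ToyInput {
public:
    using Ptr = std::shared_ptr<int>;
    using Init = std::function<void(Ptr)>;

    // Called when an object's content is actually found in the stream.
    void register_object(std::uint32_t id, Ptr obj) {
        assert(obj && "registered objects must be non-null");
        known_[id] = std::move(obj);
    }

    // Resolve the id now if possible. Otherwise remember the request and
    // return false; init is then invoked later, on the deferred path only.
    bool load_or_defer(std::uint32_t id, Ptr& out, Init init) {
        auto it = known_.find(id);
        if (it != known_.end()) {
            out = it->second;  // immediate load sets the pointer directly
            return true;
        }
        deferred_.emplace_back(id, std::move(init));
        return false;
    }

    // Run after the whole input was read: resolve every request whose
    // object is now known and return how many are still pending.
    std::size_t resolve_deferred() {
        std::vector<std::pair<std::uint32_t, Init>> pending;
        for (auto& [id, init] : deferred_) {
            auto it = known_.find(id);
            if (it == known_.end())
                pending.emplace_back(id, std::move(init));
            else if (init)
                init(it->second);
        }
        deferred_ = std::move(pending);
        return deferred_.size();
    }

private:
    std::map<std::uint32_t, Ptr> known_;
    std::vector<std::pair<std::uint32_t, Init>> deferred_;
};
```

A request for an id that is not known yet leaves the target pointer empty; once the object has been registered and resolve_deferred() has run, the callback receives the valid pointer. Forgetting that final call leaves such pointers unset, which is why the call is mandatory.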
A lot of work went into better runtime support for polymorphic type serialization across multiple shared libraries. The issue is that bindings are stored in global static maps and each shared library has its own map. Hence, if library A tries to serialize an object defined in library B (along with its serialization code), Cereal will fail to do that. To fix it, you must 'unite' the bindings from multiple modules. This is what my PR #521 is for. The develop-fix branch of my fork continues that changeset and makes it easier to 'unite' the polymorphic cast maps (used by Cereal to discover which type can be cast to which).
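What 'uniting' means can be shown with plain maps. This is a simplified sketch, not the code from the PR: BindingMap, Factory, unite_bindings and the two toy types are hypothetical, and a binding is reduced here to a factory registered under a type name, whereas Cereal's real binding maps hold serializer functions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>

struct Base {
    virtual ~Base() = default;
    virtual std::string kind() const = 0;
};
struct TypeA : Base { std::string kind() const override { return "TypeA"; } };
struct TypeB : Base { std::string kind() const override { return "TypeB"; } };

// One such map would live in every shared library.
using Factory = std::function<std::unique_ptr<Base>()>;
using BindingMap = std::map<std::string, Factory>;

// Copy the bindings of src into dst, keeping entries dst already has.
// Returns the number of bindings that were newly added.
std::size_t unite_bindings(BindingMap& dst, const BindingMap& src) {
    std::size_t added = 0;
    for (const auto& [name, factory] : src) {
        assert(factory && "binding without a factory");
        if (dst.emplace(name, factory).second) ++added;
    }
    return added;
}
```

Before the merge, a lookup of "TypeB" in library A's map fails; after unite_bindings(lib_a, lib_b) the same lookup succeeds, and repeating the call adds nothing.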
With all the above changes in place, I was able to implement an input/output archive pair that internally splits the JSON stream into multiple files (depending on object type).