RFC: Migrate dataset serialization to bitcode as a test case

bitcode is especially effective when coupled with Zstd compression, which our index uses, and hence appears to be a fitting replacement for bincode, which no longer really targets Serde.

To test it where it counts, this adds a schema migration which does not really change the schema but rather changes the serialization format used for datasets from bincode to bitcode.
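As a rough illustration of such a format-only migration, here is a minimal, std-only sketch. It is not the code in this merge request: `Dataset`, `Store`, the version numbers and the two toy encodings (wide and narrow length prefixes, standing in for bincode and bitcode respectively) are all hypothetical. It only shows the shape of the idea: the schema version selects the codec, and the migration re-encodes every record without touching its fields.

```rust
// Hypothetical dataset type; the real schema is not shown in this merge request.
#[derive(Debug, Clone, PartialEq)]
struct Dataset {
    title: String,
    tags: Vec<String>,
}

// Hypothetical store: a schema version plus one encoded blob per dataset.
struct Store {
    version: u32,
    records: Vec<Vec<u8>>,
}

// Toy codecs: `wide = true` stands in for the old format (8-byte length
// prefixes), `wide = false` for the new one (2-byte prefixes, so lengths
// must stay below 65536 in this sketch).
fn put_len(out: &mut Vec<u8>, len: usize, wide: bool) {
    if wide {
        out.extend_from_slice(&(len as u64).to_le_bytes());
    } else {
        out.extend_from_slice(&(len as u16).to_le_bytes());
    }
}

fn get_len(input: &mut &[u8], wide: bool) -> Option<usize> {
    let n = if wide { 8 } else { 2 };
    if input.len() < n {
        return None;
    }
    let (head, rest) = input.split_at(n);
    *input = rest;
    let mut buf = [0u8; 8];
    buf[..n].copy_from_slice(head);
    Some(u64::from_le_bytes(buf) as usize)
}

fn put_str(out: &mut Vec<u8>, s: &str, wide: bool) {
    put_len(out, s.len(), wide);
    out.extend_from_slice(s.as_bytes());
}

fn get_str(input: &mut &[u8], wide: bool) -> Option<String> {
    let len = get_len(input, wide)?;
    if input.len() < len {
        return None;
    }
    let (head, rest) = input.split_at(len);
    *input = rest;
    String::from_utf8(head.to_vec()).ok()
}

fn encode(d: &Dataset, wide: bool) -> Vec<u8> {
    let mut out = Vec::new();
    put_str(&mut out, &d.title, wide);
    put_len(&mut out, d.tags.len(), wide);
    for tag in &d.tags {
        put_str(&mut out, tag, wide);
    }
    out
}

fn decode(mut input: &[u8], wide: bool) -> Option<Dataset> {
    let title = get_str(&mut input, wide)?;
    let count = get_len(&mut input, wide)?;
    let mut tags = Vec::new();
    for _ in 0..count {
        tags.push(get_str(&mut input, wide)?);
    }
    Some(Dataset { title, tags })
}

// Format-only migration: decode with the old codec, re-encode with the new
// one, and bump the version only if every record converted successfully.
fn migrate(store: &mut Store) -> Option<()> {
    if store.version != 1 {
        return Some(()); // nothing to do, already on the new format
    }
    let mut migrated = Vec::with_capacity(store.records.len());
    for record in &store.records {
        migrated.push(encode(&decode(record, true)?, false));
    }
    store.records = migrated;
    store.version = 2;
    Some(())
}

fn main() {
    let dataset = Dataset {
        title: "Air quality".to_string(),
        tags: vec!["env".to_string()],
    };
    let mut store = Store { version: 1, records: vec![encode(&dataset, true)] };
    let before = store.records[0].len();
    migrate(&mut store).expect("all records should decode");
    println!("version {}: {} -> {} bytes", store.version, before, store.records[0].len());
}
```

Building the new records in a separate vector and swapping them in at the end means a record that fails to decode leaves the store on the old version, which is one plausible way to keep such a migration all-or-nothing.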

While we still use bincode in various places after this change, this migration should provide a stress test for the format and make any efficiency gains measurable, e.g. a decrease in the size of Tantivy's document store. [1]

If bitcode proves to be a good migration target, all other usages of bincode can be migrated similarly, adding schema versioning where it is still missing, e.g. for the AutoClassify results.

Starts resolving #174

[1] https://github.com/djkoloski/rust_serialization_benchmark

Edited by Adam Reichold
