Consider modified date when deduplicating datasets as a first step towards merging metadata

Adam Reichold requested to merge OC000014987132/metadaten:merge-by-modified into main

Originally based on !513 (merged), now rebased directly onto main because this change does not really require the improvements to the global identifier type proposed there.

@OC000008373193 The second-to-last commit is an idea I just had which makes it almost trivial to consider the modified date for deduplication: since a `Date` is rather small at just 4 bytes, we can keep it in memory in the fingerprints table and consider it for deduplication without additional disk I/O.

This does not really help once we want to merge metadata later on, as that will require disk I/O (so `merge_duplicates` will have to become async and parallelize these accesses, etc.). It should, however, make it really easy to finish the current story and assess the effects of this policy with very few code changes (+16 -6).
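
To illustrate the idea, here is a minimal sketch and not the code from this merge request. The names `Date`, `Fingerprint`, `Kept` and `Fingerprints` are hypothetical stand-ins, the 4-byte `Date` is modelled as a plain day count, and the policy "the more recently modified dataset wins, a missing date loses, ties keep the first one seen" is an assumption made for the example.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical stand-in for a 4-byte date (days since some epoch).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Date(pub i32);

/// Hypothetical content fingerprint used to detect duplicates.
pub type Fingerprint = u64;

/// What is kept in memory per fingerprint: the index of the dataset
/// currently retained and its modified date, if it has one.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Kept {
    pub dataset: usize,
    pub modified: Option<Date>,
}

#[derive(Default)]
pub struct Fingerprints {
    table: HashMap<Fingerprint, Kept>,
}

impl Fingerprints {
    /// Registers a dataset and returns the index of the dataset that
    /// is dropped as a duplicate, if any. No disk I/O is needed since
    /// the modified date lives in the table itself.
    pub fn insert(
        &mut self,
        fingerprint: Fingerprint,
        dataset: usize,
        modified: Option<Date>,
    ) -> Option<usize> {
        match self.table.entry(fingerprint) {
            Entry::Vacant(entry) => {
                entry.insert(Kept { dataset, modified });
                None
            }
            Entry::Occupied(mut entry) => {
                let kept = entry.get_mut();
                // `None < Some(_)` for `Option`, so a dataset without
                // a modified date loses against one that has a date.
                if modified > kept.modified {
                    let dropped = kept.dataset;
                    *kept = Kept { dataset, modified };
                    Some(dropped)
                } else {
                    // Older or equally old: the newcomer is dropped.
                    Some(dataset)
                }
            }
        }
    }

    /// Index of the dataset currently retained for a fingerprint.
    pub fn kept(&self, fingerprint: Fingerprint) -> Option<usize> {
        self.table.get(&fingerprint).map(|kept| kept.dataset)
    }
}
```

Under these assumptions, the only extra state per fingerprint is the optional date, which is why the policy can be evaluated without touching the stored datasets. Actually merging metadata from both duplicates would still need to load them.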

Edited by Adam Reichold
