Consider modified date when deduplicating datasets as a first step towards merging metadata
Originally based on !513 (merged), now rebased directly onto main
because this change does not really require the improvements to the global identifier type proposed in !513 (merged).
@OC000008373193 The second-to-last commit is an idea I just had that makes it almost trivial to consider the modified date during deduplication: since a `Date`
is rather small at just 4 bytes, we can keep it in memory in the fingerprints table and consider it for deduplication without additional disk I/O.
This does not really help when we want to merge metadata later on, which will require disk I/O (so that `merge_duplicates`
will have to become async and parallelize these accesses etc.), but it should make it really easy to finish the current story and assess the effects of this policy with very few code changes (+16 -6).
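
To illustrate the idea, here is a minimal sketch and not the actual code of this MR. `Date`, `Fingerprint`, `Candidate`, `Fingerprints` and the "newer modified date wins" policy are all assumptions made up for the example; the real types and the real policy may differ. It only shows that the decision can be made from the in-memory table alone.

```rust
use std::collections::hash_map::Entry as Slot;
use std::collections::HashMap;

/// Hypothetical stand-in for a compact 4-byte date (days since some epoch).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Date(pub i32);

/// Hypothetical content fingerprint of a dataset.
pub type Fingerprint = u64;

/// Hypothetical in-memory entry: the dataset we currently keep for a fingerprint.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Candidate {
    pub id: String,
    pub modified: Option<Date>,
}

/// Hypothetical fingerprints table, held entirely in memory.
#[derive(Default)]
pub struct Fingerprints {
    table: HashMap<Fingerprint, Candidate>,
}

impl Fingerprints {
    /// Records a dataset and returns the id of the dataset dropped as a
    /// duplicate, if any. Assumed policy: the newer modified date wins,
    /// and on a tie the dataset seen first is kept.
    pub fn insert(
        &mut self,
        fingerprint: Fingerprint,
        id: &str,
        modified: Option<Date>,
    ) -> Option<String> {
        let candidate = Candidate { id: id.to_owned(), modified };
        match self.table.entry(fingerprint) {
            Slot::Vacant(slot) => {
                slot.insert(candidate);
                None
            }
            Slot::Occupied(mut slot) => {
                // `Option<Date>` orders `None` before `Some(_)`, so a dated
                // dataset beats an undated one without any special casing.
                if candidate.modified > slot.get().modified {
                    Some(slot.insert(candidate).id)
                } else {
                    Some(candidate.id)
                }
            }
        }
    }

    /// Returns the id of the dataset currently kept for a fingerprint.
    pub fn kept(&self, fingerprint: Fingerprint) -> Option<&str> {
        self.table.get(&fingerprint).map(|c| c.id.as_str())
    }
}
```

In this sketch no dataset is read back from disk to decide which duplicate to keep. Merging metadata would be different because the dropped dataset's contents would be needed.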