Drop fast fields backing secondary indexes

Adam Reichold requested to merge drop-fast-fields-secondary-indexes into main

That we serialize bounding boxes and time ranges as fast fields is purely an artifact of our initial implementation, which just filtered the results of the full-text search using the bounding boxes and therefore required a fast field for sufficient performance. Now that we use secondary indexes (R* trees, interval trees) which are queried directly, the fast fields are unnecessary overhead: they bloat the index even though they are read only once, when the index is loaded after harvesting completes or the server restarts.

Hence, this change drops both fast fields and reads the bounding boxes and time ranges by loading the full datasets from the document store (just as we do when fulfilling queries via the API). This does make creation of the secondary indexes slower, but it also makes creation of the primary index faster and the resulting index on disk smaller:

| | before | after |
|---|---|---|
| index bounding boxes | 50 ms | 1.7 s |
| index time ranges | 10 ms | 1.7 s |
| primary index creation | 40 s | 35 s |
| primary index size | 294 MB | 286 MB |
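
As a rough illustration of the approach described above, the following sketch builds both secondary indexes in a single pass over full datasets read from the document store, without any fast fields. All names (`Dataset`, `BoundingBox`, `TimeRange`, `SecondaryIndexes`) are hypothetical and not taken from the code base, and flat vectors with linear scans stand in for the actual R* tree and interval tree so that the example needs only the standard library.

```rust
// Hypothetical sketch: none of these types are the project's actual API.

/// Axis-aligned bounding box given by its lower and upper corners.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct BoundingBox {
    pub min: [f64; 2],
    pub max: [f64; 2],
}

impl BoundingBox {
    /// Two boxes intersect if they overlap along both axes.
    pub fn intersects(&self, other: &BoundingBox) -> bool {
        (0..2).all(|axis| self.min[axis] <= other.max[axis] && other.min[axis] <= self.max[axis])
    }
}

/// Closed time range, e.g. in days since an arbitrary epoch (an assumption).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct TimeRange {
    pub from: i64,
    pub until: i64,
}

impl TimeRange {
    pub fn overlaps(&self, other: &TimeRange) -> bool {
        self.from <= other.until && other.from <= self.until
    }
}

/// A full dataset as it would be deserialized from the document store.
pub struct Dataset {
    pub title: String,
    pub bounding_boxes: Vec<BoundingBox>,
    pub time_ranges: Vec<TimeRange>,
}

/// Stand-in for the secondary indexes: (value, document id) pairs.
/// A real implementation would use an R* tree and an interval tree.
pub struct SecondaryIndexes {
    bounding_boxes: Vec<(BoundingBox, u32)>,
    time_ranges: Vec<(TimeRange, u32)>,
}

impl SecondaryIndexes {
    /// Build both indexes in one pass over the stored datasets.
    pub fn build(store: &[Dataset]) -> Self {
        let mut bounding_boxes = Vec::new();
        let mut time_ranges = Vec::new();

        for (doc_id, dataset) in store.iter().enumerate() {
            let doc_id = doc_id as u32;
            bounding_boxes.extend(dataset.bounding_boxes.iter().map(|bbox| (*bbox, doc_id)));
            time_ranges.extend(dataset.time_ranges.iter().map(|range| (*range, doc_id)));
        }

        Self { bounding_boxes, time_ranges }
    }

    /// Sorted, deduplicated ids of documents with a box intersecting `query`.
    pub fn docs_in_box(&self, query: &BoundingBox) -> Vec<u32> {
        let mut docs: Vec<u32> = self
            .bounding_boxes
            .iter()
            .filter(|(bbox, _)| bbox.intersects(query))
            .map(|(_, doc_id)| *doc_id)
            .collect();
        docs.sort_unstable();
        docs.dedup();
        docs
    }

    /// Sorted, deduplicated ids of documents with a range overlapping `query`.
    pub fn docs_in_range(&self, query: &TimeRange) -> Vec<u32> {
        let mut docs: Vec<u32> = self
            .time_ranges
            .iter()
            .filter(|(range, _)| range.overlaps(query))
            .map(|(_, doc_id)| *doc_id)
            .collect();
        docs.sort_unstable();
        docs.dedup();
        docs
    }
}
```

The sketch reflects the trade-off in the table: building the secondary indexes has to touch every full dataset, which is slower than scanning a dedicated column, but nothing beyond the stored documents needs to be written into the primary index.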
