Drop fast fields backing secondary indexes (!496) · Merge requests · umwelt-info / metadaten

Adam Reichold requested to merge drop-fast-fields-secondary-indexes into main Feb 07, 2024

Serializing bounding boxes and time ranges as fast fields is purely an artifact of our initial implementation, where we just filtered the results of the full-text search by bounding box, which requires a fast field for sufficient performance. Now that we use secondary indexes (R* trees, interval trees) which are queried directly, the fast fields are unnecessary overhead: they bloat the index even though they are read only once, when the index is loaded after harvesting completes or the server restarts.

Hence, this change drops both fast fields and reads the bounding boxes and time ranges by loading the full datasets from the document store (just as we do when fulfilling queries via the API). This does make creation of the secondary indexes slower, but it also makes creation of the primary index faster and the resulting index on disk smaller:

| | before | after |
|---|---|---|
| index bounding boxes | 50 ms | 1.7 s |
| index time ranges | 10 ms | 1.7 s |
| primary index creation | 40 s | 35 s |
| primary index size | 294 MB | 286 MB |
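
To illustrate the approach described above, here is a minimal sketch of collecting the entries for the secondary indexes from full datasets instead of from fast fields. It is not the code of this merge request: `Dataset`, `SecondaryIndexes` and `build_secondary_indexes` are hypothetical names, a slice stands in for the document store, plain vectors stand in for the R* tree and the interval tree, and both indexes are filled in a single pass purely for brevity.

```rust
/// Hypothetical stand-in for a full dataset as kept in the document store.
struct Dataset {
    /// [min_x, min_y, max_x, max_y], if the dataset has a spatial extent.
    bounding_box: Option<[f64; 4]>,
    /// Inclusive (from, until) range, e.g. in days since some epoch.
    time_range: Option<(i64, i64)>,
}

/// Entries that would be loaded into the R* tree and the interval tree.
/// Plain vectors are used here only to keep the sketch self-contained.
#[derive(Default)]
struct SecondaryIndexes {
    bounding_boxes: Vec<(usize, [f64; 4])>,
    time_ranges: Vec<(usize, (i64, i64))>,
}

/// Collects the secondary index entries by visiting every full dataset,
/// rather than by reading dedicated fast field columns.
fn build_secondary_indexes(doc_store: &[Dataset]) -> SecondaryIndexes {
    let mut indexes = SecondaryIndexes::default();

    // The position in the slice plays the role of the document ID.
    for (doc_id, dataset) in doc_store.iter().enumerate() {
        if let Some(bounding_box) = dataset.bounding_box {
            indexes.bounding_boxes.push((doc_id, bounding_box));
        }
        if let Some(time_range) = dataset.time_range {
            indexes.time_ranges.push((doc_id, time_range));
        }
    }

    indexes
}
```

In this sketch, datasets without a bounding box or time range simply contribute no entry to the respective index. Loading and decoding whole datasets costs more than scanning a column, which is consistent with the slower secondary index creation in the table above.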
