Drop fast fields backing secondary indexes
Serializing bounding boxes and time ranges as fast fields is purely an artifact of our initial implementation, which filtered the results of the full-text search using the bounding boxes and therefore needed a fast field for sufficient performance. Now that we use secondary indexes (R* trees, interval trees) which are queried directly, the fast fields are unnecessary overhead: they bloat the index even though they are read only once, when the index is loaded after harvesting completes or the server restarts.
Hence, this change drops both fast fields and instead reads the bounding boxes and time ranges by loading the full datasets from the document store (just as we do when fulfilling queries via the API). This makes creating the secondary indexes slower, but it also makes creating the primary index faster and the resulting index on disk smaller:
| | before | after |
|---|---|---|
| index bounding boxes | 50 ms | 1.7 s |
| index time ranges | 10 ms | 1.7 s |
| primary index creation | 40 s | 35 s |
| primary index size | 294 MB | 286 MB |
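
To illustrate the new loading path, here is a minimal sketch in Python (not the project's actual code) of collecting entries for both secondary indexes in a single pass over the document store. The `Dataset` type, its field names, and the linear-scan query helpers are all hypothetical stand-ins; the real implementation feeds such entries into an R* tree and an interval tree.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical shapes, chosen only for this sketch.
BoundingBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
TimeRange = Tuple[int, int]  # (start, end), both inclusive


@dataclass
class Dataset:
    """Stand-in for a full dataset as loaded from the document store."""
    title: str
    bounding_boxes: List[BoundingBox]
    time_ranges: List[TimeRange]


def collect_index_entries(store: Iterable[Dataset]):
    """Walk every stored dataset once and gather (value, doc_id) entries.

    This replaces reading two dedicated fast fields: the full document is
    deserialized, and only the two attributes of interest are kept.
    """
    bbox_entries: List[Tuple[BoundingBox, int]] = []
    time_entries: List[Tuple[TimeRange, int]] = []
    for doc_id, dataset in enumerate(store):
        for bbox in dataset.bounding_boxes:
            bbox_entries.append((bbox, doc_id))
        for time_range in dataset.time_ranges:
            time_entries.append((time_range, doc_id))
    return bbox_entries, time_entries


def query_bounding_boxes(entries, query: BoundingBox) -> List[int]:
    """Linear-scan stand-in for an R* tree intersection query."""
    q_min_x, q_min_y, q_max_x, q_max_y = query
    return [
        doc_id
        for (min_x, min_y, max_x, max_y), doc_id in entries
        if min_x <= q_max_x and q_min_x <= max_x
        and min_y <= q_max_y and q_min_y <= max_y
    ]


def query_time_ranges(entries, query: TimeRange) -> List[int]:
    """Linear-scan stand-in for an interval tree overlap query."""
    q_start, q_end = query
    return [
        doc_id
        for (start, end), doc_id in entries
        if start <= q_end and q_start <= end
    ]
```

The trade-off in the table follows from this shape: deserializing whole datasets is slower than reading two compact columns, but it happens only once per index load, while the savings in primary index creation time and size apply to every harvest.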