Use Rayon to load datasets for indexing using multiple threads.
Created by: adamreichold
Let's do this after all, as it shortens the time window during which the index does not consistently reflect the datasets, an inconsistency that can lead to broken links from the search results into the datasets.
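A minimal sketch of the loading pattern follows. It assumes a hypothetical `Dataset` type and a hypothetical `load_dataset` loader; neither is taken from this PR. With Rayon, the core of the change would roughly be `ids.par_iter().map(|&id| load_dataset(id)).collect::<Vec<_>>()`, which keeps results in input order. To stay dependency-free, the sketch swaps in `std::thread::scope` and uses one chunk per available core, which also preserves order. The sooner loading finishes, the shorter the window in which the index lags behind the datasets.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical stand-in for a parsed dataset record (not from the PR).
#[derive(Debug, Clone, PartialEq)]
struct Dataset {
    id: usize,
    title: String,
}

/// Hypothetical loader; a real indexer would read and parse a file here.
fn load_dataset(id: usize) -> Dataset {
    Dataset { id, title: format!("dataset-{id}") }
}

/// Loads all datasets on scoped threads, one chunk per available core,
/// returning them in the same order as `ids` (like Rayon's
/// `par_iter().map(..).collect()`).
fn load_all_parallel(ids: &[usize]) -> Vec<Dataset> {
    if ids.is_empty() {
        return Vec::new();
    }
    let threads = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    // Ceiling division so every id lands in exactly one chunk.
    let chunk_size = (ids.len() + threads - 1) / threads;

    thread::scope(|s| {
        let handles: Vec<_> = ids
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || chunk.iter().map(|&id| load_dataset(id)).collect::<Vec<_>>())
            })
            .collect();
        // Join in spawn order so the concatenated result matches `ids`.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("loader thread panicked"))
            .collect()
    })
}

fn main() {
    let ids: Vec<usize> = (0..10).collect();
    let datasets = load_all_parallel(&ids);
    // Only after all datasets are loaded would the index be updated.
    println!("loaded {} datasets", datasets.len());
}
```

Static chunking is simpler than Rayon's work-stealing pool. However, it balances poorly when some datasets take much longer to load than others, and that imbalance is one reason to prefer Rayon here.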
Performance does improve measurably:
strategy | caches | wall time | CPU utilization |
---|---|---|---|
serial | cold | 5min | 20% |
serial | hot | 10s | 250% |
parallel | cold | 1min | 100% |
parallel | hot | 5s | 350% |
This was measured on a CODE-DE VM with 4 cores and 380k datasets. Caches were dropped via `echo 3 > /proc/sys/vm/drop_caches`.