Use Rayon to load datasets for indexing using multiple threads.

Adam Reichold requested to merge parallel-indexer into main

Created by: adamreichold

Let's do this after all: it shortens the time window during which the index does not consistently reflect the datasets, which could otherwise lead to broken links from the search results into the datasets.

Performance improves measurably, with wall time dropping about 5x on cold caches and 2x on hot caches:

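| strategy | caches | wall time | CPU utilization |
|----------|--------|-----------|-----------------|
| serial   | cold   | 5min      | 20%             |
| serial   | hot    | 10s       | 250%            |
| parallel | cold   | 1min      | 100%            |
| parallel | hot    | 5s        | 350%            |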

This was measured on a CODE-DE VM with 4 cores and 380k datasets. Caches were dropped via `echo 3 > /proc/sys/vm/drop_caches`.
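
For readers unfamiliar with the pattern, here is a minimal sketch of loading datasets on several threads before handing them to the indexer. It is not the code from this merge request: `Dataset`, `load_dataset` and `load_parallel` are hypothetical names, and the sketch uses `std::thread::scope` from the standard library instead of Rayon so that it has no dependencies. With Rayon, the same idea would typically be a `par_iter().map(...).collect()` over the inputs.

```rust
use std::thread;

/// Hypothetical stand-in for a parsed dataset; not the project's real type.
#[derive(Debug, PartialEq)]
struct Dataset {
    id: usize,
    title: String,
}

/// Hypothetical loader. A real indexer would read and parse a file here,
/// which is the I/O-heavy step that benefits from running concurrently.
fn load_dataset(id: usize) -> Dataset {
    Dataset {
        id,
        title: format!("dataset-{id}"),
    }
}

/// Loads all datasets using up to `threads` worker threads.
/// The result has the same order as `ids`.
fn load_parallel(ids: &[usize], threads: usize) -> Vec<Dataset> {
    let threads = threads.max(1);
    // Ceiling division so that every id lands in exactly one chunk.
    let chunk_size = ((ids.len() + threads - 1) / threads).max(1);

    thread::scope(|scope| {
        // Spawn one worker per chunk; collecting the handles first makes
        // sure all workers are running before any of them is joined.
        let handles: Vec<_> = ids
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|&id| load_dataset(id))
                        .collect::<Vec<Dataset>>()
                })
            })
            .collect();

        // Join in spawn order, which preserves the input order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("loader thread panicked"))
            .collect::<Vec<Dataset>>()
    })
}
```

Calling `load_parallel(&ids, 4)` would match the 4-core VM used above. Static chunking is the simplest choice; a work-stealing pool such as Rayon's balances better when individual datasets differ a lot in size.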
