

Port the Umweltinformationsprogramme UIP UBA harvester to get rid of external harvesters

Johannes Vogel requested to merge uip-uba into main

The UIP harvester was added in !345 (merged) as an external harvester written in Python.

To remove the external harvesters altogether, this harvester has to be ported to Rust.

The following functions have to be ported:

  • process_about_us_categories
  • process_events
  • process_news
  • process_funding_priorities
  • process_downloadable_files
  • process_funding_faqs
  • process_funding_info_categories
  • process_search_results

The ported harvester now yields 318 datasets compared to 305 before. For example, the page Großtechnische Umsetzung Bioökonomie seems to have been missing previously.

The included notebook (analysis/notebooks/uip_vergleich.ipynb) enables a direct comparison of the datasets currently on md.umwelt.info with the ones on the local server after running the committed harvester.
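For reviewers who want to reason about the comparison without opening the notebook, the idea can be sketched as a set difference over dataset identifiers. This is only an illustration and not the notebook's code: the function `compare_ids`, the struct `Comparison`, and the assumption that datasets can be matched by a string identifier are all hypothetical.

```rust
use std::collections::BTreeSet;

/// Result of comparing two collections of dataset identifiers.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, PartialEq)]
pub struct Comparison {
    /// Identifiers present only on the reference server.
    pub only_in_reference: Vec<String>,
    /// Identifiers present only in the local harvest.
    pub only_in_local: Vec<String>,
    /// Number of identifiers present on both sides.
    pub common: usize,
}

/// Compare the identifiers found on the reference server with the
/// identifiers produced by a local harvester run.
/// Assumes each dataset can be matched by a unique string identifier.
pub fn compare_ids(reference: &[&str], local: &[&str]) -> Comparison {
    // BTreeSet removes duplicates and keeps the output sorted.
    let reference: BTreeSet<&str> = reference.iter().copied().collect();
    let local: BTreeSet<&str> = local.iter().copied().collect();

    Comparison {
        only_in_reference: reference
            .difference(&local)
            .map(|id| (*id).to_string())
            .collect(),
        only_in_local: local
            .difference(&reference)
            .map(|id| (*id).to_string())
            .collect(),
        common: reference.intersection(&local).count(),
    }
}
```

With such a comparison, datasets that only the ported harvester finds would show up in `only_in_local`, while anything the port lost would show up in `only_in_reference`.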

Edited by Jakob Deller
