Adding Pressemitteilungen and Geodatenbank to suche_st

Falk Heße requested to merge suche_st into main

This branch adds two subportals of the Suche-ST to the such_-_st.rs scraper, namely the Geodatenbank and the Pressemitteilungen. To that end, two new sources have been added to harvester.toml, each containing the portal to be searched and the search terms to be used. So far, only the term 'umwelt' is used for the search, but the list can easily be extended. In addition, two new functions have been added to search_st.rs, namely 'fetch_geo_article' and 'fetch_presse_article', to scrape the metadata from the geodata and press portals, respectively.
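As a rough illustration of how a source entry with a portal and a list of search terms could be expanded into individual searches, here is a minimal sketch. The struct `SourceConfig`, its field names, the function `queries`, and the portal identifiers are assumptions made for this example; the actual schema in harvester.toml and the code in this branch may look different.

```rust
/// Hypothetical shape of one source entry (assumption, not the real harvester.toml schema).
#[derive(Debug, Clone, PartialEq)]
struct SourceConfig {
    /// Identifier of the subportal to search (hypothetical values).
    portal: String,
    /// Search terms to run against that portal.
    search_terms: Vec<String>,
}

/// Expand a source into one (portal, term) pair per configured search term.
fn queries(source: &SourceConfig) -> Vec<(String, String)> {
    source
        .search_terms
        .iter()
        .map(|term| (source.portal.clone(), term.clone()))
        .collect()
}

fn main() {
    // Two sources, mirroring the two subportals named in the description.
    let sources = vec![
        SourceConfig {
            portal: "geodatenbank".to_string(),
            search_terms: vec!["umwelt".to_string()],
        },
        SourceConfig {
            portal: "pressemitteilungen".to_string(),
            search_terms: vec!["umwelt".to_string()],
        },
    ];

    for source in &sources {
        for (portal, term) in queries(source) {
            println!("{}: {}", portal, term);
        }
    }
}
```

With this layout, extending the search beyond 'umwelt' would only require adding further entries to `search_terms`.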

Both functions still cause some errors while harvesting, which is why this MR is a draft. Please note that only PDFs and websites are scraped from the geodata portal, since all other search results link to the metaver website of Sachsen-Anhalt, which should already be covered by the respective metaver scraper. Please also note that every Pressemitteilung contains a link to a PDF version of the webpage. This link is not added to the metadata, since in all cases I checked the PDF contained no information not already provided on the webpage itself.
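The filtering rule for the geodata portal could be sketched roughly as follows. This is not the code from the branch: the enum `HitKind`, the functions `classify` and `should_scrape`, and in particular the substring used to recognise metaver links are assumptions for illustration, and the URLs are placeholders.

```rust
/// Kinds of search hits distinguished in the description above.
#[derive(Debug, PartialEq)]
enum HitKind {
    Pdf,
    Website,
    Metaver,
}

/// Hypothetical marker for links into the metaver catalogue (assumption).
const METAVER_MARKER: &str = "metaver";

/// Classify a search hit by its URL.
fn classify(url: &str) -> HitKind {
    let lower = url.to_ascii_lowercase();
    if lower.contains(METAVER_MARKER) {
        HitKind::Metaver
    } else if lower.ends_with(".pdf") {
        HitKind::Pdf
    } else {
        HitKind::Website
    }
}

/// Only PDFs and websites are scraped; metaver links are left to the metaver scraper.
fn should_scrape(url: &str) -> bool {
    classify(url) != HitKind::Metaver
}

fn main() {
    // Placeholder URLs, not real search results.
    let hits = [
        "https://example.org/docs/report.pdf",
        "https://example.org/topic/page.html",
        "https://metaver.example.org/record/123",
    ];
    for url in hits {
        println!("{} -> scrape: {}", url, should_scrape(url));
    }
}
```

Skipping metaver links at this stage avoids harvesting the same dataset twice, once through Suche-ST and once through the metaver scraper.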

Edited by Falk Heße
