Scraper for Stadt-Land-Plus
Stadt-Land-Plus is a project within the German Environment Agency (UBA).
We need to scrape the metadata for the following sections of its website.
- 'Projekte': A dataset has to be created for each bullet point in the list. Take this entry as an example. The title would be 'CoAct Integriertes Stadt-Land-Konzept zur Erzeugung von Aktivkohle und Energieträgern aus Restbiom'. The description would be all the text in the paragraphs 'Motivation', 'Ziele und Vorgehen' and 'Erwartete Ergebnisse und Transfer'. 'Projektleitung / Kontakt' would be used as the contact. Resources would be the 'Projektsteckbrief' (pdf), 'Internet' (webpage) and the image (image). Additional data would be the time range 'Laufzeit:' and the time issued 'Stand'. All other projects have to be scraped accordingly.
- 'Veranstaltungen': Each event has to be turned into a dataset. Let's use this event as an example. The title would be 'DKG '23 mit NEILA-Session "Flächensparen vs. Wohnungsnot: Hemmnisse & Lösungsansätze auf stadtregionaler Ebene" - Abschlussveranstaltung NEILA'. The description would be the text underneath. The only resource is the link (webpage). The time issued would be '22. September 2023'.
- 'News': Each news item would be a dataset. Let's use this news item as an example. The title is 'Abschlussforum des Verbundvorhabens Interko2 in Neukieritzsch' and the description is the text underneath. Some news items contain links; if present, each link would be a resource (webpage). Each entry is also associated with a time issued, which is to be derived from the date shown on this page.
- 'Medientipps': Here only one dataset has to be created. Title is 'Medientipps' and the description is the text underneath. Resources are all the links that are listed. They are either videos (video), podcasts (audio) or apps.
- 'Zahl des Monats': Each month would be a dataset. Let's use September 2023 as an example. The title would be 'Zahl des Monats - September 2023', the description would be the text in the entry, the resources would be any links in the entry and the time issued would be 'September 2023'.
- 'Publikationen': A single dataset has to be created. The title would be 'Stadt-Land-Plus Publikationen', the description would be the text underneath the title and the resources (pdf) would be all the links on the page.
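The field mapping described in the list above could be modelled roughly as follows. This is only a sketch: the `Dataset` and `Resource` classes, their field names and the `parse_german_date` helper are assumptions made for illustration and are not taken from the harvester's actual code.

```python
import re
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# German month names as they appear on the site, e.g. '22. September 2023'.
MONTHS = {
    "Januar": 1, "Februar": 2, "März": 3, "April": 4,
    "Mai": 5, "Juni": 6, "Juli": 7, "August": 8,
    "September": 9, "Oktober": 10, "November": 11, "Dezember": 12,
}


@dataclass
class Resource:
    """Hypothetical resource record: a link plus its kind."""
    url: str
    kind: str  # e.g. 'pdf', 'webpage', 'image', 'video', 'audio'


@dataclass
class Dataset:
    """Hypothetical dataset record covering the fields named in the list."""
    title: str
    description: str
    resources: List[Resource] = field(default_factory=list)
    contact: Optional[str] = None     # 'Projektleitung / Kontakt' (projects only)
    issued: Optional[date] = None     # 'Stand' or the date of the entry
    time_range: Optional[str] = None  # 'Laufzeit:' (projects only)


def parse_german_date(text: str) -> date:
    """Parse '22. September 2023' or 'September 2023'.

    If the day is missing, the first of the month is used (an assumption).
    """
    match = re.fullmatch(r"\s*(?:(\d{1,2})\.\s*)?([A-Za-zäÄ]+)\s+(\d{4})\s*", text)
    if match is None or match.group(2) not in MONTHS:
        raise ValueError(f"unrecognised date: {text!r}")
    day, month_name, year = match.groups()
    return date(int(year), MONTHS[month_name], int(day) if day else 1)


# Usage: the 'Zahl des Monats' example from the list above.
number_of_month = Dataset(
    title="Zahl des Monats - September 2023",
    description="(text of the entry)",
    issued=parse_german_date("September 2023"),
)
```

Keeping `contact` and `time_range` optional lets the same record serve all six sections, since only 'Projekte' provides them.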
In addition, the following entry has to be added to deployment/harvester.toml:
[[sources]]
name = "stadt-land-plus-uba"
type = "stadt-land-plus-uba.py"
url = "https://www.zukunftsstadt-stadtlandplus.de"
provenance = "/Bund/UBA/SLP"
Also, the following entry has to be added to deployment/provenances.toml:
["Bund/UBA/SLP"]
name = "Stadt-Land-Plus"
about_url = "https://www.zukunftsstadt-stadtlandplus.de/infos-ueber-stadt-land-plus.html"
contact_url = "https://www.zukunftsstadt-stadtlandplus.de/impressum-datenschutz.html"
email = "kontakt@fona-stadtlandplus.de"
description = "Stadt-Land-Plus ist eine Organisation des Umweltbundesamtes (UBA). Es organisiert und koordiniert Verbundvorhaben des Bundesministeriums für Bildung und Forschung zur nachhaltigen Stärkung der Stadt-Land-Beziehungen."
Edited by Adam Reichold