Scraper for KORINA
www.korina.info is the website of the Koordinationsstelle invasive Neophyten in Schutzgebieten Sachsen-Anhalts, which is part of the Unabhängige Institut für Umweltfragen e. V.
From the site, we need to scrape the metadata behind these elements:
- Artenporträts (species portraits, with search function: https://www.korina.info/arten/): A dataset has to be created for each available species (Art). Take https://www.korina.info/arten/eschen-ahorn/ as an example. The title is 'Ahorn, Eschen- Acer negundo - Synonym(e): Eschenblättriger Ahorn, box elder'. The description is the text that does not lead to links, i.e., in this case the text of 'Lebensräume' and 'Problematische Vorkommen'. The resources are all links on the detail page, i.e., items like 'Schwarze Liste Managementliste Sachsen-Anhalt:', 'Verbreitungskarte', 'Bestimmungshilfe', 'Steckbriefe/Factsheets' and 'Informationen zur Eschen-Ahorn-Ringelung', but also the links to all images (both the one on the right and those in the list below) and all literature links (Literatur). In addition, two more resources have to be created:
  - Bestimmungshilfen (PDF) (https://www.korina.info/funde/bestimmungshilfen/) is a list of PDFs that provide additional material for some (but not all) of the species covered above. The example 'eschen-ahorn' has a 'Bestimmungshilfe' (identification sheet) at https://www.korina.info/wp-content/uploads/2013/03/Acer%20negundo%20Bestimmungshilfe%20KORINAx%20.pdf. These PDFs have to be added as resources to the main species entry.
  - The interactive atlas (https://www.korina.info/funde/atlas/) provides access to a shapefile of the occurrences of each species. For the example 'eschen-ahorn', the occurrences can be accessed by selecting 'Ahorn, -Eschen' in the pull-down menu. The resulting shapefile should be added as a resource to the species entry.
- Educational material (https://www.korina.info/bildung/materialien/): A dataset has to be created for each entry. Take 'Methodenheft Klasse 5-9' as an example. The title is 'Methodenheft Klasse 5-9', the description is the text on the detail page, and the resources are the links to either the PDFs or the ownCloud folders.
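The resource collection described above could be sketched roughly as follows. This is only an illustration using the Python standard library, not the harvester's actual code: the names `LinkCollector`, `resource_type` and `collect_resources` are hypothetical, the sample markup is an assumption about what a detail page might contain (the real page structure has to be checked), and the image path in the sample is made up.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin, urlparse

BASE_URL = "https://www.korina.info/arten/eschen-ahorn/"

# Assumed, simplified markup; the image path is purely illustrative.
SAMPLE_HTML = """
<a href="/wp-content/uploads/2013/03/Acer%20negundo%20Bestimmungshilfe%20KORINAx%20.pdf">Bestimmungshilfe</a>
<a href="/funde/atlas/">Verbreitungskarte</a>
<img src="/wp-content/uploads/example.jpg" alt="Eschen-Ahorn">
"""


class LinkCollector(HTMLParser):
    """Collect (label, absolute URL) pairs from <a href> and <img src>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []
        self._href = None  # URL of the <a> element currently open, if any
        self._text = []    # text fragments seen inside that <a> element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._href = urljoin(self.base_url, attrs["href"])
            self._text = []
        elif tag == "img" and attrs.get("src"):
            label = attrs.get("alt") or "image"
            self.found.append((label, urljoin(self.base_url, attrs["src"])))

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace; fall back to the URL if the link has no text.
            label = " ".join("".join(self._text).split()) or self._href
            self.found.append((label, self._href))
            self._href = None


def resource_type(url):
    """Guess a coarse resource type from the file extension of the URL path."""
    path = unquote(urlparse(url).path).strip().lower()
    if path.endswith(".pdf"):
        return "pdf"
    if path.endswith((".jpg", ".jpeg", ".png", ".gif")):
        return "image"
    return "webpage"


def collect_resources(html, base_url):
    """Return one dict per link or image found in the given HTML."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return [
        {"name": label, "url": url, "type": resource_type(url)}
        for label, url in parser.found
    ]


if __name__ == "__main__":
    for resource in collect_resources(SAMPLE_HTML, BASE_URL):
        print(resource)
```

Resolving every link against the page URL with `urljoin` keeps relative and absolute links uniform, so PDFs found on the Bestimmungshilfen list and links found on the species page can be compared by URL before they are attached to the same dataset.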
In addition, the following entry has to be added to deployment/harvester.toml:
[[sources]]
name = "korina-ufu"
type = "korina-ufu.py"
url = "https://www.korina.info"
provenance = "/Zivilgesellschaft/UfU/KORINA"
Also, the following entry has to be added to deployment/provenances.toml:
["Zivilgesellschaft/UfU/KORINA"]
name = "korina"
about_url = "https://www.korina.info/info/korina/"
contact_url = "https://www.korina.info/kontakt/"
email = "kontakt@korina.info"
description = "KORINA ist Koordinationsstelle des Unabhängigen Institutes für Umweltfragen e. V. Es stellt Informationen zu invasiven Neophyten in den Schutzgebieten Sachsen-Anhalts bereit."
Edited by Falk Heße