Scraper for Thru.de
The online platform Thru.de, a service provided by the German Environment Agency, publishes a comprehensive list of German waste (co-)incineration plants with a nominal capacity of less than two tonnes per hour under the Industrial Emissions Directive (2010/75/EU). More specifically, the list gives each plant's name and address, permitted capacity, plant type, and primary type of waste (co-)incinerated (the latter two may be missing for some plants).
The search index containing this list is a key component and is to be harvested as one dataset per entry (i.e. per plant).
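As a minimal sketch of the "one dataset per entry" mapping, the following Python snippet models a single plant entry and converts it into a dataset record. The class name `PlantEntry`, the function `to_dataset`, all field names, and the `title`/`description` output schema are assumptions for illustration only; they reflect neither the actual markup of Thru.de nor the harvester's real data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlantEntry:
    """One entry of the search index (field names are hypothetical)."""

    name: str
    address: str
    permitted_capacity: str
    # The following two fields may be missing in the source data.
    plant_type: Optional[str] = None
    primary_waste_type: Optional[str] = None


def to_dataset(entry: PlantEntry) -> dict:
    """Map one index entry to one dataset record (hypothetical schema)."""
    parts = [
        f"Address: {entry.address}",
        f"Permitted capacity: {entry.permitted_capacity}",
    ]
    # Only mention optional fields when the source actually provides them.
    if entry.plant_type:
        parts.append(f"Plant type: {entry.plant_type}")
    if entry.primary_waste_type:
        parts.append(f"Primary waste type: {entry.primary_waste_type}")
    return {"title": entry.name, "description": "; ".join(parts)}
```

Treating the two optional fields as `None` by default keeps incomplete entries harvestable instead of rejecting them.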
Downloadable documents (e.g. PDF reports) need to be referenced as individual datasets with attached download links. There is a designated download section that should be harvested as one dataset per entry.
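A document dataset with an attached download link could look roughly like the sketch below. It assumes that links in the download section may be relative and resolves them against a base URL with the standard-library function `urllib.parse.urljoin`. The function name `document_dataset`, the `resources` structure, and the base URL constant are assumptions, not the harvester's actual schema.

```python
from urllib.parse import urljoin

# Assumption: documents are linked relative to the site root.
BASE_URL = "https://thru.de/"


def document_dataset(title: str, href: str, base_url: str = BASE_URL) -> dict:
    """Build one dataset for one downloadable document (hypothetical schema)."""
    return {
        "title": title,
        "resources": [
            # urljoin leaves absolute links untouched and resolves relative ones.
            {"type": "download", "url": urljoin(base_url, href)},
        ],
    }
```

For example, `document_dataset("Report", "downloads/report.pdf")` (a made-up path) would attach the link resolved against `BASE_URL`, while an already absolute `href` is kept unchanged.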
Some data is also presented in the form of maps and requires manual metadata registration.
Note that any references to the German Environment Agency's main website (e.g. the two links to "Emissionsberichterstattung von Deutschland" and "Daten zur Umwelt" given at the bottom of this page) can be ignored, since they are tracked in separate issues.
Acceptance criteria:
- Scraper for Thru.de website merged into main branch and deployed to md.umwelt.info.
- One dataset for each entry of the sitemap.
- One dataset for each entry of the main search index.
- One dataset for each downloadable document.
- One dataset for each map layer.