Reuse the lfu-rlp scraper to integrate further institutions of the federal state of Rheinland-Pfalz (RLP).
RLP websites structured similarly to lfu.rlp.de:
- snu.rlp.de
- wegezumholz.rlp.de
- umdenken.rlp.de
- naturschutzstationen.rlp.de
- effnet.rlp.de
- badeseen.rlp.de --> LfU
- umgebungslaerm.rlp.de --> LfU
- luft.rlp.de --> LfU
- mkuem.rlp.de
- lua.rlp.de
- wald.rlp.de
- klimaneutrales.rlp.de
- klimawandel.rlp.de
The website bildung.rlp.de was not included, as environment and nature conservation topics are barely represented in its sitemap.
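The site list above could be kept as a small registry that a shared scraper iterates over. The following is only a sketch: `SiteConfig`, `SITES` and `hosts_for_owner` are hypothetical names, not part of the existing lfu-rlp scraper, and reading "--> LfU" as "attributed to LfU" is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SiteConfig:
    """One RLP website to be handled by the shared scraper (hypothetical type)."""
    host: str
    # Assumption: "--> LfU" in the list means the site is attributed to LfU.
    owner: Optional[str] = None


# Hosts copied from the list above; bildung.rlp.de is deliberately left out.
SITES: List[SiteConfig] = [
    SiteConfig("snu.rlp.de"),
    SiteConfig("wegezumholz.rlp.de"),
    SiteConfig("umdenken.rlp.de"),
    SiteConfig("naturschutzstationen.rlp.de"),
    SiteConfig("effnet.rlp.de"),
    SiteConfig("badeseen.rlp.de", owner="LfU"),
    SiteConfig("umgebungslaerm.rlp.de", owner="LfU"),
    SiteConfig("luft.rlp.de", owner="LfU"),
    SiteConfig("mkuem.rlp.de"),
    SiteConfig("lua.rlp.de"),
    SiteConfig("wald.rlp.de"),
    SiteConfig("klimaneutrales.rlp.de"),
    SiteConfig("klimawandel.rlp.de"),
]


def hosts_for_owner(owner: str) -> List[str]:
    """Return the hosts attributed to the given owner, in list order."""
    return [site.host for site in SITES if site.owner == owner]
```

Keeping the hosts in one place would make it easier to add or drop a site without touching the scraping logic itself.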
Several minor issues remain to be solved:
- Some titles contain too much information ("Alle Badeseen", "Aktuelle Projekte SNU")
- Some titles on the detail page are more informative than the title in the sitemap (e.g. Fischotter [SNU])
- More pages than available are scraped for the category "Aktuelles"
- SNU has empty pages in "Links und Downloads"
- Track PDF error as unknown log message
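Two of the items above (detail-page titles and empty pages) could be handled with small helpers like the ones below. This is a minimal sketch under assumptions: `choose_title` and `is_empty_page` are hypothetical helpers, and the rule "prefer the detail-page title whenever one exists" is one possible heuristic, not a decision recorded in this issue.

```python
from typing import Optional


def choose_title(sitemap_title: str, detail_title: Optional[str]) -> str:
    """Prefer the detail-page title; fall back to the sitemap title.

    Assumption: a non-empty detail-page title is always the more
    informative one. A stricter rule may be needed in practice.
    """
    detail = (detail_title or "").strip()
    return detail if detail else sitemap_title.strip()


def is_empty_page(body_text: Optional[str]) -> bool:
    """Return True if a scraped page has no visible text content."""
    return not (body_text or "").strip()
```

A scraper could call `is_empty_page` before storing a result, so that empty pages are skipped rather than indexed.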
Another task that has to be done separately:
- fawf.wald.rlp.de (merges and aggregations are necessary here and should be handled in a dedicated scraper)
Edited by Stefan Krämer