Reuse the lfu-rlp scraper and integrate further institutions of the federal state of Rheinland-Pfalz

Websites of RLP with a structure similar to lfu.rlp.de:

  • snu.rlp.de
  • wegezumholz.rlp.de
  • umdenken.rlp.de
  • naturschutzstationen.rlp.de
  • effnet.rlp.de
  • badeseen.rlp.de --> LfU
  • umgebungslaerm.rlp.de --> LfU
  • luft.rlp.de --> LfU
  • mkuem.rlp.de
  • lua.rlp.de
  • wald.rlp.de
  • klimaneutrales.rlp.de
  • klimawandel.rlp.de
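The site list above could be turned into configuration for one shared scraper. Below is a minimal sketch that assumes the lfu-rlp scraper can be parameterised by domain. The `Site` dataclass, the `SITES` list, the `parent` field and the `institution`/`start_urls` helpers are hypothetical and are not taken from the existing code. The sketch also reads "--> LfU" as "content is attributed to the LfU", which is an interpretation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Site:
    """One RLP website sharing the page structure of lfu.rlp.de (hypothetical config)."""
    domain: str
    # Assumption: "--> LfU" means pages of this site are attributed to the LfU.
    parent: Optional[str] = None


SITES: List[Site] = [
    Site("snu.rlp.de"),
    Site("wegezumholz.rlp.de"),
    Site("umdenken.rlp.de"),
    Site("naturschutzstationen.rlp.de"),
    Site("effnet.rlp.de"),
    Site("badeseen.rlp.de", parent="LfU"),
    Site("umgebungslaerm.rlp.de", parent="LfU"),
    Site("luft.rlp.de", parent="LfU"),
    Site("mkuem.rlp.de"),
    Site("lua.rlp.de"),
    Site("wald.rlp.de"),
    Site("klimaneutrales.rlp.de"),
    Site("klimawandel.rlp.de"),
]


def institution(site: Site) -> str:
    """Label under which a site's pages are stored: the parent if set, else the subdomain."""
    return site.parent or site.domain.split(".")[0]


def start_urls(sites: List[Site]) -> List[str]:
    """Entry points for a shared scraper, one per site."""
    return [f"https://{site.domain}/" for site in sites]
```

With this layout, adding another RLP site with the same structure only requires a new `Site(...)` entry, and the scraping logic stays in one place.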

The website bildung.rlp.de was not included, as environment and nature conservation are barely represented in its sitemap.

Several minor issues remain to be solved:

  • Some titles contain too much information ("Alle Badeseen", "Aktuelle Projekte SNU")
  • Some titles on detail pages are more informative than the titles in the sitemap (e.g. Fischotter [SNU])
  • More pages than are actually available are scraped for the category "Aktuelles"
  • SNU has empty pages in "Links und Downloads"
  • Track PDF errors as unknown log messages
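Three of the points above could be handled by small helpers in the scraper. This is a minimal sketch that assumes the scraper works on plain title strings, extracted page text, and numbered listing pages. The functions `choose_title`, `is_empty_page` and `paginate`, the "contains and extends" heuristic for a more informative title, and the `max_pages` limit are all hypothetical. None of them is part of the existing lfu-rlp scraper.

```python
from typing import Callable, Iterator, List, Optional


def choose_title(sitemap_title: str, detail_title: Optional[str]) -> str:
    """Prefer the detail-page title when it extends the sitemap title (heuristic)."""
    base = sitemap_title.strip()
    detail = (detail_title or "").strip()
    # Assumption: a detail title is "more informative" if it contains the
    # sitemap title and adds text to it; otherwise keep the sitemap title.
    if detail and len(detail) > len(base) and base.lower() in detail.lower():
        return detail
    return base


def is_empty_page(text: str, min_chars: int = 1) -> bool:
    """Treat a page as empty if its visible text is shorter than min_chars."""
    return len(text.strip()) < min_chars


def paginate(fetch_page: Callable[[int], List[str]], max_pages: int = 100) -> Iterator[str]:
    """Yield items from numbered listing pages until the listing is exhausted."""
    previous: Optional[List[str]] = None
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        # Stop on an empty page, or when the site serves the same page again
        # for an out-of-range page number.
        if not items or items == previous:
            break
        yield from items
        previous = items
```

Comparing each listing page with the previous one also covers sites that answer out-of-range page numbers with the last page instead of an empty one. It would still need to be checked which of the two cases causes the extra pages in "Aktuelles".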

Another task that has to be done separately:

  • fawf.wald.rlp.de (merges and aggregations are necessary here and should be handled in a dedicated scraper)
Edited by Stefan Krämer
