Add support for external harvesters

Adam Reichold requested to merge external-harvesters into main

This adds a narrow interface to support external harvesters implemented in languages other than Rust. It works by executing external programs and communicating with them via their standard input and output streams. The source configuration is passed to the child program as a JSON object, while datasets are read back from the child program as newline-delimited JSON.

The standard error stream is deliberately not captured and is therefore inherited from the parent, so the child program can do its own logging simply by printing to standard error.
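To illustrate the protocol described above, here is a minimal sketch of what an external harvester could look like in Python. It assumes the configuration arrives as a single JSON object on standard input; the field names (`name`, `url`, `title`, `source_url`) and the `harvest` helper are hypothetical placeholders and not taken from this merge request.

```python
import json
import sys
from typing import Any, Dict, Iterator, TextIO


def harvest(config: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in for real scraping logic; yields one dict per dataset."""
    base_url = config.get("url", "")
    for index in range(2):
        yield {"title": f"Dataset {index}", "source_url": f"{base_url}/{index}"}


def run(stdin: TextIO, stdout: TextIO, stderr: TextIO) -> None:
    # Assumption: the source configuration is one JSON object on standard input.
    config = json.load(stdin)

    # Logging goes to standard error, which the parent does not capture.
    print(f"harvesting source {config.get('name', '<unnamed>')}", file=stderr)

    # Newline-delimited JSON: exactly one JSON document per line on standard output.
    for dataset in harvest(config):
        stdout.write(json.dumps(dataset, ensure_ascii=False))
        stdout.write("\n")
    stdout.flush()


def main() -> None:
    run(sys.stdin, sys.stdout, sys.stderr)
```

A real script would call `main()` under an `if __name__ == "__main__":` guard. Passing the streams into `run` explicitly keeps the protocol handling testable without spawning a process. Since `json.dumps` escapes newlines inside strings, each dataset stays on a single line.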

While the approach does not depend on the implementation language, we do assume that Python will be our main secondary language and hence apply Black and Flake8 to these external harvesters.

As a first test, this implements scrapers for BfN's "Daten und Fakten" as well as "Projektsteckbriefe".

Edited by Adam Reichold