Port ELWIS harvester to Rust to simplify future schema changes.
This is a verbatim port that keeps all the TODO markers intact, i.e. the metadata quality is the same as before. I did this because I would prefer to get rid of the Python infrastructure and iterate on the Rust code base. In addition, indexing PDFs seems to be broken in ELWIS itself for now, so I could not have tested this even if I had implemented it.
Note that this can currently only be run with the CHECK_SOURCE_URL part in xtask/src/main.rs commented out, because the Python code produced technically invalid URLs (missing percent encoding of characters like +) and I did not want to reproduce that behaviour.
When this is done, the locales setup code here and here can be removed, cf. umwelt-info/infrastruktur/entwicklung!58 (merged) and umwelt-info/infrastruktur/testbetrieb!74 (merged).
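
To illustrate the percent-encoding difference mentioned above, here is a minimal sketch using only the standard library. The helper name percent_encode_component is hypothetical and is not taken from the harvester; the actual port may well rely on a URL crate instead. It encodes every byte outside the RFC 3986 unreserved set, which is stricter than strictly necessary but shows why URLs built this way differ from the ones the Python code emitted.

```rust
/// Hypothetical helper for illustration only, not part of the harvester:
/// percent-encodes every byte that is not an RFC 3986 "unreserved" character.
fn percent_encode_component(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            // Unreserved characters are copied through unchanged.
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            // Everything else, including `+`, becomes %XX with uppercase hex digits.
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}
```

With this sketch, a value such as a+b is emitted as a%2Bb, so a URL comparison against the unencoded form written by the Python harvester would not match.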