Adding additional well-known global identifiers
For this MR I started going through our harvester.toml source by source, checking for identifiers that stay constant when a source is harvested by another entity whose catalogue we might later collect ourselves. I also looked for UUIDs that are already present in the index but not yet recognised as identifiers, e.g. as an unidentified part of the source_url.
I found the following:
- Doris from BfS provides URN:NBN identifiers, which are now added as well-known global identifiers.
- GeoSeaMap uses a different form of global identifier, but it ultimately contains a UUID.
- website-bfn (aka /Bund/BfN/Publikationen) provides DOIs.
- All OAI harvesters now prefer URN:NBN identifiers if available and fall back to DOIs otherwise.
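The preference order for the OAI harvesters can be sketched roughly as follows. This is only an illustration in Python, not the code in this MR: the function name `pick_global_identifier`, the `DOI_PATTERN` regex and the assumption that the candidates arrive as plain strings are all hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Hypothetical, simplified DOI matcher: "10.", a registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"(10\.\d{4,9}/\S+)")


def pick_global_identifier(candidates: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (kind, value), preferring URN:NBN and falling back to DOI."""
    doi = None
    for raw in candidates:
        value = raw.strip()
        # A URN:NBN wins immediately, regardless of its position in the list.
        if value.lower().startswith("urn:nbn:"):
            return ("urn:nbn", value)
        # Remember the first DOI seen in case no URN:NBN turns up.
        if doi is None:
            match = DOI_PATTERN.search(value)
            if match:
                doi = match.group(1)
    return ("doi", doi) if doi is not None else None
```

A record that carries both kinds of identifier yields the URN:NBN, a record with only a DOI yields the DOI, and a record with neither yields `None`.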
As a side effect, this MR groups all global_identifiers that carry a UUID and makes sure that, when two global_identifiers are compared, only the UUID part is compared. The provider is still kept for later reference.
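To illustrate the idea of comparing only the UUID part while keeping the provider, here is a minimal Python sketch. It is not the implementation in this MR: the class `UuidIdentifier`, the helper `from_text` and the example URL are made up for illustration.

```python
import re
import uuid
from dataclasses import dataclass, field
from typing import Optional

# Matches a UUID in its canonical 8-4-4-4-12 hexadecimal form.
UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


@dataclass(frozen=True)
class UuidIdentifier:
    # The provider is stored but excluded from equality and hashing.
    provider: str = field(compare=False)
    value: uuid.UUID


def from_text(provider: str, text: str) -> Optional[UuidIdentifier]:
    """Extract the first UUID found in text, e.g. in a source URL."""
    match = UUID_PATTERN.search(text)
    if match is None:
        return None
    return UuidIdentifier(provider, uuid.UUID(match.group(0)))


# Hypothetical usage: the same UUID reported by two different providers.
a = from_text("geoseamap", "https://example.org/record/123E4567-E89B-12D3-A456-426614174000")
b = from_text("other", "123e4567-e89b-12d3-a456-426614174000")
```

Here `a == b` holds although the providers differ, and both collapse to one entry in a set, while `a.provider` remains available for later reference.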
There are still plenty of sources to check, but that can go into a new MR.
Edited by Jakob Deller