Adding additional well-known global identifiers

Jakob Deller requested to merge add_urn_nbn_for_doris into main

For this MR I started going through our harvester.toml one by one, checking for identifiers that stay constant even if the source is harvested by another entity that we might collect again later. I also looked for UUIDs that are already hidden in the index today, e.g. as an unidentified part of the source_url.
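To illustrate what "a UUID hidden in the source_url" can look like, here is a minimal Python sketch. It is not the code from this MR: the function name `find_uuid` and the example URLs are made up for illustration, and the only assumption is that the UUID appears in its canonical hyphenated form somewhere in the URL.

```python
import re
import uuid
from typing import Optional

# Canonical 8-4-4-4-12 hexadecimal UUID notation, in either letter case.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def find_uuid(source_url: str) -> Optional[uuid.UUID]:
    """Return the first UUID embedded in a URL, or None if there is none.

    Hypothetical helper: the real harvesters may locate the UUID differently.
    """
    match = UUID_RE.search(source_url)
    if match is None:
        return None
    # uuid.UUID normalises the value, so upper and lower case compare equal.
    return uuid.UUID(match.group(0))


# Made-up example URLs, not taken from any real source.
with_uuid = find_uuid(
    "https://example.org/records/6F9619FF-8B86-D011-B42D-00C04FC964FF/view"
)
without_uuid = find_uuid("https://example.org/records/overview")
```

Parsing into `uuid.UUID` rather than keeping the raw string avoids spurious mismatches caused by differing letter case.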

I found the following:

  • Doris (BfS) provides URN:NBN identifiers, which are now added as well-known global identifiers.
  • GeoSeaMap uses a different form of global identifier, but it is ultimately based on UUIDs.
  • website-bfn (aka /Bund/BfN/Publikationen) provides DOIs.
  • All OAI harvesters will prefer URN:NBN identifiers if available, with a fallback to DOIs.
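The preference order from the last bullet (URN:NBN first, DOI as fallback) could be sketched as follows. This is a Python illustration under stated assumptions, not the harvester code: `pick_identifier`, the accepted DOI prefixes, and the sample values are all hypothetical.

```python
from typing import Iterable, Optional, Tuple

# Assumed spellings under which a DOI may appear in harvested metadata.
DOI_PREFIXES = ("doi:", "https://doi.org/", "http://dx.doi.org/")


def _as_doi(value: str) -> Optional[str]:
    """Strip a known DOI prefix and return the bare DOI, or None."""
    lowered = value.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):]
            break
    # Bare DOIs start with the directory indicator "10." and contain a slash.
    return value if value.startswith("10.") and "/" in value else None


def pick_identifier(identifiers: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Prefer a URN:NBN; fall back to a DOI; otherwise return None."""
    candidates = [identifier.strip() for identifier in identifiers]
    for candidate in candidates:
        if candidate.lower().startswith("urn:nbn:"):
            return ("urn:nbn", candidate)
    for candidate in candidates:
        doi = _as_doi(candidate)
        if doi is not None:
            return ("doi", doi)
    return None


# Made-up identifier values for illustration only.
both = pick_identifier(["https://doi.org/10.1234/abc", "urn:nbn:de:0000-1234"])
doi_only = pick_identifier(["doi:10.1234/abc"])
neither = pick_identifier(["https://example.org/record/1"])
```

Scanning all candidates for a URN:NBN before considering any DOI makes the result independent of the order in which a record lists its identifiers.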

As a side effect, this MR gathers all global_identifiers that carry a UUID and makes sure that, when global_identifiers are compared, only the UUID part is considered, while the provider is still kept for later reference.
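One way to picture "compare only the UUID, keep the provider" is the following Python sketch. The class name `UuidIdentifier`, its fields, and the provider strings are hypothetical and do not reflect the actual types in this MR.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UuidIdentifier:
    """Hypothetical global identifier that is backed by a UUID."""

    # Kept for later reference, but excluded from equality and hashing.
    provider: str = field(compare=False)
    # The only field that takes part in comparisons.
    value: uuid.UUID


# Made-up UUID and provider names for illustration only.
shared = uuid.UUID("6f9619ff-8b86-d011-b42d-00c04fc964ff")
first = UuidIdentifier(provider="provider-a", value=shared)
second = UuidIdentifier(provider="provider-b", value=shared)

same_dataset = first == second          # providers differ, UUIDs match
distinct_count = len({first, second})   # hashing also ignores the provider
```

Because the dataclass is frozen and `provider` is excluded from comparison, the generated hash also depends only on the UUID, so such identifiers deduplicate correctly in sets and as dictionary keys.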

There are still plenty of sources left to check, but that can just as well go into a follow-up MR.

Edited by Jakob Deller
