Implement harvester regression tests for local and CI testing

Adam Reichold requested to merge regression-test into main

This adds an xtask called regression-test which runs the harvester using a dedicated checked-in configuration and stored responses to determine whether the resulting datasets changed, and prints any modifications as an easy-to-read diff. For example, the change

diff --git a/src/harvester/csw.rs b/src/harvester/csw.rs
index 75777a6..bf31b83 100644
--- a/src/harvester/csw.rs
+++ b/src/harvester/csw.rs
@@ -128,7 +128,7 @@ pub async fn translate_dataset(
         }
     }
 
-    let language = identification
+    let _language: crate::dataset::Language = identification
         .languages
         .first()
         .and_then(|language| language.code)
@@ -154,7 +154,6 @@ pub async fn translate_dataset(
         source_url: source.source_url().replace("{{id}}", identifier),
         resources,
         issued,
-        language,
         tags,
         ..Default::default()
     };

will yield an output like

$ cargo xtask regression-test
   Compiling metadaten v0.1.0 (/home/ubuntu/metadaten)
    Finished dev-opt [optimized + debuginfo] target(s) in 13.93s
     Running `target/dev-opt/harvester`
2023-01-19T16:33:55.283935Z  INFO harvester: Harvesting 1 sources
2023-01-19T16:33:55.325318Z  INFO harvest{source=Source { name: "uba-gdi", type: Csw, url: "https://gis.uba.de/smartfinder-csw/api", provenance: "/Bund/UBA/GDI", filter: None, source_url: Some("https://gis.uba.de/smartfinder-client/?lang=de#/datasets/iso/{{id}}"), concurrency: 1, batch_size: 100 }}: metadaten::harvester::csw: Harvesting 180 datasets
--- regression-test/datasets.old/uba-gdi/1b65bb0d-0085-41d3-97cc-57427b5f1742.json      2023-01-19 16:33:55.365529252 +0000
+++ regression-test/datasets/uba-gdi/1b65bb0d-0085-41d3-97cc-57427b5f1742.json  2023-01-19 16:33:55.365529252 +0000
@@ -54,5 +54,5 @@
       "url": "https://gis.uba.de/maps/resources/apps/lu_umweltzonen"
     }
   ],
-  "language": "German"
+  "language": "Unknown"
 }
--- regression-test/datasets.old/uba-gdi/229ed9cb-c817-46f9-91f0-f6337148ea19.json      2023-01-19 16:33:55.373529314 +0000
+++ regression-test/datasets/uba-gdi/229ed9cb-c817-46f9-91f0-f6337148ea19.json  2023-01-19 16:33:55.369529283 +0000
@@ -63,5 +63,5 @@
       "url": "https://gis.uba.de/website/web/moos/index.html"
     }
   ],
-  "language": "German"
+  "language": "Unknown"
 }


[..]


--- regression-test/datasets.old/uba-gdi/e5e66abd-f8d5-4284-8264-2f8279a3b175.json      2023-01-19 16:33:55.433529777 +0000
+++ regression-test/datasets/uba-gdi/e5e66abd-f8d5-4284-8264-2f8279a3b175.json  2023-01-19 16:33:55.429529746 +0000
@@ -57,5 +57,5 @@
       "url": "https://www.umweltbundesamt.de/europaeische-mobilitaetswoche-aktionen-2022"
     }
   ],
-  "language": "German"
+  "language": "Unknown"
 }
--- regression-test/datasets.old/uba-gdi/f693f3bc-b13b-44c2-8d55-e34636daf48c.json      2023-01-19 16:33:55.437529808 +0000
+++ regression-test/datasets/uba-gdi/f693f3bc-b13b-44c2-8d55-e34636daf48c.json  2023-01-19 16:33:55.437529808 +0000
@@ -60,5 +60,5 @@
       "url": "https://gis.uba.de/website/luft/index.html"
     }
   ],
-  "language": "German"
+  "language": "Unknown"
 }
Error: "15 datasets were modified, 0 were removed and 0 were added"
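The summary in the final line can be derived by comparing the old and new dataset folders file by file. The following is only a minimal sketch of that classification, not the xtask's actual code: `Summary`, `compare` and `check` are hypothetical names, and the folders are modelled as in-memory maps from file name to file content rather than read from disk.

```rust
use std::collections::BTreeMap;

/// Hypothetical counters mirroring the three numbers in the error message.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Summary {
    pub modified: usize,
    pub removed: usize,
    pub added: usize,
}

/// Compares two snapshots, each mapping a file name to its content.
pub fn compare(old: &BTreeMap<String, String>, new: &BTreeMap<String, String>) -> Summary {
    let mut summary = Summary::default();

    for (name, old_content) in old {
        match new.get(name) {
            // Present in both snapshots, but the content differs.
            Some(new_content) if new_content != old_content => summary.modified += 1,
            // Present in both snapshots and identical.
            Some(_) => {}
            // Present only in the old snapshot.
            None => summary.removed += 1,
        }
    }

    // Present only in the new snapshot.
    summary.added = new.keys().filter(|name| !old.contains_key(*name)).count();

    summary
}

/// Fails with a message shaped like the one above if anything changed.
pub fn check(summary: &Summary) -> Result<(), String> {
    if *summary == Summary::default() {
        Ok(())
    } else {
        Err(format!(
            "{} datasets were modified, {} were removed and {} were added",
            summary.modified, summary.removed, summary.added
        ))
    }
}
```

Returning the message as an `Err` from `main` would make Rust print it in the quoted `Error: "..."` form seen above and exit with a non-zero status, which is what a CI job needs to fail.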

Still to be done before this can be merged:

  • The resulting datasets are currently not fully deterministic because the HashSet we use for Dataset::tags has a randomized iteration order, so we probably need either a different data structure or a deterministic hash function.
  • The stored responses and datasets are large and should be handled using Git LFS, which must be enabled here and set up when baking development VM images.
  • We need a more diverse set of stored responses and data sources to achieve reasonable coverage of our harvesters.
  • While running the tests is a single command, intentional changes should be confirmed by committing the modified regression-test/datasets folder to Git, which is probably not obvious without documentation.
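One possible way to address the first item is to switch the tags to an ordered set, which iterates in sorted order regardless of the hasher. The sketch below assumes a heavily reduced, hypothetical stand-in for the real `Dataset` type, and `render_tags` is an illustrative helper that stands in for the actual JSON serialization.

```rust
use std::collections::BTreeSet;

/// Hypothetical, reduced stand-in for the real `Dataset` type.
#[derive(Debug, Default)]
pub struct Dataset {
    pub title: String,
    /// `BTreeSet` iterates in sorted order, so output built from it
    /// is identical across runs, unlike the default `HashSet`.
    pub tags: BTreeSet<String>,
}

/// Renders the tags in iteration order as a quoted, comma-separated list.
/// Illustrative only: the real harvester serializes whole datasets to JSON.
pub fn render_tags(dataset: &Dataset) -> String {
    let quoted: Vec<String> = dataset
        .tags
        .iter()
        .map(|tag| format!("{:?}", tag))
        .collect();
    format!("[{}]", quoted.join(","))
}
```

With this change, two datasets holding the same tags render identically no matter the insertion order. The alternative named above, keeping `HashSet` with a fixed-seed hasher, would also yield a stable order, though not a sorted one that is easy to read in diffs.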

Closes #179 (closed)

Edited by Adam Reichold
