<!-- README.md is generated from README.Rmd. Please edit that file -->
# ubair <img src="inst/sticker/stickers-ubair-1.png" align="right" width="20%"/>
**ubair** is an R package for Statistical Investigation of the Impact of External Conditions on Air Quality: it uses the statistical software R to analyze and visualize the impact of external factors, such as traffic restrictions, hazards, and political measures, on air quality. It aims to provide experts with a transparent comparison of modeling approaches and to support data-driven evaluations for policy advisory purposes.
## Installation
Install via CRAN, or if you have access to <https://gitlab.ai-env.de/use-case-luft/ubair>, you can use one of the following options:
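A minimal sketch of the CRAN route (assuming the package is published on CRAN under the same name):

``` r
install.packages("ubair")
```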
#### Using an archive file
Recommended if you do not have git installed.
- Download the zip/tar.gz from GitLab
- Start a new R project or open an existing one
- In RStudio:
  - Go to the 'Packages' tab (next to Help/Plots/Files)
  - Click on 'Install' (upper left corner)
  - Install from: choose "Package Archive File"
  - Browse to the zip file
  - Click 'Install'
- Alternatively, type in the console:
``` r
install.packages("<path-to-zip>/ubair-master.zip", repos = NULL, type = "source")
```
#### Using remote package
Git needs to be installed.
``` r
install.packages("remotes")
# requires a configured ssh-key
remotes::install_git("git@gitlab.ai-env.de:use-case-luft/ubair.git")
# alternative via password
remotes::install_git("https://gitlab.ai-env.de/use-case-luft/ubair.git")
```
## Sample Usage of package
For a more detailed explanation of the package, you can access the vignettes:
- View user_sample source code directly in the [vignettes/](vignettes/) folder.
- Open the vignette with `vignette("user_sample_1", package = "ubair")` if the package was installed with vignettes (see the example below)
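For example (assuming the package was installed with its vignettes; `browseVignettes()` is the standard base R way to list them):

``` r
# list all vignettes shipped with the installed package
browseVignettes("ubair")
# open the user sample vignette directly
vignette("user_sample_1", package = "ubair")
```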
``` r
library(ubair)
params <- load_params()
env_data <- sample_data_DESN025
```
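To adapt the model parameters, you can copy the packaged `params.yaml`, edit it, and load your own version instead of the defaults (a sketch using `copy_default_params()` and `load_params()`; the destination path is only an example):

``` r
# copy the default params.yaml to a directory of your choice and edit it there
copy_default_params("path/to/your/config")
# load the edited parameter file instead of the packaged defaults
params <- load_params("path/to/your/config/params.yaml")
```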
``` r
# Plot meteo data
plot_station_measurements(env_data, params$meteo_variables)
```
<img src="man/figures/README-plot-meteo-data-1.png" width="100%"/>
Split the data into training, reference and effect time intervals:

<img src="man/figures/time_split_overview.png" width="100%"/>
``` r
application_start <- lubridate::ymd("20191201") # This coincides with the start of the reference window
date_effect_start <- lubridate::ymd_hm("20200323 00:00") # This splits the forecast into reference and effect
application_end <- lubridate::ymd("20200504") # This coincides with the end of the effect window
buffer <- 24 * 14 # 14 days buffer
dt_prepared <- prepare_data_for_modelling(env_data, params)
dt_prepared <- dt_prepared[complete.cases(dt_prepared)]
split_data <- split_data_counterfactual(
  dt_prepared, application_start,
  application_end
)
res <- run_counterfactual(split_data,
  params,
  detrending_function = "linear",
  model_type = "lightgbm",
  alpha = 0.9,
  log_transform = TRUE,
  calc_shaps = TRUE
)
```
```
#> [LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.028078 seconds.
#> You can set `force_col_wise=true` to remove the overhead.
#> [LightGBM] [Info] Total Bins 1557
#> [LightGBM] [Info] Number of data points in the train set: 104486, number of used features: 9
#> [LightGBM] [Info] Start training from score -0.000000
```
``` r
predictions <- res$prediction
plot_counterfactual(predictions, params,
  window_size = 14,
  date_effect_start,
  buffer = buffer,
  plot_pred_interval = TRUE
)
```
<img src="man/figures/README-counterfactual-scenario-1.png" width="100%"/>
``` r
round(calc_performance_metrics(predictions, date_effect_start, buffer = buffer), 2)
```
```
#>           RMSE            MSE            MAE           MAPE           Bias
#>           7.38          54.48           5.38           0.18          -2.73
#>             R2 Coverage lower Coverage upper       Coverage    Correlation
#>           0.74           0.97           0.95           0.92           0.89
#>            MFB            FGE
#>          -0.05           0.19
```
``` r
round(calc_summary_statistics(predictions, date_effect_start, buffer = buffer), 2)
```
::: kable-table
| | true | prediction |
|:---------------------|-------:|-----------:|
| min | 3.36 | 5.58 |
| max | 111.90 | 59.71 |
| var | 212.96 | 128.16 |
| mean | 30.80 | 28.07 |
| 5-percentile | 9.29 | 10.73 |
| 25-percentile | 19.85 | 19.40 |
| median/50-percentile | 29.60 | 27.09 |
| 75-percentile | 40.54 | 36.27 |
| 95-percentile | 56.80 | 47.69 |
:::
``` r
estimate_effect_size(predictions, date_effect_start, buffer = buffer, verbose = TRUE)
```
```
#> The external effect changed the target value on average by -6.294 compared to the reference time window. This is a -26.37% relative change.
#> $absolute_effect
#> [1] -6.294028
#>
#> $relative_effect
#> [1] -0.2637
```
### SHAP feature importances
``` r
shapviz::sv_importance(res$importance, kind = "bee")
```
<img src="man/figures/README-feature_importance-1.png" width="100%"/>
``` r
xvars <- c("TMP", "WIG", "GLO", "WIR")
shapviz::sv_dependence(res$importance, v = xvars)
```
<img src="man/figures/README-feature_importance-2.png" width="100%"/>
## Development
### Prerequisites
1. **R**: Make sure you have R installed (recommended version 4.4.1). You can download it from [CRAN](https://cran.r-project.org/).
2. **RStudio** (optional but recommended): Download from [RStudio](https://www.rstudio.com/).
### Setting Up the Environment
Install the development version of ubair:
``` r
install.packages("renv")
renv::restore()
devtools::build()
devtools::load_all()
```
### Development
#### Install pre-commit hook (required to ensure tidyverse code formatting)
```
pip install pre-commit
```
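Note: with standard pre-commit usage, the hooks defined for this repository also need to be registered once in your local clone by running `pre-commit install` in the repository root.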
#### Add new requirements
If you add new dependencies to the *ubair* package, make sure to update the renv.lock file:
``` r
renv::snapshot()
```
#### Style and documentation
Before you commit your changes, update the documentation, ensure the style complies with the tidyverse style guide, and check that all tests run without errors:
``` r
# update documentation and check package integrity
devtools::check()
# apply tidyverse style (also applied as precommit hook)
usethis::use_tidy_style()
# you can check for existing lintr warnings by
devtools::lint()
# run tests
devtools::test()
# build README.md if any changes have been made to README.Rmd
devtools::build_readme()
```
#### Pre-commit hook
The pre-commit rules are defined in .pre-commit-hook.yaml and applied before each commit. They are split into the following steps:

- run styler to format the code in tidyverse style
- run roxygen to update the documentation
- check whether the README is up to date
- run lintr as a final check of the code style

If the pre-commit hook fails, review the automatically applied changes, stage them and retry the commit.
#### Test Coverage
Install covr to run this.
``` r
cov <- covr::package_coverage(type = "all")
cov_list <- covr::coverage_to_list(cov)
data.table::data.table(
part = c("Total", names(cov_list$filecoverage)),
coverage = c(cov_list$totalcoverage, as.vector(cov_list$filecoverage))
)
```
``` r
covr::report(cov)
```
## Contacts
**Jore Noa Averbeck** [JoreNoa.Averbeck\@uba.de](mailto:JoreNoa.Averbeck@uba.de){.email}
**Raphael Franke** [Raphael.Franke\@uba.de](mailto:Raphael.Franke@uba.de){.email}
**Imke Voß** [imke.voss\@uba.de](mailto:imke.voss@uba.de){.email}
File added: default parameter configuration (see `load_params()` below):
target: 'NO2'
lightgbm:
  nrounds: 200
  eta: 0.03
  num_leaves: 32
dynamic_regression:
  ntrain: 8760 # 24*365 = 1 year of training data
random_forest:
  num.trees: 300
  mtry: NULL
  min.node.size: NULL
  max.depth: 10
fnn:
  activationfun: tanh
  output: linear
  learningrate: 0.05
  learningrate_scale: 1
  batchsize: 32
  momentum: 0.9
  visible_dropout: 0.0
  hidden_dropout: 0.0
  hidden:
    - 50
    - 50
  numepochs: 200
meteo_variables:
  - GLO
  - TMP
  - RFE
  - WIG
  - WIR
  - LDR
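These are the defaults that `load_params()` reads from the package's `extdata` directory when no custom file path is supplied.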
Image files added: inst/sticker/smoke.png, inst/sticker/stickers-ubair-1.png, ki-lab-logo.png
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/model_evaluation.R
\name{calc_performance_metrics}
\alias{calc_performance_metrics}
\title{Calculates performance metrics of a business-as-usual model}
\usage{
calc_performance_metrics(predictions, date_effect_start = NULL, buffer = 0)
}
\arguments{
\item{predictions}{data.table or data.frame with the following columns
\describe{
\item{date}{Date of the observation. Needs to be comparable to
date_effect_start element.}
\item{value}{True observed value of the station}
\item{prediction}{Predicted model output for the same time and station
as value}
\item{prediction_lower}{Lower end of the prediction interval}
\item{prediction_upper}{Upper end of the prediction interval}
}}
\item{date_effect_start}{A date. Start date of the
effect that is to be evaluated. The data from this point onwards is disregarded
for calculating model performance}
\item{buffer}{Integer. An additional buffer window before date_effect_start to account
for uncertainty in the effect start point. Disregards additional buffer data
points for model evaluation}
}
\value{
Named vector with performance metrics of the model
}
\description{
Model agnostic function to calculate a number of common performance
metrics on the reference time window.
Uses the true data \code{value} and the predictions \code{prediction} for this calculation.
The coverage is calculated from the columns \code{value}, \code{prediction_lower} and
\code{prediction_upper}.
Removes dates in the effect and buffer range as the model is not expected to
be performing correctly for these times. The incorrectness is precisely
what we are using for estimating the effect.
}
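% Illustrative example (not roxygen-generated); `predictions` is assumed to be the
% prediction table returned by run_counterfactual() as shown in the README.
\examples{
\dontrun{
metrics <- calc_performance_metrics(predictions,
  date_effect_start = lubridate::ymd_hm("20200323 00:00"),
  buffer = 24 * 14
)
round(metrics, 2)
}
}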
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/model_evaluation.R
\name{calc_summary_statistics}
\alias{calc_summary_statistics}
\title{Calculates summary statistics for predictions and true values}
\usage{
calc_summary_statistics(predictions, date_effect_start = NULL, buffer = 0)
}
\arguments{
\item{predictions}{Data.table or data.frame with the following columns
\describe{
\item{date}{Date of the observation. Needs to be comparable to
date_effect_start element.}
\item{value}{True observed value of the station}
\item{prediction}{Predicted model output for the same time and station
as value}
}}
\item{date_effect_start}{A date. Start date of the
effect that is to be evaluated. The data from this point onwards is disregarded
for calculating model performance}
\item{buffer}{Integer. An additional buffer window before date_effect_start to account
for uncertainty in the effect start point. Disregards additional buffer data
points for model evaluation}
}
\value{
data.frame of summary statistics with columns true and prediction
}
\description{
Helps with analyzing predictions by comparing them with the true values on
a number of relevant summary statistics.
}
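% Illustrative example (not roxygen-generated); `predictions` is assumed to be the
% prediction table returned by run_counterfactual() as shown in the README.
\examples{
\dontrun{
round(calc_summary_statistics(predictions,
  date_effect_start = lubridate::ymd_hm("20200323 00:00"),
  buffer = 24 * 14
), 2)
}
}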
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_cleaning.R
\name{clean_data}
\alias{clean_data}
\title{Clean and Optionally Aggregate Environmental Data}
\usage{
clean_data(env_data, station, aggregate_daily = FALSE)
}
\arguments{
\item{env_data}{A data table in long format.
Must include columns:
\describe{
\item{Station}{Station identifier for the data.}
\item{Komponente}{Measured environmental component e.g. temperature, NO2.}
\item{Wert}{Measured value.}
\item{date}{Timestamp as Date-Time object (\verb{YYYY-MM-DD HH:MM:SS} format).}
\item{Komponente_txt}{Textual description of the component.}
}}
\item{station}{Character. Name of the station to filter by.}
\item{aggregate_daily}{Logical. If \code{TRUE}, aggregates data to daily mean values. Default is \code{FALSE}.}
}
\value{
A \code{data.table}:
\itemize{
\item If \code{aggregate_daily = TRUE}: Contains columns for station, component, day, year,
and the daily mean value of the measurements.
\item If \code{aggregate_daily = FALSE}: Contains cleaned data with duplicates removed.
}
}
\description{
Cleans a data table of environmental measurements by filtering for a specific
station, removing duplicates, and optionally aggregating the data on a daily
basis using the mean.
}
\details{
Duplicate rows (by \code{date}, \code{Komponente}, and \code{Station}) are removed. A warning is issued
if duplicates are found.
}
\examples{
# Example data
env_data <- data.table::data.table(
Station = c("DENW094", "DENW094", "DENW006", "DENW094"),
Komponente = c("NO2", "O3", "NO2", "NO2"),
Wert = c(45, 30, 50, 40),
date = as.POSIXct(c(
"2023-01-01 08:00:00", "2023-01-01 09:00:00",
"2023-01-01 08:00:00", "2023-01-02 08:00:00"
)),
Komponente_txt = c(
"Nitrogen Dioxide", "Ozone", "Nitrogen Dioxide", "Nitrogen Dioxide"
)
)
# Clean data for station DENW094 without aggregation
cleaned_data <- clean_data(env_data, station = "DENW094", aggregate_daily = FALSE)
print(cleaned_data)
}
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_loading.R
\name{copy_default_params}
\alias{copy_default_params}
\title{Copy Default Parameters File}
\usage{
copy_default_params(dest_dir)
}
\arguments{
\item{dest_dir}{Character. The path to the directory where the \code{params.yaml}
file will be copied.}
}
\value{
Nothing is returned. A message is displayed upon successful copying.
}
\description{
Copies the default \code{params.yaml} file, included with the package, to a
specified destination directory. This is useful for initializing parameter
files for custom edits.
}
\details{
The \code{params.yaml} file contains default model parameters for various
configurations such as LightGBM, dynamic regression, and others. See the
\code{\link[ubair:load_params]{load_params()}} documentation for an example of the file's structure.
}
\examples{
\dontrun{
copy_default_params("path/to/destination")
}
}
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_preprocessing.R
\name{detrend}
\alias{detrend}
\title{Removes trend from data}
\usage{
detrend(split_data, mode = "linear", num_splines = 5, log_transform = FALSE)
}
\arguments{
\item{split_data}{List of two named dataframes called train and apply}
\item{mode}{String which defines type of trend is present. Options are
"linear", "quadratic", "exponential", "spline", "none".
"none" returns original data}
\item{num_splines}{Defines the number of cubic splines if \code{mode="spline"}.
Choose num_splines=1 for cubic polynomial trend. If \code{mode!="spline"}, this
parameter is ignored}
\item{log_transform}{If \code{TRUE}, use a log-transformation before detrending
to ensure positivity of all predictions in the rest of the pipeline.
An exp transformation is necessary during retrending to return to the solution
space. Use only in combination with the \code{log_transform} parameter in
\code{\link[=retrend_predictions]{retrend_predictions()}}}
}
\value{
A list of three elements: the detrended train and apply data frames, and the
trend function
}
\description{
Takes a list of train and application data as prepared by
\code{\link[=split_data_counterfactual]{split_data_counterfactual()}}
and removes a polynomial, exponential or cubic spline trend function.
The trend is obtained only from the train data. Use as part of preprocessing before
training a model based on decision trees, i.e. random forest and lightgbm.
For the other methods it may be helpful, but they are generally able to
deal with trends themselves. Therefore, we recommend trying out different
versions and guiding decisions using the model evaluation metrics from
\code{\link[=calc_performance_metrics]{calc_performance_metrics()}}.
}
\details{
Apply \code{\link[=retrend_predictions]{retrend_predictions()}} to predictions to return to the
original data units.
}
\examples{
\dontrun{
split_data <- split_data_counterfactual(
dt_prepared, training_start,
training_end, application_start, application_end
)
detrended_list <- detrend(split_data, mode = "linear")
detrended_train <- detrended_list$train
detrended_apply <- detrended_list$apply
trend <- detrended_list$model
}
}
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/model_evaluation.R
\name{estimate_effect_size}
\alias{estimate_effect_size}
\title{Estimates size of the external effect}
\usage{
estimate_effect_size(df, date_effect_start, buffer = 0, verbose = FALSE)
}
\arguments{
\item{df}{Data.table or data.frame with the following columns
\describe{
\item{date}{Date of the observation. Needs to be comparable to
date_effect_start element.}
\item{value}{True observed value of the station}
\item{prediction}{Predicted model output for the same time and station
as value}
}}
\item{date_effect_start}{A date. Start date of the
effect that is to be evaluated. The data from this point onward is disregarded
for calculating model performance.}
\item{buffer}{Integer. An additional buffer window before date_effect_start to account
for uncertainty in the effect start point. Disregards additional buffer data
points for model evaluation}
\item{verbose}{Prints an explanation of the results if TRUE}
}
\value{
A list with two numbers: Absolute and relative estimated effect size.
}
\description{
Calculates an estimate for the absolute and relative effect size of the
external effect. The absolute effect is the difference between the model
bias in the reference time and the effect time windows. The relative effect
is the absolute effect divided by the mean true value in the reference
window.
}
\details{
Note: Since the bias of the model is an average over predictions and true
values, it is important that the effect window is specified correctly.
Imagine a scenario like a fire which strongly affects the outcome for one
hour and is gone the next hour. If we use a two week effect window, the
estimated effect will be 14*24=336 times smaller compared to using a 1-hour
effect window. Generally, we advise against studying very short effects (single
hour or single day). The variability of results will be too large to learn
anything meaningful.
}
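% Illustrative example (not roxygen-generated); `predictions` is assumed to be the
% prediction table returned by run_counterfactual() as shown in the README.
\examples{
\dontrun{
estimate_effect_size(predictions,
  date_effect_start = lubridate::ymd_hm("20200323 00:00"),
  buffer = 24 * 14,
  verbose = TRUE
)
}
}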
Image files added: man/figures/README-counterfactual-scenario-1.png, man/figures/README-feature_importance-1.png, man/figures/README-feature_importance-2.png, man/figures/README-plot-meteo-data-1.png, man/figures/time_split_overview.png
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_cleaning.R
\name{get_meteo_available}
\alias{get_meteo_available}
\title{Get Available Meteorological Components}
\usage{
get_meteo_available(env_data)
}
\arguments{
\item{env_data}{Data table containing environmental data.
Must contain column "Komponente"}
}
\value{
A vector of available meteorological components.
}
\description{
Identifies unique meteorological components from the provided environmental data,
filtering only those that match the predefined UBA naming conventions. These components
include "GLO", "LDR", "RFE", "TMP", "WIG", "WIR", "WIND_U", and "WIND_V".
}
\examples{
# Example environmental data
env_data <- data.table::data.table(
Komponente = c("TMP", "NO2", "GLO", "WIR"),
Wert = c(25, 40, 300, 50),
date = as.POSIXct(c(
"2023-01-01 08:00:00", "2023-01-01 09:00:00",
"2023-01-01 10:00:00", "2023-01-01 11:00:00"
))
)
# Get available meteorological components
meteo_components <- get_meteo_available(env_data)
print(meteo_components)
}
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_loading.R
\name{load_params}
\alias{load_params}
\title{Load Parameters from YAML File}
\usage{
load_params(filepath = NULL)
}
\arguments{
\item{filepath}{Character. Path to the YAML file. If \code{NULL}, the function
will attempt to load the default \code{params.yaml} provided in the package.}
}
\value{
A list containing the parameters loaded from the YAML file.
}
\description{
Reads a YAML file containing model parameters, including station settings,
variables, and configurations for various models. If no file path is
provided, the function defaults to loading \code{params.yaml} from the package's
\code{extdata} directory.
}
\details{
The YAML file should define parameters in a structured format, such as:
\if{html}{\out{<div class="sourceCode yaml">}}\preformatted{target: 'NO2'
lightgbm:
nrounds: 200
eta: 0.03
num_leaves: 32
dynamic_regression:
ntrain: 8760
random_forest:
num.trees: 300
max.depth: 10
meteo_variables:
- GLO
- TMP
}\if{html}{\out{</div>}}
}
\examples{
\dontrun{
params <- load_params("path/to/custom_params.yaml")
}
}
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_loading.R
\name{load_uba_data_from_dir}
\alias{load_uba_data_from_dir}
\title{Load UBA Data from Directory}
\usage{
load_uba_data_from_dir(data_dir)
}
\arguments{
\item{data_dir}{Character. Path to the directory containing \code{.csv} files.}
}
\value{
A \code{data.table} containing the loaded data in long format. Returns an error if no valid
files are found or the resulting dataset is empty.
}
\description{
This function loads data from CSV files in the specified directory. It supports two formats:
}
\details{
\enumerate{
\item "inv": Files must contain the following columns:
\itemize{
\item \code{Station}, \code{Komponente}, \code{Datum}, \code{Uhrzeit}, \code{Wert}.
}
\item "24Spalten": Files must contain:
\itemize{
\item \code{Station}, \code{Komponente}, \code{Datum}, and columns \code{Wert01}, ..., \code{Wert24}.
}
}
File names should include "inv" or "24Spalten" to indicate their format. The function scans
recursively for \code{.csv} files in subdirectories and combines the data into a single \code{data.table}
in long format.
Files that are not in the exected format will be ignored.
}
