Cache folder needs to be created manually
The file config/text_classification.yaml sets the variable huggingface_cache to /groups/ki-lab/huggingface_cache/.
If Python has no permission to create this folder, the path has to be changed or the folder has to be created manually. Otherwise the following command prints the configuration below and then fails:
uv run src/text_classification/main.py
output_dir: output
output_file_prefix: exp
huggingface_cache: /groups/ki-lab/huggingface_cache/
data:
huggingface_dataset: imdb
max_rows: 200
split_seed_id: 1
split_seeds:
- 2042
- 4562
- 8927
- 7402
- 1975
- 8972
- 3498
- 4608
- 9781
- 4310
model_type: traditional
traditional_model:
model_type: linear
model_params:
linear:
max_iter: 1000
feature_type: bow
feature_params:
bow:
analyzer: word
tfidf:
analyzer: char
ngram_range: ${as_tuple:2,3}
max_features: 5000
embedding_model: sentence-transformers/paraphrase-multilingual-mpnet-base-v2
llm:
model_name: Qwen/Qwen2.5-7B-Instruct
base_url: http://localhost:8001/v1
cache: false
max_tokens: 1000
Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/hartwigt/Documents/llm-testframework/src/text_classification/main.py", line 47, in app
    dset_dict = load_dataset(data_name, cache_dir=cfg["huggingface_cache"])
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hartwigt/Documents/llm-testframework/.venv/lib/python3.12/site-packages/datasets/load.py", line 2062, in load_dataset
    builder_instance = load_dataset_builder(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/home/hartwigt/Documents/llm-testframework/.venv/lib/python3.12/site-packages/datasets/load.py", line 1819, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
                                       ^^^^^^^^^^^^
  File "/home/hartwigt/Documents/llm-testframework/.venv/lib/python3.12/site-packages/datasets/builder.py", line 386, in __init__
    os.makedirs(self._cache_dir_root, exist_ok=True)
  File "<frozen os>", line 215, in makedirs
  File "<frozen os>", line 215, in makedirs
  File "<frozen os>", line 225, in makedirs
PermissionError: [Errno 13] Permission denied: '/groups'
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
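The traceback ends in os.makedirs, which datasets calls with exist_ok=True on the configured cache directory. exist_ok only tolerates a directory that already exists; every missing parent still has to be created, so the call fails at '/groups' when that top-level folder is absent and the user cannot write to /. The sketch below checks this in advance without creating anything. The helper names first_existing_ancestor and can_create_dir are hypothetical, and os.access is only an approximation of what os.makedirs will actually be allowed to do (the Python docs note it can be wrong on some network file systems).

```python
import os


def first_existing_ancestor(path: str) -> str:
    """Return the closest existing ancestor of `path` (or `path` itself if it exists)."""
    current = os.path.abspath(path)
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return current


def can_create_dir(path: str) -> bool:
    """Approximate whether os.makedirs(path, exist_ok=True) would succeed.

    Hypothetical helper for illustration; nothing is created on disk.
    """
    if os.path.isdir(path):
        return True
    ancestor = first_existing_ancestor(path)
    # os.makedirs has to create the first missing component inside `ancestor`,
    # which needs write and search (execute) permission on that directory.
    return os.path.isdir(ancestor) and os.access(ancestor, os.W_OK | os.X_OK)


if __name__ == "__main__":
    cache_dir = "/groups/ki-lab/huggingface_cache/"  # value from the config above
    print(cache_dir, "->", first_existing_ancestor(cache_dir), can_create_dir(cache_dir))
```

For the path from the config, first_existing_ancestor returns "/" on a machine without /groups, which is exactly the directory where the PermissionError above is raised.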
This should be improved in the documentation, or at least the error should be caught and users given specific instructions on how to fix it, i.e. where the folder is configured.
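The improvement suggested above could look roughly like the following sketch: create the cache folder before calling load_dataset and, if that fails with a PermissionError, stop with a message that names the setting to change. The function ensure_cache_dir and the exception class CacheDirError are hypothetical names, not part of the project or of datasets. The message assumes, as the Hydra output above indicates, that main.py is a Hydra app, so huggingface_cache can also be overridden on the command line.

```python
import os

CONFIG_FILE = "config/text_classification.yaml"


class CacheDirError(RuntimeError):
    """Raised when the configured Hugging Face cache folder cannot be created."""


def ensure_cache_dir(cache_dir: str) -> str:
    """Create `cache_dir` if it is missing and return it unchanged.

    Hypothetical helper: on a PermissionError it raises CacheDirError with
    instructions instead of letting the bare traceback from datasets reach the user.
    """
    try:
        os.makedirs(cache_dir, exist_ok=True)
    except PermissionError as err:
        raise CacheDirError(
            f"Cannot create the Hugging Face cache folder {cache_dir!r} ({err}).\n"
            f"Either create this folder manually with write access for your user, "
            f"or point 'huggingface_cache' in {CONFIG_FILE} to a writable location, "
            f"e.g. with the Hydra override huggingface_cache=/path/to/writable/folder"
        ) from err
    return cache_dir
```

In main.py, the call from the traceback would then read load_dataset(data_name, cache_dir=ensure_cache_dir(cfg["huggingface_cache"])), so the user sees these instructions instead of the stack trace from inside datasets. Raising a dedicated error rather than silently falling back to another folder keeps the configured path authoritative.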