BUG: File deleter throws an error when the `tempfile` index already exists
During the Innolab-Core tests, an error showed up in the RAG container.
https://gtl.lab4oev.de/f13/f13-core/innolab/core-innolab/-/merge_requests/2#note_65081
Apparently an error is thrown when the index already exists. This error has not been observed before, but it is definitely independent of the core and is therefore addressed here.
The error occurs every time RAG starts against a completely fresh Elasticsearch.
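The exception at the bottom of the log comes straight from Elasticsearch: creating an index that already exists is answered with a 400 `resource_already_exists_exception`. A minimal sketch of that underlying behaviour, assuming a reachable, fresh Elasticsearch (the `http://localhost:9200` URL is a placeholder) and using the `temporaryfile` index name from the log:

```python
from elasticsearch import Elasticsearch, BadRequestError

client = Elasticsearch("http://localhost:9200")  # placeholder URL

# First creation succeeds on a fresh instance.
client.indices.create(index="temporaryfile")

try:
    # Any further create call for the same index is rejected ...
    client.indices.create(index="temporaryfile")
except BadRequestError as err:
    # ... with the same error that shows up in the traceback below.
    print(err)  # BadRequestError(400, 'resource_already_exists_exception', ...)
```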
Todo:

- [ ] Bug can be reproduced
- [ ] Bug is fixed
```
2025-07-02 12:04:38+0000 - ERROR - rag_registry - file_rag_tempfile_deletion: Unexpected error during temporary file deletion.
rag-1 | Traceback (most recent call last):
rag-1 | File "/rag/src/rag/rag_registry.py", line 269, in file_rag_tempfile_deletion
rag-1 | n_deleted_files = self.file_rag.file_deletion(timestamp=timestamp)
rag-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rag-1 | File "/rag/src/rag/pipelines/file_rag.py", line 197, in file_deletion
rag-1 | n_deleted_docs = self.file_deletion_pipe.run(
rag-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rag-1 | File "/opt/conda/lib/python3.11/site-packages/haystack/core/pipeline/async_pipeline.py", line 663, in run
rag-1 | return asyncio.run(
rag-1 | ^^^^^^^^^^^^
rag-1 | File "/opt/conda/lib/python3.11/asyncio/runners.py", line 190, in run
rag-1 | return runner.run(main)
rag-1 | ^^^^^^^^^^^^^^^^
rag-1 | File "/opt/conda/lib/python3.11/asyncio/runners.py", line 118, in run
rag-1 | return self._loop.run_until_complete(task)
rag-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rag-1 | File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
rag-1 | return future.result()
rag-1 | ^^^^^^^^^^^^^^^
rag-1 | File "/opt/conda/lib/python3.11/site-packages/haystack/core/pipeline/async_pipeline.py", line 548, in run_async
rag-1 | async for partial in self.run_async_generator(
rag-1 | File "/opt/conda/lib/python3.11/site-packages/haystack/core/pipeline/async_pipeline.py", line 425, in run_async_generator
rag-1 | async for partial_res in _wait_for_one_task_to_complete():
rag-1 | File "/opt/conda/lib/python3.11/site-packages/haystack/core/pipeline/async_pipeline.py", line 341, in _wait_for_one_task_to_complete
rag-1 | partial_result = finished.result()
rag-1 | ^^^^^^^^^^^^^^^^^
rag-1 | File "/opt/conda/lib/python3.11/site-packages/haystack/core/pipeline/async_pipeline.py", line 305, in _runner
rag-1 | component_pipeline_outputs = await self._run_component_async(
rag-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rag-1 | File "/opt/conda/lib/python3.11/site-packages/haystack/core/pipeline/async_pipeline.py", line 75, in _run_component_async
rag-1 | outputs = await loop.run_in_executor(None, lambda: ctx.run(lambda: instance.run(**component_inputs)))
rag-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rag-1 | File "/opt/conda/lib/python3.11/concurrent/futures/thread.py", line 58, in run
rag-1 | result = self.fn(*self.args, **self.kwargs)
rag-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rag-1 | File "/opt/conda/lib/python3.11/site-packages/haystack/core/pipeline/async_pipeline.py", line 75, in <lambda>
rag-1 | outputs = await loop.run_in_executor(None, lambda: ctx.run(lambda: instance.run(**component_inputs)))
rag-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rag-1 | File "/opt/conda/lib/python3.11/site-packages/haystack/core/pipeline/async_pipeline.py", line 75, in <lambda>
rag-1 | outputs = await loop.run_in_executor(None, lambda: ctx.run(lambda: instance.run(**component_inputs)))
rag-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rag-1 | File "/rag/src/rag/pipelines/components/document_id_retriever.py", line 90, in run
rag-1 | docs = self.document_store.filter_documents(filters=filters)
rag-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rag-1 | File "/opt/conda/lib/python3.11/site-packages/haystack_integrations/document_stores/elasticsearch/document_store.py", line 294, in filter_documents
rag-1 | self._ensure_initialized()
rag-1 | File "/opt/conda/lib/python3.11/site-packages/haystack_integrations/document_stores/elasticsearch/document_store.py", line 159, in _ensure_initialized
rag-1 | self._client.indices.create(index=self._index, mappings=mappings)
rag-1 | File "/opt/conda/lib/python3.11/site-packages/elasticsearch/_sync/client/utils.py", line 452, in wrapped
rag-1 | return api(*args, **kwargs)
rag-1 | ^^^^^^^^^^^^^^^^^^^^
rag-1 | File "/opt/conda/lib/python3.11/site-packages/elasticsearch/_sync/client/indices.py", line 705, in create
rag-1 | return self.perform_request( # type: ignore[return-value]
rag-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rag-1 | File "/opt/conda/lib/python3.11/site-packages/elasticsearch/_sync/client/_base.py", line 422, in perform_request
rag-1 | return self._client.perform_request(
rag-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rag-1 | File "/opt/conda/lib/python3.11/site-packages/elasticsearch/_sync/client/_base.py", line 271, in perform_request
rag-1 | response = self._perform_request(
rag-1 | ^^^^^^^^^^^^^^^^^^^^^^
rag-1 | File "/opt/conda/lib/python3.11/site-packages/elasticsearch/_sync/client/_base.py", line 351, in _perform_request
rag-1 | raise HTTP_EXCEPTIONS.get(meta.status, ApiError)(
rag-1 | elasticsearch.BadRequestError: BadRequestError(400, 'resource_already_exists_exception', 'index [temporaryfile/2ibG0RYOQUynlRLzKhY6Jg] already exists')
rag-1 | 2025-07-02 12:04:38+0000 - INFO - delete_elasticsearch_tempfiles - cleanup_elasticsearch_tempindex_task: Cleaned uploaded documents in elasticsearch temporary index, that were older than one hour. 0 were deleted.
rag-1 | 2025-07-02 12:04:38+0000 - INFO - delete_elasticsearch_tempfiles - cleanup_elasticsearch_tempindex_task: Cleaned uploaded documents in elasticsearch temporary index, that were older than one hour. 0 were deleted.
```
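One possible direction for a fix, sketched here only as an assumption about where the change could live: create the temporary index idempotently when the RAG container starts, so the lazy creation inside `_ensure_initialized` never has to race and never surfaces the 400. The `ensure_temp_index` helper and the service URL are hypothetical; only the index name `temporaryfile` and the exception type are taken from the log above.

```python
from elasticsearch import Elasticsearch, BadRequestError


def ensure_temp_index(client: Elasticsearch, index: str = "temporaryfile") -> None:
    """Create the temporary-file index if it is missing; tolerate concurrent creation."""
    try:
        client.indices.create(index=index)
    except BadRequestError as err:
        # Another component created the index first - the situation from the
        # traceback above - which is harmless here and can be ignored.
        if "resource_already_exists_exception" not in str(err):
            raise


# Hypothetical startup hook; the Elasticsearch URL is a placeholder.
ensure_temp_index(Elasticsearch("http://elasticsearch:9200"))
```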