Improve cryptic error message when GPU is out of memory
Feature suggestion: when I run `uv run src/start_vllm.py` as suggested in the README, I receive a cryptic error message of more than 100 lines and the script stops. The cause was that I did not have enough GPU memory, i.e., the intended LLM was too big for my system:

```
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 130.00 MiB. GPU 0 has a total capacity of 3.68 GiB of which 38.69 MiB is free. Including non-PyTorch memory, this process has 3.62 GiB memory in use.
```
Running out of GPU memory is probably a common problem. I therefore suggest checking up front whether sufficient GPU memory is available (or wrapping model loading in a try/except) and printing a short, human-readable message instead of the full traceback; see the sketch below.
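A minimal sketch of what such a check could look like, assuming the script loads the model with PyTorch on a single CUDA device. The `load_model()` function and the `MIN_FREE_GIB` threshold are hypothetical placeholders for whatever `src/start_vllm.py` actually does; `torch.cuda.OutOfMemoryError` is the same class that newer PyTorch releases also expose as `torch.OutOfMemoryError` (as in the log above):

```python
import sys

import torch

MIN_FREE_GIB = 8.0  # hypothetical requirement; set to the chosen model's footprint


def check_gpu_memory(min_free_gib: float = MIN_FREE_GIB) -> None:
    """Exit with a readable message if the GPU lacks enough free memory."""
    if not torch.cuda.is_available():
        sys.exit("No CUDA GPU detected, but this script needs one to serve the model.")
    free_bytes, total_bytes = torch.cuda.mem_get_info()  # (free, total) in bytes
    free_gib = free_bytes / 1024**3
    if free_gib < min_free_gib:
        sys.exit(
            f"Not enough GPU memory: {free_gib:.2f} GiB free of "
            f"{total_bytes / 1024**3:.2f} GiB total, but about {min_free_gib} GiB "
            "is needed. Try a smaller model or free up GPU memory first."
        )


def load_model() -> None:
    """Placeholder for the actual model-loading code in start_vllm.py."""
    torch.empty(1, device="cuda")


def main() -> None:
    check_gpu_memory()
    try:
        load_model()
    except torch.cuda.OutOfMemoryError:  # spelled torch.OutOfMemoryError in newer releases
        sys.exit(
            "The model did not fit into GPU memory. "
            "Try a smaller model or a GPU with more memory."
        )


if __name__ == "__main__":
    main()
```

Even just the try/except around model loading would turn the 100-line traceback into a one-line explanation; the up-front `mem_get_info` check additionally reports the problem before any loading work starts.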