Replies: 3 comments 8 replies
There are several common reasons why the GPU memory usage of the Docker API service doubles:
1. Multiple instances are deployed, so the model is loaded more than once:
   CUDA_VISIBLE_DEVICES=0,1 mineru-vllm-server --host 0.0.0.0 --port 30000 \
     --data-parallel-size 2 --gpu-memory-utilization 0.4
2. vLLM pre-allocates too large a share of GPU memory:
   command:
     --gpu-memory-utilization 0.4
3. Memory usage grew with the version upgrade.
Summary of workarounds:
You can first check whether multiple service instances are running, as well as the current
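One way to check for multiple service instances is to list the compute processes on each GPU. Below is a minimal Python sketch. It assumes `nvidia-smi` is available on the host, because inside a container the process list may come back empty due to PID namespace isolation. The function names (`processes_per_gpu`, `shared_gpus`, `query_nvidia_smi`) are hypothetical helpers and not part of MinerU.

```python
import csv
import io
import subprocess
from collections import defaultdict


def processes_per_gpu(csv_text: str) -> dict[str, list[tuple[int, int]]]:
    """Group (pid, used MiB) pairs by GPU UUID.

    Expects text in the format produced by:
      nvidia-smi --query-compute-apps=gpu_uuid,pid,used_memory --format=csv,noheader,nounits
    """
    by_gpu: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        gpu_uuid, pid, used = (field.strip() for field in row)
        # used_memory can be reported as "[N/A]"; count it as 0 MiB here
        used_mib = int(used) if used.isdigit() else 0
        by_gpu[gpu_uuid].append((int(pid), used_mib))
    return dict(by_gpu)


def shared_gpus(csv_text: str) -> dict[str, list[tuple[int, int]]]:
    """Return only the GPUs that host more than one compute process."""
    return {
        gpu: procs
        for gpu, procs in processes_per_gpu(csv_text).items()
        if len(procs) > 1
    }


def query_nvidia_smi() -> str:
    """Run nvidia-smi on the host and return its raw CSV output."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=gpu_uuid,pid,used_memory",
            "--format=csv,noheader,nounits",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

On the host, `shared_gpus(query_nvidia_smi())` lists every card where more than one process holds memory, for example two API servers started on the same GPU.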
In what scenario would you need to start multiple API servers on a single GPU? Each API server reserving its own share of GPU memory is expected behavior.
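Something very strange: at initialization, version 3.0 used twice as much GPU memory as version 2.6, but after running for a while the usage dropped, from 24% to 7%! I didn't do anything else in between!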


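Whatever the cause, each API server reserves its GPU memory independently. If you start multiple servers on a single card, you have to accept that memory usage doubles.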
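A rough arithmetic sketch of why memory "doubles": recent vLLM documentation describes `--gpu-memory-utilization` as a per-instance fraction of the GPU, so instances sharing one card add up. The helper names below are hypothetical, and the model ignores smaller overheads, so real usage can differ somewhat.

```python
import math


def reserved_gib(total_gib: float, utilization: float, servers: int) -> float:
    """Approximate GPU memory reserved when `servers` vLLM instances share one
    card and each is started with --gpu-memory-utilization=`utilization`."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Each instance claims its own fraction, so the reservations add up.
    return total_gib * utilization * servers


def fits_on_one_gpu(fractions: list[float]) -> bool:
    """True if the per-instance fractions sum to at most the whole card."""
    # fsum avoids accumulated rounding error; the small tolerance allows e.g. 0.5 + 0.5
    return math.fsum(fractions) <= 1.0 + 1e-9
```

For example, on a hypothetical 24 GiB card, two servers at 0.4 reserve about 19.2 GiB versus about 9.6 GiB for one, and a third server at 0.4 would no longer fit (`fits_on_one_gpu([0.4, 0.4, 0.4])` is `False`).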