A Kubernetes scheduler designed for smart scheduling with llmaz.
+Scheduler Plugins maintains multiple plugins that apply different scheduling strategies to different workloads.

-## Plugins
-
-vScheduler maintains multiple plugins for scheduling LLM workloads.
+## Plugin List

 ### ResourceFungibility Plugin

-A `llama2-7B` model can run on __1xA100__ GPU or on __1xA10__ GPU; this is what we call fungibility.
+A `llama2-7B` model can run on __1xA100__ GPU, on __1xA10__ GPU, even on __1x4090__, and on a variety of other GPU types; this is what we call resource fungibility. In practice, a cluster is often heterogeneous, with several GPU types, and high-end GPUs frequently go out of stock. To meet the service's SLOs while keeping cost under control, we need to schedule workloads across different GPU types.
+
+With the [resourceFungibility](./pkg/plugins/resource_fungibility/README.md) plugin, we can achieve this with at most 8 alternative GPU types.

-With the [resourceFungibility](./docs/plugins/resource_fungibility.md) plugin, we can achieve this with at most 8 alternative GPU types.
+In the future, we need to explore GPU usage dynamically, considering not only availability and cost but also performance. See the related paper [Mélange: Cost Efficient Large Language Model
+Serving by Exploiting GPU Heterogeneity](https://arxiv.org/pdf/2404.14527).
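
The fallback idea described in the README text can be illustrated with a small sketch. It is not the plugin's actual implementation or API: `GpuFlavor`, `pick_flavor`, and the `free_gpus` mapping are hypothetical names invented for this example, and the only detail taken from the text is the limit of 8 alternative GPU types. The sketch assumes the alternatives are listed in preference order and that the first one with enough free GPUs wins.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence

# Limit quoted in the README text; everything else here is an assumption.
MAX_ALTERNATIVES = 8


@dataclass(frozen=True)
class GpuFlavor:
    """Hypothetical description of one acceptable GPU type for a model."""
    name: str
    count: int  # GPUs of this type needed per replica


def pick_flavor(
    flavors: Sequence[GpuFlavor],
    free_gpus: Mapping[str, int],
) -> Optional[GpuFlavor]:
    """Return the first flavor, in preference order, that currently fits.

    `free_gpus` maps a GPU type name to the number of unallocated GPUs of
    that type. Returns None when no listed flavor can be satisfied.
    """
    if len(flavors) > MAX_ALTERNATIVES:
        raise ValueError(
            f"at most {MAX_ALTERNATIVES} alternative GPU types are allowed"
        )
    for flavor in flavors:
        if free_gpus.get(flavor.name, 0) >= flavor.count:
            return flavor
    return None


# Example: A100s are out of stock, so the workload falls back to an A10.
preferred = [GpuFlavor("A100", 1), GpuFlavor("A10", 1), GpuFlavor("4090", 1)]
chosen = pick_flavor(preferred, {"A100": 0, "A10": 2, "4090": 4})
```

A fixed preference order keeps the decision easy to reason about. Weighing cost and performance dynamically, as the last added paragraph suggests, would need a scoring step in place of the first-fit loop.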