# Models

Common model zoos such as huggingface/transformers struggle with PyTorch native model parallelism. Following the design principle of vLLM, we keep the model implementations in verl simple, parallelizable, and highly optimized, operating on packed inputs.
## Adding a New Huggingface Model

### Step 1: Copy the model file from HF to verl

- Add a new file under verl/models/hf
- Copy ONLY the model file from huggingface/transformers/models to verl/models/hf

### Step 2: Modify the model file to use packed inputs

- Remove all the code related to inference (kv cache)
- Modify the inputs to include only
  - input_ids (total_nnz,)
  - cu_seqlens (batch_size + 1,)
  - max_seqlen_in_batch: int
- Note that this requires using flash attention with a causal mask, as in the sketch below

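A minimal sketch of what the packed-input attention call can look like, assuming the flash-attn 2 varlen kernel; how the surrounding module produces q/k/v is up to the model:

```python
from flash_attn import flash_attn_varlen_func  # flash-attn 2.x

def packed_attention(q, k, v, cu_seqlens, max_seqlen_in_batch):
    """Attention over packed (un-padded) sequences.

    q, k, v: (total_nnz, num_heads, head_dim) -- all sequences in the
        batch concatenated along the token dimension, no padding.
    cu_seqlens: (batch_size + 1,) int32 cumulative sequence lengths,
        e.g. [0, 5, 12] for two sequences of lengths 5 and 7.
    max_seqlen_in_batch: python int, length of the longest sequence.
    """
    return flash_attn_varlen_func(
        q, k, v,
        cu_seqlens_q=cu_seqlens,
        cu_seqlens_k=cu_seqlens,
        max_seqlen_q=max_seqlen_in_batch,
        max_seqlen_k=max_seqlen_in_batch,
        causal=True,  # the causal mask is required for packed training
    )
```
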
### Step 2.5: Add tests

- Add a test to compare this version against the huggingface version
- Follow the existing test infrastructure and add tests under tests/models/hf; a sketch of such a comparison test follows

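A rough sketch of the comparison, assuming the packed model exposes the Step 2 signature (the helper name and the forward signature of packed_model are hypothetical):

```python
import torch

def test_packed_matches_hf(hf_model, packed_model, input_ids, attention_mask):
    """Compare packed-input logits against the padded HF reference.

    hf_model and packed_model are assumed to hold identical weights;
    input_ids, attention_mask: (batch, seqlen) padded tensors.
    """
    # Reference: ordinary padded forward through huggingface.
    ref_logits = hf_model(input_ids, attention_mask=attention_mask).logits

    # Pack: keep only real tokens and build cu_seqlens.
    seqlens = attention_mask.sum(dim=-1)
    packed_ids = input_ids[attention_mask.bool()]  # (total_nnz,)
    cu_seqlens = torch.nn.functional.pad(
        seqlens.cumsum(0), (1, 0)
    ).to(torch.int32)  # (batch_size + 1,), starts at 0
    max_seqlen_in_batch = int(seqlens.max())

    packed_logits = packed_model(packed_ids, cu_seqlens, max_seqlen_in_batch)

    # Compare only at non-pad positions; loose tolerances for bf16/flash-attn.
    torch.testing.assert_close(
        packed_logits, ref_logits[attention_mask.bool()],
        rtol=1e-2, atol=1e-2,
    )
```
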
### Step 3: Add a function to apply tensor parallelism

- Please follow
  - https://pytorch.org/docs/stable/distributed.tensor.parallel.html
  - https://pytorch.org/tutorials/intermediate/TP_tutorial.html
- General comments
  - Tensor parallelism in native PyTorch is NOT auto-parallelism. Instead, you specify via configs how model parameters and inputs/outputs are resharded. These configs are then registered as hooks that reshard inputs/outputs before/after each module's forward, as in the sketch after this list.

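A minimal sketch using the parallelize_module API from the links above; the module paths assume a llama-style decoder and will differ per model:

```python
from torch.distributed.tensor.parallel import (
    parallelize_module, ColwiseParallel, RowwiseParallel,
)

def apply_tensor_parallel(model, tp_mesh):
    """Shard attention and MLP projections of a llama-style decoder.

    tp_mesh: a 1-D DeviceMesh for the tensor-parallel group.
    """
    for layer in model.model.layers:  # hypothetical llama-style layout
        layer_plan = {
            # Column-parallel: shard the output dimension.
            "self_attn.q_proj": ColwiseParallel(),
            "self_attn.k_proj": ColwiseParallel(),
            "self_attn.v_proj": ColwiseParallel(),
            # Row-parallel: shard the input dimension, all-reduce the output.
            "self_attn.o_proj": RowwiseParallel(),
            "mlp.gate_proj": ColwiseParallel(),
            "mlp.up_proj": ColwiseParallel(),
            "mlp.down_proj": RowwiseParallel(),
        }
        parallelize_module(layer, tp_mesh, layer_plan)
    return model
```

As the TP tutorial notes, the attention module's local head count must also be divided by the tensor-parallel degree so the sharded q/k/v projections reshape correctly.
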
### Step 4: Add a function to apply data parallelism

- Please use FSDP2 APIs (a minimal sketch follows)
- See the demo here: https://github.com/pytorch/torchtitan/blob/main/torchtitan/parallelisms/parallelize_llama.py#L413

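A minimal sketch in the style of that demo, assuming the FSDP2 fully_shard API (its import path has moved across PyTorch versions; torch.distributed._composable.fsdp is where it lived during development):

```python
import torch
from torch.distributed._composable.fsdp import fully_shard, MixedPrecisionPolicy

def apply_fsdp2(model, dp_mesh):
    """Shard parameters with the FSDP2 fully_shard API.

    dp_mesh: a 1-D DeviceMesh for the data-parallel group. Wrapping each
    transformer block separately lets FSDP overlap the all-gather of the
    next block with compute of the current one.
    """
    mp_policy = MixedPrecisionPolicy(
        param_dtype=torch.bfloat16, reduce_dtype=torch.float32
    )
    for layer in model.model.layers:  # hypothetical llama-style layout
        fully_shard(layer, mesh=dp_mesh, mp_policy=mp_policy)
    # Root wrap picks up the remaining parameters (embeddings, lm head).
    fully_shard(model, mesh=dp_mesh, mp_policy=mp_policy)
    return model
```
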
### Step 5: Add a function to apply pipeline parallelism

- Ships with PyTorch 2.4
- Currently alpha-quality and available only in nightly builds
- Check torchtitan for more details; a rough sketch of the intended usage follows

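Since the API is alpha and subject to change, this is only a rough sketch of the torch.distributed.pipelining interface as of PyTorch 2.4, assuming the model has already been split manually into per-rank sub-modules:

```python
from torch.distributed.pipelining import PipelineStage, ScheduleGPipe

def apply_pipeline_parallel(stage_module, stage_idx, num_stages,
                            device, example_microbatch, n_microbatches):
    """Wrap the sub-module owned by this rank as one pipeline stage.

    stage_module holds only this rank's layers; example_microbatch is a
    microbatch-shaped input used for shape inference.
    """
    stage = PipelineStage(
        stage_module, stage_idx, num_stages, device,
        input_args=(example_microbatch,),
    )
    # GPipe schedule: all microbatch forwards first, then all backwards.
    return ScheduleGPipe(stage, n_microbatches=n_microbatches)
```

The first stage then drives schedule.step(full_batch_input), which chunks the batch into microbatches, while later stages call schedule.step() with no arguments.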