# Models

Common model zoos such as huggingface/transformers struggle with PyTorch native model parallelism. Following the design principle of vLLM, we keep the model implementations in verl simple, parallelizable, and highly optimized, operating on packed inputs.
## Adding a New Huggingface Model

### Step 1: Copy the model file from HF to verl

- Add a new file under verl/models/hf
- Copy ONLY the model file from huggingface/transformers/models to verl/models/hf

### Step 2: Modify the model file to use packed inputs

- Remove all the code related to inference (kv cache)
- Modify the inputs to include only
  - input_ids (total_nnz,)
  - cu_seqlens (batch_size + 1,)
  - max_seqlen_in_batch: int
- Note that this requires using flash attention with a causal mask, as in the sketch below

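A minimal sketch of what the packed-input attention call can look like, assuming the flash-attn 2 varlen kernel; how the surrounding module produces q/k/v is up to the model:

```python
from flash_attn import flash_attn_varlen_func  # flash-attn 2.x

def packed_attention(q, k, v, cu_seqlens, max_seqlen_in_batch):
    """Attention over packed (un-padded) sequences.

    q, k, v: (total_nnz, num_heads, head_dim) -- all sequences in the
        batch concatenated along the token dimension, no padding.
    cu_seqlens: (batch_size + 1,) int32 cumulative sequence lengths,
        e.g. [0, 5, 12] for two sequences of lengths 5 and 7.
    max_seqlen_in_batch: python int, length of the longest sequence.
    """
    return flash_attn_varlen_func(
        q, k, v,
        cu_seqlens_q=cu_seqlens,
        cu_seqlens_k=cu_seqlens,
        max_seqlen_q=max_seqlen_in_batch,
        max_seqlen_k=max_seqlen_in_batch,
        causal=True,  # the causal mask is required for packed training
    )
```
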
### Step 2.5: Add tests

- Add a test to compare this version against the huggingface version
- Follow the existing test infrastructure and add tests under tests/models/hf; a sketch of such a comparison test follows

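A rough sketch of the comparison, assuming the packed model exposes the Step 2 signature (the helper name and the forward signature of packed_model are hypothetical):

```python
import torch

def test_packed_matches_hf(hf_model, packed_model, input_ids, attention_mask):
    """Compare packed-input logits against the padded HF reference.

    hf_model and packed_model are assumed to hold identical weights;
    input_ids, attention_mask: (batch, seqlen) padded tensors.
    """
    # Reference: ordinary padded forward through huggingface.
    ref_logits = hf_model(input_ids, attention_mask=attention_mask).logits

    # Pack: keep only real tokens and build cu_seqlens.
    seqlens = attention_mask.sum(dim=-1)
    packed_ids = input_ids[attention_mask.bool()]  # (total_nnz,)
    cu_seqlens = torch.nn.functional.pad(
        seqlens.cumsum(0), (1, 0)
    ).to(torch.int32)  # (batch_size + 1,), starts at 0
    max_seqlen_in_batch = int(seqlens.max())

    packed_logits = packed_model(packed_ids, cu_seqlens, max_seqlen_in_batch)

    # Compare only at non-pad positions; loose tolerances for bf16/flash-attn.
    torch.testing.assert_close(
        packed_logits, ref_logits[attention_mask.bool()],
        rtol=1e-2, atol=1e-2,
    )
```
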
### Step 3: Add a function to apply tensor parallelism

- Please follow
  - https://pytorch.org/docs/stable/distributed.tensor.parallel.html
  - https://pytorch.org/tutorials/intermediate/TP_tutorial.html
- General comments
  - Tensor parallelism in native PyTorch is NOT auto-parallelism. Instead, you specify via configs how model parameters and inputs/outputs are resharded. These configs are then registered as hooks that reshard inputs/outputs before/after each module's forward, as in the sketch after this list.

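A minimal sketch using the parallelize_module API from the links above; the module paths assume a llama-style decoder and will differ per model:

```python
from torch.distributed.tensor.parallel import (
    parallelize_module, ColwiseParallel, RowwiseParallel,
)

def apply_tensor_parallel(model, tp_mesh):
    """Shard attention and MLP projections of a llama-style decoder.

    tp_mesh: a 1-D DeviceMesh for the tensor-parallel group.
    """
    for layer in model.model.layers:  # hypothetical llama-style layout
        layer_plan = {
            # Column-parallel: shard the output dimension.
            "self_attn.q_proj": ColwiseParallel(),
            "self_attn.k_proj": ColwiseParallel(),
            "self_attn.v_proj": ColwiseParallel(),
            # Row-parallel: shard the input dimension, all-reduce the output.
            "self_attn.o_proj": RowwiseParallel(),
            "mlp.gate_proj": ColwiseParallel(),
            "mlp.up_proj": ColwiseParallel(),
            "mlp.down_proj": RowwiseParallel(),
        }
        parallelize_module(layer, tp_mesh, layer_plan)
    return model
```

As the TP tutorial notes, the attention module's local head count must also be divided by the tensor-parallel degree so the sharded q/k/v projections reshape correctly.
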
### Step 4: Add a function to apply data parallelism

- Please use FSDP2 APIs (a minimal sketch follows)
- See the demo here: https://github.com/pytorch/torchtitan/blob/main/torchtitan/parallelisms/parallelize_llama.py#L413

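A minimal sketch in the style of that demo, assuming the FSDP2 fully_shard API (its import path has moved across PyTorch versions; torch.distributed._composable.fsdp is where it lived during development):

```python
import torch
from torch.distributed._composable.fsdp import fully_shard, MixedPrecisionPolicy

def apply_fsdp2(model, dp_mesh):
    """Shard parameters with the FSDP2 fully_shard API.

    dp_mesh: a 1-D DeviceMesh for the data-parallel group. Wrapping each
    transformer block separately lets FSDP overlap the all-gather of the
    next block with compute of the current one.
    """
    mp_policy = MixedPrecisionPolicy(
        param_dtype=torch.bfloat16, reduce_dtype=torch.float32
    )
    for layer in model.model.layers:  # hypothetical llama-style layout
        fully_shard(layer, mesh=dp_mesh, mp_policy=mp_policy)
    # Root wrap picks up the remaining parameters (embeddings, lm head).
    fully_shard(model, mesh=dp_mesh, mp_policy=mp_policy)
    return model
```
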
### Step 5: Add a function to apply pipeline parallelism

- Ships with PyTorch 2.4
- Currently alpha-quality and available only in nightly builds
- Check torchtitan for more details; a rough sketch of the intended usage follows

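Since the API is alpha and subject to change, this is only a rough sketch of the torch.distributed.pipelining interface as of PyTorch 2.4, assuming the model has already been split manually into per-rank sub-modules:

```python
from torch.distributed.pipelining import PipelineStage, ScheduleGPipe

def apply_pipeline_parallel(stage_module, stage_idx, num_stages,
                            device, example_microbatch, n_microbatches):
    """Wrap the sub-module owned by this rank as one pipeline stage.

    stage_module holds only this rank's layers; example_microbatch is a
    microbatch-shaped input used for shape inference.
    """
    stage = PipelineStage(
        stage_module, stage_idx, num_stages, device,
        input_args=(example_microbatch,),
    )
    # GPipe schedule: all microbatch forwards first, then all backwards.
    return ScheduleGPipe(stage, n_microbatches=n_microbatches)
```

The first stage then drives schedule.step(full_batch_input), which chunks the batch into microbatches, while later stages call schedule.step() with no arguments.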