Skip to content

Commit b62e7a6

Browse files
committed
feat(docs): humaneval+
1 parent 301f936 commit b62e7a6

2 files changed

Lines changed: 17 additions & 1 deletion

File tree

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -26,7 +26,7 @@ Below are the features and tasks of this framework:
2626
- We provide Multi-GPU text generation with `accelerate` and Dockerfiles for evaluating on Docker containers for security and reproducibility.
2727

2828
- Tasks:
29-
- 5 code generation **Python** tasks (with unit tests): [HumanEval](https://huggingface.co/datasets/openai_humaneval), [InstructHumanEval](https://huggingface.co/datasets/codeparrot/instructhumaneval), [APPS](https://huggingface.co/datasets/codeparrot/apps), [MBPP](https://huggingface.co/datasets/mbpp) and [DS-1000](https://github.com/HKUNLP/DS-1000/) for both completion (left-to-right) and insertion (FIM) mode.
29+
- 6 code generation **Python** tasks (with unit tests): [HumanEval](https://huggingface.co/datasets/openai_humaneval), [HumanEval+](https://huggingface.co/datasets/evalplus/humanevalplus), [InstructHumanEval](https://huggingface.co/datasets/codeparrot/instructhumaneval), [APPS](https://huggingface.co/datasets/codeparrot/apps), [MBPP](https://huggingface.co/datasets/mbpp) and [DS-1000](https://github.com/HKUNLP/DS-1000/) for both completion (left-to-right) and insertion (FIM) mode.
3030
- [HumanEvalPack](https://huggingface.co/datasets/bigcode/humanevalpack) extends HumanEval to **3** scenarios across **6** languages via human translations and was released with [OctoPack](https://arxiv.org/abs/2308.07124).
3131
- [MultiPL-E](https://github.com/nuprl/MultiPL-E) evaluation suite (HumanEval translated into **18** programming languages).
3232
- [Recode](https://github.com/amazon-science/recode/tree/main) applied to the HumanEval benchmark. It evaluates the robustness of code-generation models.

docs/README.md

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -49,6 +49,22 @@ accelerate launch main.py \
4949

5050
If you want to evaluate only on the first $n$ samples instead of all the test dataset, set `limit` argument to $n$.
5151

52+
### HumanEval+
53+
[HumanEval+](https://huggingface.co/datasets/evalplus/humanevalplus): HumanEval with additional unit tests (80x of the original HumanEval) for each of the 164 problems.
54+
55+
The generation and evaluation follows the same approach as [HumanEval](#humaneval). One only needs to change the task name to `humanevalplus` to run the evaluation on HumanEval+, such as:
56+
57+
```python
58+
accelerate launch main.py \
59+
--model <MODEL_NAME> \
60+
--max_length_generation <MAX_LENGTH> \
61+
--tasks humanevalplus \
62+
--temperature 0.2 \
63+
--n_samples 200 \
64+
--batch_size 10 \
65+
--allow_code_execution
66+
```
67+
5268

5369
### HumanEvalPack
5470

0 commit comments

Comments
 (0)