The Stack v2 & StarCoder2Data

This repository contains the code for building The Stack v2 dataset, along with the additional sources used to assemble StarCoder2Data, the training corpus of the StarCoder2 family of models.

This repository is a follow-up to the work in bigcode-dataset, which was used to build The Stack v1 and StarCoderData.