Commit c1cee50

remove non pyfiles
1 parent 51ee384 commit c1cee50

2 files changed

Lines changed: 33 additions & 4 deletions

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+from pathlib import Path
+from typing import Set
+
+from labml import lab
+
+
+def remove_files(path: Path, keep: Set[str]):
+    """
+    Remove files
+    """
+
+    for p in path.iterdir():
+        if p.is_symlink():
+            p.unlink()
+            continue
+        if p.is_dir():
+            remove_files(p, keep)
+        else:
+            if p.suffix not in keep:
+                p.unlink()
+
+
+def main():
+    remove_files(lab.get_data_path() / 'source', {'.py'})
+
+
+if __name__ == '__main__':
+    main()
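
To see what `remove_files` does to a directory tree, the sketch below runs a copy of the function on a throwaway temporary directory instead of the real data path. The `demo` helper and all file names in it are hypothetical and exist only for illustration; the `labml` dependency is left out so the snippet runs on its own.

```python
import tempfile
from pathlib import Path
from typing import List, Set


def remove_files(path: Path, keep: Set[str]) -> None:
    """Recursively delete symlinks and files whose suffix is not in `keep`."""
    for p in path.iterdir():
        if p.is_symlink():
            # Symlinks are removed without being followed.
            p.unlink()
            continue
        if p.is_dir():
            remove_files(p, keep)
        elif p.suffix not in keep:
            p.unlink()


def demo() -> List[str]:
    """Hypothetical helper: build a small tree, clean it, return what is left."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "pkg").mkdir()
        (root / "docs").mkdir()
        (root / "pkg" / "model.py").write_text("x = 1\n")
        (root / "pkg" / "notes.txt").write_text("notes\n")
        (root / "docs" / "guide.txt").write_text("guide\n")
        (root / "README.md").write_text("# readme\n")
        (root / "SCRIPT.PY").write_text("y = 2\n")  # upper-case suffix

        remove_files(root, {".py"})

        # Paths relative to the root, with forward slashes on every platform.
        return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


if __name__ == "__main__":
    print(demo())
```

Two details are worth noticing in the result: directories are never deleted, so `docs` survives as an empty directory, and the suffix comparison is case-sensitive, so `SCRIPT.PY` is removed when `keep` contains only `.py`.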

readme.md

Lines changed: 5 additions & 4 deletions
@@ -11,12 +11,13 @@ This repo trains deep learning models on source code.
 [PyTorch awesome list](https://github.com/bharathgs/Awesome-pytorch-list).
 4. Run `python_autocomplete/extract_downloads.py` to extract the downloaded zip files to `data/source`.
 You can directly copy any python code to `data/source` to train on them.
-5. Run `create_dataset.py` to collect all python files.
+5. Run `python_autocomplete/remove_non_source_files.py` to remove all files except `.py` files.
+6. Run `create_dataset.py` to collect all python files.
 The collected code will be written to `data/train.py` and `data/eval.py`.
-6. Run `train.py` to train the model.
+7. Run `train.py` to train the model.
 *Try changing hyper-parameters like model dimensions and number of layers*.
-7. Run `evaluate.py` to evaluate the model.
-8. Enjoy!
+8. Run `evaluate.py` to evaluate the model.
+9. Enjoy!
 
 If you have any questions please open an issue on GitHub.
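
Step 5 deletes files in place, so it can be useful to preview what would be removed before running it. The following is a minimal sketch of such a dry run, not part of this commit: `list_removable` is a hypothetical helper that mirrors the selection logic of `remove_files` but only collects paths, and the `data/source` location is taken from the readme text rather than from `lab.get_data_path()`.

```python
from pathlib import Path
from typing import List, Set


def list_removable(path: Path, keep: Set[str]) -> List[Path]:
    """Return the paths that would be deleted, without deleting anything.

    Hypothetical dry-run counterpart to `remove_files`: symlinks and files
    whose suffix is not in `keep` are reported; directories are descended.
    """
    doomed: List[Path] = []
    for p in sorted(path.iterdir()):
        if p.is_symlink():
            doomed.append(p)
        elif p.is_dir():
            doomed.extend(list_removable(p, keep))
        elif p.suffix not in keep:
            doomed.append(p)
    return doomed


if __name__ == "__main__":
    # Assumed location, as named in the readme; adjust if your data path differs.
    source = Path("data/source")
    if source.is_dir():
        for p in list_removable(source, {".py"}):
            print(p)
```

If the printed list looks right, running the actual script from step 5 should remove those same paths.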
