
Commit f23807f ("started moving scripts")
1 parent a9f1a09

5 files changed, 2559 additions & 12 deletions

README.md: 20 additions & 3 deletions
@@ -60,18 +60,26 @@ Please ensure your system has enough storage space before continuing.
 
 ```
 git clone git@github.com:gregbolet/gpu-flopbench.git ./gpu-flopbench
+
 cd ./gpu-flopbench
+
 docker build --progress=plain -t 'gpu-flopbench' .
+
 docker run -ti --gpus all --name gpu-flopbench-container --runtime=nvidia -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all gpu-flopbench
+
 docker exec -it gpu-flopbench-container /bin/bash
 ```
 
 Note: if you're on a **Windows Docker Desktop** host, be sure to enable the following:
 ```
 NVIDIA Control Panel >> Desktop Tab >> Enable Developer Settings (make sure it's enabled)
+
 then
+
 NVIDIA Control Panel >> Select a Task... Tree Pane >> Developer (expand section) >> Manage GPU Performance Counters >> Allow access to the GPU performance counters to all users (make sure this is enabled)
+
 then
+
 restart Docker Desktop
 ```
 
@@ -88,11 +96,20 @@ source ./runBuild.sh
 It's essentially building all the codes using our `CMakeLists.txt` file.
 Once this is done, we can start gathering CUDA kernel profiling data with the following command:
 ```
-LD_LIBRARY_PATH=/usr/lib/llvm-18/lib:$LD_LIBRARY_PATH DATAPATH=$PWD/src/prna-cuda/data_tables python3 ./gatherData.py --outfile=roofline-data.csv 2>&1 | tee runlog.txt
+cd ./cuda-profiling
+
+LD_LIBRARY_PATH=/usr/lib/llvm-18/lib:$LD_LIBRARY_PATH DATAPATH=$PWD/src/prna-cuda/data_tables python3 ./gatherData.py --outfile=profiling-data.csv 2>&1 | tee -a runlog.txt
 ```
-^ This process will take about 5-6 hours, so please have someone around to babysit in case any unexpected issues arise.
+^ This process will take about 10 hours, so please have someone around to babysit in case any unexpected issues arise.
 We tested this on our own Docker container and had no issues.
 
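Since the profiling run appends one CSV row to the output file after each kernel completes, its progress can be checked from the side without disturbing it. The sketch below is illustrative and not part of the repository; it assumes the output file name used in the command above and that the CSV begins with a single header line.

```python
import csv
import os

def profiling_progress(csv_path):
    """Count completed kernel rows in the profiling CSV.

    Assumption (not from the repo): gatherData.py writes one header
    line followed by one row per profiled kernel.
    """
    if not os.path.exists(csv_path):
        return 0  # the run has not produced any output yet
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    return max(len(rows) - 1, 0)  # exclude the header line
```

Polling this periodically (for example from a small sleep loop in a second shell) gives a rough sense of how far along the multi-hour run is.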
+### Scraping Source Codes
+
+While you wait for the performance counter data to gather, you can start with a simple scrape of the CUDA codes.
+```
+python3 simpleScrapeKernels.py --skipSASS --cudaOnly --outfile="simple-scraped-kernels.json"
+```
+
 # Solo (no Docker) Instructions
 
 Below is a list of instructions for reproducing what is done in the above Docker container, but instead on your own system.
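Once `simpleScrapeKernels.py` finishes, its JSON output can be sanity-checked with a few lines of Python. This is only a sketch: the actual schema of `simple-scraped-kernels.json` is an assumption here (a top-level list of kernel records or a dict keyed by kernel name), so adapt the access pattern to the real layout.

```python
import json

def summarize_scrape(json_path):
    """Load the scraped-kernel JSON and report how many entries it holds.

    The file layout is assumed (list of records or dict keyed by kernel
    name); the real schema produced by the script may differ.
    """
    with open(json_path) as f:
        data = json.load(f)
    kind = "list" if isinstance(data, list) else type(data).__name__
    return {"entries": len(data), "top_level_type": kind}
```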
@@ -184,7 +201,7 @@ The internal workflow at a high level looks like the following:
 
 The `gatherData.py` script will emit a CSV file called `roofline-data.csv` containing all the benchmarking data. After each kernel is run, the data is written out to the last line of the CSV file. We encourage writing the results of the execution to a log file for later error/execution analysis.
 
-‼️‼️This process of profiling all the codes can take a while (roughly 6-7 hours), we suggest leaving the profiling running while someone babysits in case of an unexpected error. ‼️‼️
+‼️‼️This process of profiling all the codes can take a while (roughly 10 hours), we suggest leaving the profiling running while someone babysits in case of an unexpected error. ‼️‼️
 
 
 ## Scraping the CUDA kernels

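The README above recommends tee-ing execution output to a log file for later error/execution analysis. A short post-run scan of that log can surface failed kernels quickly. This is a hedged sketch, not part of the repository: the log name (`runlog.txt` from the commands above) and the error wording matched here are guesses, not known output of the scripts.

```python
def find_error_lines(log_path, markers=("error", "fail", "assert")):
    """Return (line_number, line) pairs whose text contains any marker,
    case-insensitively. The marker words are guesses, not strings the
    profiling scripts are documented to emit."""
    hits = []
    with open(log_path, errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            lowered = line.lower()
            if any(m in lowered for m in markers):
                hits.append((lineno, line.rstrip("\n")))
    return hits
```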