[](https://github.com/gregbolet/gpu-flopbench/actions/workflows/buildAllCodesGithubAction.yml)
## Docker Setup Instructions
For ease-of-reproducibility, we supply a `Dockerfile` with the necessary steps to recreate our environment and dataset using your own GPU hardware.
The following is a list of steps to help you get set up and into the main bash shell of the container.
‼️‼️
We note that the base container image will take up about 40 GB of storage space; once we start building codes and gathering profiling data, the disk usage will jump up to about 50 GB.
Please ensure your system has enough storage space before continuing.
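If you want to sanity-check the available space up front, something along these lines works on most Linux hosts (the paths here are just the usual Docker defaults, not taken from this repo):

```
# Rough free-space check before pulling/building the image.
# /var/lib/docker is the usual Docker data root; fall back to / if it isn't there.
df -h /var/lib/docker 2>/dev/null || df -h /
docker system df   # shows how much space Docker itself is already using
```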
```
NVIDIA Control Panel >> Select a Task... Tree Pane >> Developer (expand section) >> Manage GPU Performance Counters >> Allow access to the GPU performance counters to all users (make sure this is enabled)

then

restart Docker Desktop
```
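The step above covers Windows / Docker Desktop. On a Linux host, the commonly documented equivalent (this is general NVIDIA guidance for the `ERR_NVGPUCTRPERM` error, not something spelled out in this README) is to allow non-admin access to the GPU performance counters via a driver module option and then reboot:

```
# General NVIDIA guidance for Linux hosts (not from this README): grant non-admin
# access to GPU performance counters, then reboot so the nvidia module picks it up.
# The .conf file name is arbitrary.
echo 'options nvidia NVreg_RestrictProfilingToAdminUsers=0' | \
    sudo tee /etc/modprobe.d/nvidia-profiling.conf
```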
## Docker Data Collection Instructions (CUDA program building & profiling)
Once you're in the main bash shell of the container, you should by default be in the `/gpu-flopbench` directory with a conda environment called `gpu-flopbench`.
We can now start building all the codes and collecting their performance counter data! 🌈😊
Run the following commands from the `gpu-flopbench` main project directory within the Docker container (they should work without issue):
```
source ./runBuild.sh
```
^ Depending on the number of cores on your CPU, this can take anywhere from 5-20 minutes.
It's essentially building all the codes using our `CMakeLists.txt` file.
Once this is done, we can start gathering CUDA kernel profiling data with the following command:
```
LD_LIBRARY_PATH=/usr/lib/llvm-18/lib:$LD_LIBRARY_PATH DATAPATH=$PWD/src/prna-cuda/data_tables python3 ./gatherData.py --outfile=roofline-data.csv 2>&1 | tee runlog.txt
```
^ This process will take about 5-6 hours, so please have someone around to babysit in case any unexpected issues arise.
We tested this on our own Docker container and had no issues.
# Solo (no Docker) Instructions
Below is a list of instructions for reproducing what is done in the above Docker container, but instead on your own system.
99
+
This path is laden with more unexpected complications and potentially more debugging effort, so continue at your own risk.
100
+
A lot of the CUDA codes we built had their compilation instructions tailored to our particular system, so you may end up having to do more work to get all the codes built and running if you decide to change compilers, compiler versions, or CUDA versions.
101
+
In future work we would like to make this process of building the codes agnostic to the system, but for now this is what we have working.
Execute the following command to generate the Makefile and start the build process.
This will automatically `make` all the programs; **you'll NEED to edit the `runBuild.sh` script to properly set any compilers/options for the codes to build**.
By default, we have everything building with `clang++` and `clang`; this should mostly work out-of-the-box, but some include paths may need to be set/overridden (SEE BELOW).
```
source ./runBuild.sh
```
NOTE: If you're running this from a Docker container generated from our Dockerfile, it should work out-of-the-box.
We originally had the CUDA codes building with `nvcc`, but to be able to also build SYCL and OMP codes, we switched to just LLVM. You may still be able to build the codes with `nvcc`, but it may take some modifications to the build pipeline.
We have future plans to sample SYCL and OMP codes, but for now, this work focuses on CUDA codes.
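If you need to point the build at a particular LLVM installation, the standard CMake environment variables are the first thing to try. Whether they are honored depends on how `runBuild.sh` invokes CMake, so treat the snippet below as a guess rather than a supported interface (the paths mirror the `llvm-18` install referenced elsewhere in this README):

```
# Illustrative only -- whether runBuild.sh honors CC/CXX depends on how it calls cmake.
# Point the configure step at a specific clang/clang++ before building.
export CC=/usr/lib/llvm-18/bin/clang
export CXX=/usr/lib/llvm-18/bin/clang++
source ./runBuild.sh
```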
## Common Build Issues
Here's a list of other common build issues that might help if you're encountering problems:
- missing libs to link
- putting some search/include dirs before others when compiling (duplicate filenames can cause header include mixups)
We note that our entire build process is captured in one `CMakeLists.txt` file.
This was done purposely so we could build all the codes in a batch manner, as manually going in and modifying individual HeCBench Makefiles was tiresome.
Essentially, our `CMakeLists.txt` file treats each `src/*-cuda` and `src/*-omp` directory as a single CMake/Makefile target, with the corresponding output executable having the same name as its `src` directory.
We automatically include many of the sub-directories for header files.
Our `CMakeLists.txt` file is so long because there were many codes whose build process we had to manually modify to get them to build correctly.
This took a while to do, but ultimately makes the build process much easier and more manageable.
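For a sense of what that looks like, here is a heavily simplified sketch of the pattern (this is illustrative, not an excerpt from the real `CMakeLists.txt`; the per-target fix-up lines and library names are placeholders):

```
# Heavily simplified sketch of the per-directory target pattern -- not the real file.
file(GLOB bench_dirs RELATIVE ${CMAKE_SOURCE_DIR}/src ${CMAKE_SOURCE_DIR}/src/*-cuda)
foreach(bench ${bench_dirs})
    file(GLOB_RECURSE bench_srcs ${CMAKE_SOURCE_DIR}/src/${bench}/*.cu
                                 ${CMAKE_SOURCE_DIR}/src/${bench}/*.cpp)
    add_executable(${bench} ${bench_srcs})   # executable named after its src directory
    target_include_directories(${bench} PRIVATE ${CMAKE_SOURCE_DIR}/src/${bench})
    # Per-code fix-ups (the bulk of the real file) look roughly like:
    #   target_include_directories(${bench} BEFORE PRIVATE /path/to/the/right/headers)
    #   target_link_libraries(${bench} PRIVATE some_missing_lib)
endforeach()
```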
## Python Environment Setup
We used Python3 (v3.11.11) for executing our Python scripts.
The `requirements.txt` file contains all the necessary packages and their versions that should be installed prior to using any of our Python scripts.
It is strongly advised to set up a new Conda environment to not mess up the base Python installation on your system.
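A minimal way to do that (the environment name simply mirrors the one used in the container, and the Python version matches the one stated above):

```
# Create an isolated environment and install the pinned dependencies.
conda create -n gpu-flopbench python=3.11
conda activate gpu-flopbench
pip install -r requirements.txt
```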
NOTE: This is already done for you if you're using the supplied Dockerfile.
## Gathering Roofline Data
Once all the codes are built, we can start the data collection process.
```
LD_LIBRARY_PATH=/usr/lib/llvm-18/lib:$LD_LIBRARY_PATH DATAPATH=$PWD/src/prna-cuda/data_tables python3 ./gatherData.py --outfile=roofline-data.csv 2>&1 | tee runlog.txt
```
NOTE: This command should work out-of-the-box if you built a container using our Dockerfile.
This will automatically invoke each of the built executables, using `ncu` (NVIDIA Nsight Compute) to profile each of the kernels in the executable. Some of the codes require files to be downloaded prior to execution; this script takes care of the downloading process and makes sure that all the requested files are in place.
The `DATAPATH` environment variable is only needed by `frna-cuda` and `prna-cuda`, so if you're not running those, you can drop it.
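If you want to spot-check a single benchmark outside of `gatherData.py`, a standalone `ncu` run looks something like the following (the executable name is a placeholder and the metric set is just an example; the script's actual flags may differ):

```
# Manual one-off profile of a single built benchmark (executable name is a placeholder).
ncu --set full --csv ./some-benchmark-cuda > some-benchmark-kernels.csv
```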
The `gatherData.py` script will emit a CSV file called `roofline-data.csv` containing all the benchmarking data. After each kernel is run, its data is appended as the last line of the CSV file. We encourage writing the results of the execution to a log file (as the `tee runlog.txt` in the command above does) for later error/execution analysis.
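Since rows are appended as each kernel finishes, a quick way to watch progress from another terminal (file name as produced by the command above):

```
# Check how many kernel rows have been written so far, and peek at the latest one.
wc -l roofline-data.csv
tail -n 1 roofline-data.csv
```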
‼️‼️ This process of profiling all the codes can take a while (roughly 6-7 hours); we suggest leaving the profiling running while someone babysits in case of an unexpected error. ‼️‼️