 NVIDIA Control Panel >> Select a Task... Tree Pane >> Developer (expand section) >> Manage GPU Performance Counters >> Allow access to the GPU performance counters to all users (make sure this is enabled)
+
 then
+
 restart Docker Desktop
 ```

@@ -88,11 +96,20 @@ source ./runBuild.sh
 It's essentially building all the codes using our `CMakeLists.txt` file.
 Once this is done, we can start gathering CUDA kernel profiling data with the following command:
 ```
-LD_LIBRARY_PATH=/usr/lib/llvm-18/lib:$LD_LIBRARY_PATH DATAPATH=$PWD/src/prna-cuda/data_tables python3 ./gatherData.py --outfile=roofline-data.csv 2>&1 | tee runlog.txt
+cd ./cuda-profiling
+
+LD_LIBRARY_PATH=/usr/lib/llvm-18/lib:$LD_LIBRARY_PATH DATAPATH=$PWD/src/prna-cuda/data_tables python3 ./gatherData.py --outfile=profiling-data.csv 2>&1 | tee -a runlog.txt
 ```
-^ This process will take about 5-6 hours, so please have someone around to babysit in case any unexpected issues arise.
+^ This process will take about 10 hours, so please have someone around to babysit in case any unexpected issues arise.
 We tested this on our own Docker container and had no issues.

+### Scraping Source Codes
+
+While you wait for the performance counter data to gather, you can start with a simple scrape of the CUDA codes.
 Below is a list of instructions for reproducing what is done in the above Docker container, but instead on your own system.
@@ -184,7 +201,7 @@ The internal workflow at a high level looks like the following:

 The `gatherData.py` script will emit a CSV file called `roofline-data.csv` containing all the benchmarking data. After each kernel is run, the data is written out to the last line of the CSV file. We encourage writing the results of the execution to a log file for later error/execution analysis.

-‼️‼️This process of profiling all the codes can take a while (roughly 6-7 hours), we suggest leaving the profiling running while someone babysits in case of an unexpected error. ‼️‼️
+‼️‼️This process of profiling all the codes can take a while (roughly 10 hours), we suggest leaving the profiling running while someone babysits in case of an unexpected error. ‼️‼️
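
The README text in this hunk says that after each kernel is run, its data is written to the last line of the CSV file. The sketch below shows one way such append-per-kernel writing could look, so that a crash partway through a multi-hour run keeps the rows already gathered. It is not taken from `gatherData.py`: the function names `append_result` and `read_results` and the column names `kernel` and `time_ms` are hypothetical, chosen only for illustration.

```python
import csv
import os


def append_result(csv_path, row, fieldnames):
    """Append one kernel's results as the last line of the CSV.

    Hypothetical helper, not the actual gatherData.py implementation.
    The header is written only when the file is missing or empty.
    """
    write_header = (not os.path.exists(csv_path)
                    or os.path.getsize(csv_path) == 0)
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
        # Push the row to disk now so an interrupted run keeps it.
        f.flush()
        os.fsync(f.fileno())


def read_results(csv_path):
    """Read back every row gathered so far as a list of dicts."""
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))
```

Calling `append_result("profiling-data.csv", {"kernel": "k1", "time_ms": "1.5"}, ["kernel", "time_ms"])` once per kernel would grow the file one row at a time, and `read_results` could be used to inspect partial data while the run is still in progress.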