
Commit 8bd74c0

Documentation updates for Frontier/Crusher installation and usage, including use of cxi transport
Documented new single-launch script for services and driver components
1 parent 5fcc994 commit 8bd74c0

2 files changed

Lines changed: 162 additions & 41 deletions


sphinx/source/install_usage/install.rst

Lines changed: 96 additions & 36 deletions
@@ -43,10 +43,14 @@ Once installed, the unit and integration tests can be run as:

A note on libfabric providers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We recommend using the system-installed version of libfabric wherever possible. However, if a Spack-based manual installation is required, please read this section.

The Mercury library used for the provenance database requires a libfabric provider that supports the **FI_EP_RDM** endpoint. By default Spack installs libfabric with the **sockets**, **tcp** and **udp** providers, of which only **sockets** supports this endpoint. However, **sockets** is being deprecated because its performance is worse than that of dedicated providers. For most purposes we recommend installing the **rxm** utility provider alongside **tcp**, by appending the Spack spec with :code:`^libfabric fabrics=sockets,tcp,rxm`.

For network hardware supporting the Linux Verbs API (such as InfiniBand) the **verbs** provider (with **rxm**) may provide better performance. This can be added to the spec as, for example, :code:`^libfabric fabrics=sockets,tcp,rxm,verbs`.

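As an illustration only (the top-level package name here follows the Chimbuko Spack packages referenced later in these instructions), the provider list is appended to the install spec on the command line:

.. code:: bash

   # Sketch: build Chimbuko with libfabric providers suitable for the tcp/rxm path
   spack install chimbuko-performance-analysis ^libfabric fabrics=sockets,tcp,rxm

   # Or, on Verbs-capable hardware such as InfiniBand
   spack install chimbuko-performance-analysis ^libfabric fabrics=sockets,tcp,rxm,verbs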

For Slingshot networks (e.g. on Frontier/Crusher), the **cxi** provider may provide better performance. However, manual installation of libfabric with cxi does not appear to be possible because the provider is closed source; we therefore recommend using the system installation of libfabric on these machines.

Details of how to choose the libfabric provider used by Mercury can be found :ref:`here <online_analysis>`. For further information consult the `Mercury documentation <https://mercury-hpc.github.io/documentation/#network-abstraction-layer>`_ .

Integrating with system-installed MPI
@@ -79,103 +83,159 @@ Chimbuko can be built without MPI by disabling the **mpi** Spack variant as foll
When used in this mode the user is responsible for manually assigning a "rank" index to each instance of the online AD module, and also for ensuring that an instance of this module is created alongside each instance or rank of the target application (e.g. using a wrapper script that is launched via mpirun). We discuss how this can be achieved :ref:`here <non_mpi_run>`.

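As a minimal sketch of such a wrapper (the AD module command and its arguments are placeholders, and the rank environment variables assume an Open MPI- or MPICH-style launcher):

.. code:: bash

   #!/bin/bash
   # Hypothetical per-rank wrapper, launched via mpirun: derive a rank index from the
   # launcher environment, start an AD instance with that index, then run the application.
   rank=${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-0}}
   <AD MODULE> <AD OPTIONS> -rank ${rank} &
   <YOUR APPLICATION> <YOUR ARGUMENTS>
   wait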
Frontier/Crusher
~~~~~~~~~~~~~~~~

In the PerformanceAnalysis source we also provide a Spack environment yaml for use on Frontier/Crusher, :code:`spack/environments/frontier.yaml` (the same installation and environment can be used on both machines). This environment is designed for the AMD programming environment with ROCm 5.2.0. Installation instructions follow.

First, download the Chimbuko and Mochi repositories:

.. code:: bash

   git clone https://github.com/mochi-hpc/mochi-spack-packages.git
   git clone https://github.com/CODARcode/PerformanceAnalysis.git

Copy the file :code:`spack/environments/frontier.yaml` from the PerformanceAnalysis git repository to a convenient location and edit the paths in the :code:`repos` section to point to the paths at which you downloaded the repositories, e.g.:

.. code:: yaml

   repos:
   - /autofs/nccs-svm1_home1/ckelly/install/mochi-spack-packages
   - /autofs/nccs-svm1_home1/ckelly/src/AD/PerformanceAnalysis/spack/repo/chimbuko

This environment uses the following modules, which must be loaded prior to installation and running:

.. code:: bash

   module reset
   module load PrgEnv-amd/8.3.3
   module swap amd amd/5.2.0
   module load cray-python/3.9.13.1
   module load cray-mpich/8.1.25
   module load gmp/6.2.1
   module load craype-accel-amd-gfx90a
   module unload darshan-runtime
   export LD_LIBRARY_PATH=/opt/gcc/mpfr/3.1.4/lib:$LD_LIBRARY_PATH

   # For some reason not set by the cray-mpich module?
   export PATH=${CRAY_MPICH_PREFIX}/bin:${PATH}
   export PATH=${ROCM_COMPILER_PATH}/bin:${PATH}

To install the environment:

.. code:: bash

   spack env create my_chimbuko_env frontier.yaml
   spack env activate my_chimbuko_env
   spack install

To load the environment:

.. code:: bash

   # For some reason not set by the cray-mpich module?
   export PATH=${CRAY_MPICH_PREFIX}/bin:${PATH}
   export PATH=${ROCM_COMPILER_PATH}/bin:${PATH}

   export LD_LIBRARY_PATH=/opt/gcc/mpfr/3.1.4/lib:$LD_LIBRARY_PATH

   # Spack doesn't appear to pick up the cray-xpmem pkg-config location; add it at the end so it is only used as a last resort
   export PKG_CONFIG_PATH=${PKG_CONFIG_PATH}:/usr/lib64/pkgconfig

   spack env activate my_chimbuko_env
   spack load tau chimbuko-performance-analysis chimbuko-visualization2

GPU support for TAU C++ compilers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

While the above installation includes TAU and its support for the ROCm runtime API for GPU tracing, the TAU compiler wrappers it builds do not call the ROCm compiler **hipcc** and are therefore unable to instrument mixed C++ and HIP codes. As a workaround, we recommend manually building TAU against the Spack-built dependencies as follows.

Clone the TAU git repository in a new directory:

.. code:: bash

   git clone https://github.com/UO-OACISS/tau2.git

Load the Spack environment and create a configuration script (e.g. *config.sh*) with the following content:

.. code:: bash

   #!/bin/bash
   # Recover the TAU configure command used by spack from its build log
   new_inst=$(pwd)/install #or change to preferred install directory
   tau_inst=$(spack location -i tau)
   spack_conf=$(grep ./configure ${tau_inst}/.spack/spack-build-out.txt | awk '{$1=$2=""; print $0}')

   # Swap the compilers for the ROCm toolchain and point the install prefix at the new location
   spack_conf=$(echo $spack_conf | sed 's/-c++=clang++/-c++=hipcc/' | sed -E "s|-prefix=[^']+'|-prefix=${new_inst}'|")
   spack_conf=$(echo $spack_conf | sed 's/-cc=clang/-cc=amdclang/')
   spack_conf=$(echo $spack_conf | sed 's/-fortran=flang/-fortran=amdflang/')

   # Drop the original -useropt and -rocprofiler options, and take the MPI paths from the cray-mpich module
   spack_conf=$(echo $spack_conf | sed -E "s|'-useropt=([^']+)'||")
   spack_conf=$(echo $spack_conf | sed -E "s|-rocprofiler=([^']+)'||")
   spack_conf=$(echo $spack_conf | sed -E "s|-mpiinc=([^']+)'|-mpiinc=\${MPICH_DIR}/include'|")
   spack_conf=$(echo $spack_conf | sed -E "s|-mpilib=([^']+)'|-mpilib=\${MPICH_DIR}/lib'|")

   # Strip the remaining quotes and append OpenMP tools support plus our own -useropt flags
   spack_conf=$(echo $spack_conf | sed "s/'//g")
   spack_conf="${spack_conf} -ompt -useropt=-g#-O2#-DTAU_MPI_DISABLE_COMM_WRAPPERS"

   echo $spack_conf | tee conf_cmd.log

   # Configure and build TAU, logging the output
   eval "$spack_conf 2>&1 | tee conf.log"
   make install 2>&1 | tee build.log

Executing this script will build and install TAU in the *install* subdirectory of the working directory. Finally, add the TAU installation path to the Linux environment:

.. code:: bash

   export PATH=$(pwd)/install/craycnl/bin:${PATH}
   export LD_LIBRARY_PATH=$(pwd)/install/craycnl/lib:${LD_LIBRARY_PATH}
   export TAU_MAKEFILE=$(pwd)/install/craycnl/lib/Makefile.tau-rocm-roctracer-amd-clang-papi-ompt-mpi-pthread-pdt-openmp-adios2

The **tau_cxx.sh** wrapper script will now wrap the *hipcc* compiler.
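
As a quick check (the source file name here is purely illustrative), a mixed C++/HIP code can now be compiled with the instrumenting wrapper:

.. code:: bash

   # Hypothetical example: compile a HIP-enabled C++ source through the TAU wrapper
   tau_cxx.sh -o my_app my_app.cpp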

Summit
~~~~~~

While the above instructions are sufficient for building Chimbuko on Summit, it is advantageous to make use of the pre-existing modules for many of the dependencies. For convenience we provide a Spack **environment** which can be used to install Chimbuko in a self-contained environment using various system libraries. To install, first download the Chimbuko and Mochi repositories:

.. code:: bash

   git clone https://github.com/mochi-hpc/mochi-spack-packages.git
   git clone https://github.com/CODARcode/PerformanceAnalysis.git

Copy the file :code:`spack/environments/summit.yaml` from the PerformanceAnalysis git repository to a convenient location and edit the paths in the :code:`repos` section to point to the paths at which you downloaded the repositories:

.. code:: yaml

   repos:
   - /autofs/nccs-svm1_home1/ckelly/install/mochi-spack-packages
   - /autofs/nccs-svm1_home1/ckelly/src/AD/PerformanceAnalysis/spack/repo/chimbuko

This environment uses the :code:`gcc/9.1.0` and :code:`cuda/11.1.0` modules, which must be loaded prior to installation and running:

.. code:: bash

   module load gcc/9.1.0 cuda/11.2.0

Then simply create a new environment and install:

.. code:: bash

   spack env create my_chimbuko_env summit.yaml
   spack env activate my_chimbuko_env
   spack install

Once installed, simply

.. code:: bash

   spack env activate my_chimbuko_env
   spack load tau chimbuko-performance-analysis chimbuko-visualization2

after loading the modules above.


.. _ADIOS2: https://github.com/ornladios/ADIOS2

sphinx/source/install_usage/run_chimbuko.rst

Lines changed: 66 additions & 5 deletions
@@ -175,16 +175,16 @@ which can be used as follows:
Running on Slurm-based systems
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this section we provide specifics on launching on machines that use the Slurm task scheduler.

To control the explicit placement of the ranks we will use the :code:`--nodelist` (:code:`-w`) Slurm option to specify the nodes associated with a resource set, the :code:`--nodes` (:code:`-N`) option to specify the number of nodes, and the :code:`--overlap` option to allow the AD and application resource sets to coexist on the same node. These options are documented `here <https://slurm.schedmd.com/srun.html>`_.

The :code:`--nodelist` option requires the range of full hostnames of the nodes to be provided. For Crusher/Frontier and Spock we provide perl scripts in the appropriately named subdirectories `here <https://github.com/CODARcode/PerformanceAnalysis/blob/ckelly_develop/scripts>`_ . These scripts parse the **SLURM_JOB_NODELIST** environment variable and generate the nodelists for the services and the application; they differ only in the node naming convention of each machine. To use:

.. code:: bash

   service_node=$(path_to_script/get_nodes.pl HEAD)
   body_nodelist=$(path_to_script/get_nodes.pl BODY)

We can now set the various :code:`<LAUNCH ..>` commands in the section above:

@@ -203,6 +203,30 @@ Where

Note that we have assigned 1 core to each rank of the AD, and so :code:`${n_mpi_ranks_per_node} * (${n_cores_per_rank_main} + 1)` should not exceed 64, the number of available cores.

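As a purely illustrative example of this constraint:

.. code:: bash

   # Hypothetical values: 8 application ranks per node, each assigned 7 cores
   n_mpi_ranks_per_node=8
   n_cores_per_rank_main=7
   # 8 * (7 + 1) = 64, which exactly fills the 64 available cores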

Running with the CXI network provider on Frontier/Crusher
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Frontier/Crusher and other machines with the Cray HPE Slingshot network support an optimized communications provider, **cxi**. Using it requires a few extra steps when running Chimbuko, in order to allow the Mochi (provenance database) components to communicate between processes launched under different calls to *srun* (i.e. between our services and clients).

First, add the following Slurm options to your batch script header section:

.. code:: bash

   #SBATCH --network=single_node_vni,job_vni

Then, in *chimbuko_config.sh*, set the following options (in addition to any other optional arguments):

.. code:: bash

   provdb_engine="cxi"
   provdb_extra_args="-db_mercury_auth_key 0:0"
   commit_extra_args="-provdb_mercury_auth_key 0:0" #add this variable if it doesn't yet exist in the setup script
   pserver_extra_args="-provdb_mercury_auth_key 0:0"
   ad_extra_args="-provdb_mercury_auth_key 0:0"

Alternatively, if Chimbuko's services and online AD components are launched together using the new, experimental launch procedure (see below), it is only necessary to set the *provdb_engine* option.

Scaling to large job sizes
^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -284,6 +308,43 @@ Online analysis of a non-MPI application with a non-MPI installation of Chimbuko

In the context of a non-MPI application, instances of the application must still be associated with an index within Chimbuko that allows for their discrimination. This proceeds much as in the previous section, but with a catch: by default Chimbuko assumes that the instance index passed in by the **-rank <rank>** option matches the rank index reflected by the trace data and the ADIOS trace filename produced by Tau. However for a non-MPI application, Tau assigns rank 0 to **all instances**. In order to communicate this to Chimbuko a second command line option must be used: **-override_rank 0**. Here the 0 tells Chimbuko that the input data is labeled as 0 in both the filename and the trace data. Chimbuko will then overwrite the rank index in the trace data to match that of its internal rank index to ensure that this new label is passed through the analysis. Note that the user must make sure that each application instance is assigned either a different **TAU_ADIOS2_PATH** or **TAU_ADIOS2_FILE_PREFIX** otherwise the trace data files will overwrite each other.

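As a minimal sketch (the AD module command and its options are placeholders), two non-MPI application instances could be launched as:

.. code:: bash

   #!/bin/bash
   # Hypothetical example: give each instance its own trace directory and Chimbuko rank index
   for i in 0 1; do
       export TAU_ADIOS2_PATH=$(pwd)/traces_${i}
       mkdir -p ${TAU_ADIOS2_PATH}
       <YOUR APPLICATION> <YOUR ARGUMENTS> &
       <AD MODULE> <AD OPTIONS> -rank ${i} -override_rank 0 &
   done
   wait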

Launching Chimbuko's components together through a single script (advanced, experimental)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In order to simplify the launch procedure we are developing a script that simultaneously instantiates the Chimbuko services and the AD clients. At present we support only the Slurm task manager, and this feature is experimental.

To use it, download the *PerformanceAnalysis* source. Then, in the run script, remove the two separate calls to *srun* and the lines associated with gathering the head and body node lists, and replace them with the following:

.. code:: bash

   tasks_per_node=<SETME>
   nodes=${SLURM_NNODES}
   app_nodes=$(( nodes - 1 ))
   app_tasks=$(( ${app_nodes} * ${tasks_per_node} ))
   stasks=$(( ${nodes} * ${tasks_per_node} ))

   srun -N ${nodes} -n ${stasks} --ntasks-per-node ${tasks_per_node} --overlap path_to_PerformanceAnalysis_source/scripts/launch/chimbuko.sh ${app_tasks} &

   # Wait until the server has started
   while [ ! -f chimbuko/vars/chimbuko_ad_cmdline.var ]; do sleep 1; done

where **tasks_per_node** is the number of application tasks per node that you will be launching. It is assumed that the total number of nodes in the allocation is one larger than the number of nodes on which the application is to be launched.
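
As a purely illustrative example of the arithmetic, with 4 allocated nodes and 8 tasks per node:

.. code:: bash

   # Hypothetical values: 4 nodes in the allocation, 8 application tasks per node
   tasks_per_node=8
   nodes=4                                             # value of ${SLURM_NNODES}
   app_nodes=$(( nodes - 1 ))                          # 3 nodes run the application
   app_tasks=$(( ${app_nodes} * ${tasks_per_node} ))   # 24 application tasks in total
   stasks=$(( ${nodes} * ${tasks_per_node} ))          # 32 tasks for the combined services+AD launch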

As the services are launched here on the *last* node, a simple call to srun for the application suffices to co-locate the application ranks with the AD instances:

.. code:: bash

   srun --overlap -N${app_nodes} --ntasks-per-node=${tasks_per_node} <YOUR APPLICATION> <YOUR ARGUMENTS>

The *chimbuko.sh* script has an optional argument **--core_bind** to bind the AD processes to specific cores, which can be used alongside Slurm's binding options to ensure that the AD instances run on resources separate from the application. The argument is a comma-separated list of core indices *per task on any given node*, with the per-task lists themselves separated by colons (:). For example, with **${tasks_per_node}=8**,

.. code:: bash

   bnd="60,61:62,63:28,29:30,31:44,45:46,47:12,13:14,15"
   srun -N ${nodes} -n ${stasks} --ntasks-per-node ${tasks_per_node} --overlap ${rundir}/chimbuko.sh ${app_tasks} --core_bind ${bnd} &

will bind the first AD process on a node to cores 60 and 61, the second to cores 62 and 63, and so on.
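
The application's own placement is then handled with Slurm's binding options; as an illustrative sketch (the core indices are arbitrary and must be chosen to avoid those given to the AD), one might use:

.. code:: bash

   # Hypothetical example: map the 8 application ranks per node to cores not used by the AD
   srun --overlap -N${app_nodes} --ntasks-per-node=${tasks_per_node} \
        --cpu-bind=map_cpu:0,2,16,18,32,34,48,50 <YOUR APPLICATION> <YOUR ARGUMENTS>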

.. _benchmark_suite:
