sphinx/source/install_usage/install.rst

A note on libfabric providers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We recommend using the system-installed version of libfabric wherever possible. However, if a Spack-based manual installation is required, please read this section.
The Mercury library used for the provenance database requires a libfabric provider that supports the **FI_EP_RDM** endpoint. By default, Spack installs libfabric with the **sockets**, **tcp** and **udp** providers, of which only **sockets** supports this endpoint. However, **sockets** is being deprecated as its performance is not as good as that of dedicated providers. We recommend installing the **rxm** utility provider alongside **tcp** for most purposes, by appending the Spack spec with :code:`^libfabric fabrics=sockets,tcp,rxm`.
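
For example, assuming Chimbuko is installed through the :code:`chimbuko-performance-analysis` Spack package (the package name used elsewhere in this documentation), the full spec might read:

.. code:: bash

   # Illustrative spec only; substitute the package name and variants used at your site
   spack install chimbuko-performance-analysis ^libfabric fabrics=sockets,tcp,rxm
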
For network hardware supporting the Linux Verbs API (such as InfiniBand), the **verbs** provider (with **rxm**) may provide better performance. This can be added to the spec as, for example, :code:`^libfabric fabrics=sockets,tcp,rxm,verbs`.
For Slingshot networks (e.g. on Frontier/Crusher), the **cxi** provider may provide better performance. However, manual installation of libfabric with **cxi** does not appear to be possible because the provider is closed-source. We therefore recommend using the system installation on these machines.
Details of how to choose the libfabric provider used by Mercury can be found :ref:`here <online_analysis>`. For further information consult the `Mercury documentation <https://mercury-hpc.github.io/documentation/#network-abstraction-layer>`_.
Integrating with system-installed MPI
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Chimbuko can be built without MPI by disabling the **mpi** Spack variant.
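
A minimal sketch, assuming the :code:`chimbuko-performance-analysis` package name used elsewhere in this documentation and the standard Spack syntax for disabling a variant:

.. code:: bash

   # "~mpi" disables the mpi variant of the package
   spack install chimbuko-performance-analysis~mpi
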
When used in this mode, the user is responsible for manually assigning a "rank" index to each instance of the online AD module, and for ensuring that an instance of this module is created alongside each instance or rank of the target application (e.g. using a wrapper script launched via mpirun). We discuss how this can be achieved :ref:`here <non_mpi_run>`.
Frontier/Crusher
~~~~~~~~~~~~~~~~

In the PerformanceAnalysis source we also provide a Spack environment YAML for use on Frontier/Crusher, :code:`spack/environments/frontier.yaml` (the same installation and environment can be used for both machines). This environment is designed for the AMD programming environment with ROCm 5.2.0. Installation instructions follow.

First download the Chimbuko and Mochi repositories:
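
For example (the PerformanceAnalysis location appears elsewhere in this documentation, while the Mochi Spack package repository URL is an assumption; substitute the repositories appropriate to your setup):

.. code:: bash

   git clone https://github.com/CODARcode/PerformanceAnalysis.git
   git clone https://github.com/mochi-hpc/mochi-spack-packages.git
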
Copy the file :code:`spack/environments/frontier.yaml` from the PerformanceAnalysis git repository to a convenient location and edit the paths in the :code:`repos` section to point to the paths at which you downloaded the repositories, e.g.:
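
For example, the edited section might look as follows (the paths are hypothetical placeholders):

.. code:: yaml

   spack:
     repos:
       # Point these at the locations of your downloaded repositories
       - /path/to/mochi-spack-packages
       - /path/to/PerformanceAnalysis/spack/repo

The environment can then be built with the standard Spack environment workflow (a sketch; the environment name is arbitrary):

.. code:: bash

   spack env create chimbuko frontier.yaml
   spack env activate chimbuko
   spack install
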
Once installation is complete, the packages can be loaded with:

.. code:: bash

   spack load tau chimbuko-performance-analysis chimbuko-visualization2
GPU support for TAU C++ compilers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
While the above installation includes TAU and its support for the ROCm runtime API for GPU tracing, the TAU compiler wrappers it builds do not invoke the ROCm compiler **hipcc** and are therefore unable to instrument mixed C++ and HIP codes. As a workaround, we recommend manually building TAU against the Spack-built dependencies as follows.
Clone the TAU git repository in a new directory:
.. code:: bash
   git clone https://github.com/UO-OACISS/tau2.git
Load the Spack environment and create a configuration script (e.g. *config.sh*) with the following content:
.. code:: bash
   #!/bin/bash
   new_inst=$(pwd)/install  # or change to preferred install directory
   # ... (the TAU configure and build commands follow here)

Executing this script will build and install TAU in the *install* subdirectory of the working directory. Finally, add the TAU installation path to the Linux environment:
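
For example (a sketch only: TAU typically places its binaries in an architecture-specific subdirectory of the install prefix, so adjust the path to match your build):

.. code:: bash

   # Replace <arch> with the architecture directory created by the TAU build
   export PATH=/path/to/install/<arch>/bin:$PATH
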
The **tau_cxx.sh** wrapper script will now wrap the *hipcc* compiler.
Summit
~~~~~~
While the above instructions are sufficient for building Chimbuko on Summit, it is advantageous to make use of the pre-existing modules for many of the dependencies. For convenience we provide a Spack **environment** which can be used to install Chimbuko in a self-contained environment using various system libraries. To install, first download the Chimbuko and Mochi repositories as described above for Frontier/Crusher.
Copy the file :code:`spack/environments/summit.yaml` from the PerformanceAnalysis git repository to a convenient location and edit the paths in the :code:`repos` section to point to the paths at which you downloaded the repositories, as in the Frontier/Crusher example above.

sphinx/source/install_usage/run_chimbuko.rst

Running on Slurm-based systems
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In this section we provide specifics on launching on machines that use the Slurm task scheduler.
To control the explicit placement of the ranks we will use the :code:`--nodelist` (:code:`-w`) Slurm option to specify the nodes associated with a resource set, the :code:`--nodes` (:code:`-N`) option to specify the number of nodes, and the :code:`--overlap` option to allow the AD and application resource sets to coexist on the same node. These options are documented `here <https://slurm.schedmd.com/srun.html>`_.
The :code:`--nodelist` option requires the range of full hostnames of the nodes to be provided. For Crusher/Frontier and Spock we provide Perl scripts in the appropriately named subdirectories of the `scripts directory <https://github.com/CODARcode/PerformanceAnalysis/blob/ckelly_develop/scripts>`_. These scripts parse the **SLURM_JOB_NODELIST** environment variable and generate the nodelists for the services and the application; they differ only in the node naming convention of the particular machine. To use:
.. code:: bash
   service_node=$(path_to_script/get_nodes.pl HEAD)
   body_nodelist=$(path_to_script/get_nodes.pl BODY)
We can now set the various :code:`<LAUNCH ..>` commands in the section above:
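
As a sketch (the variable names here are assumed; the actual template is defined in the section referenced above), these might take a form such as:

.. code:: bash

   # Services on the dedicated head node; application and AD overlap on the body nodes
   LAUNCH_SERVICES="srun -N 1 -n 1 --overlap -w ${service_node}"
   LAUNCH_MAIN="srun -N ${n_nodes} --overlap -w ${body_nodelist}"
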
Note that we have assigned 1 core to each rank of the AD, and so :code:`${n_mpi_ranks_per_node} * (${n_cores_per_rank_main} + 1)` should not exceed 64, the number of available cores.
Running with the CXI network provider on Frontier/Crusher
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Frontier/Crusher and other machines with the Cray HPE Slingshot network support an optimized communications provider, **cxi**. Using it requires a few extra steps when running Chimbuko in order to allow the Mochi (provenance database) components to communicate between processes launched under different calls to *srun* (i.e. between our services and clients).
First, add the following Slurm options to the header section of your batch script:
.. code:: bash
   #SBATCH --network=single_node_vni,job_vni
Then, in *chimbuko_config.sh*, set the following options (in addition to any other optional arguments):
.. code:: bash
   provdb_engine="cxi"
   provdb_extra_args="-db_mercury_auth_key 0:0"
   commit_extra_args="-provdb_mercury_auth_key 0:0"  # add this variable if it doesn't yet exist in the setup script
   pserver_extra_args="-provdb_mercury_auth_key 0:0"
   ad_extra_args="-provdb_mercury_auth_key 0:0"
Alternatively, if Chimbuko's services and online AD components are launched together using the new, experimental launch procedure (see below), it is only necessary to set the *provdb_engine* option.
Scaling to large job sizes
^^^^^^^^^^^^^^^^^^^^^^^^^^

Online analysis of a non-MPI application with a non-MPI installation of Chimbuko
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the context of a non-MPI application, instances of the application must still be associated with an index within Chimbuko that allows them to be distinguished. This proceeds much as in the previous section, but with a catch: by default Chimbuko assumes that the instance index passed in by the **-rank <rank>** option matches the rank index reflected in the trace data and in the ADIOS trace filename produced by Tau. However, for a non-MPI application Tau assigns rank 0 to **all instances**. In order to communicate this to Chimbuko a second command line option must be used: **-override_rank 0**. Here the 0 tells Chimbuko that the input data is labeled as 0 in both the filename and the trace data; Chimbuko will then overwrite the rank index in the trace data with its internal rank index to ensure that the new label is passed through the analysis. Note that the user must ensure that each application instance is assigned either a different **TAU_ADIOS2_PATH** or **TAU_ADIOS2_FILE_PREFIX**, otherwise the trace data files will overwrite each other.
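
As an illustration, a per-instance wrapper might look like the following sketch; the executable path (*AD_EXE*), its remaining options (*AD_OPTS*) and the mechanism assigning the instance index are assumptions, not part of Chimbuko itself:

.. code:: bash

   #!/bin/bash
   i=$1  # externally assigned instance index
   # A distinct prefix per instance prevents the trace files overwriting each other
   export TAU_ADIOS2_FILE_PREFIX="tau-metrics-inst${i}"
   # Pass the instance index as the Chimbuko rank, overriding Tau's rank-0 label
   ${AD_EXE} ${AD_OPTS} -rank ${i} -override_rank 0
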
Launching Chimbuko's components together through a single script (advanced, experimental)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In order to simplify the launch procedure we are developing a script that simultaneously instantiates the Chimbuko services and the AD clients. At present we support only the Slurm task manager, and this feature is experimental.
To use it, download the *PerformanceAnalysis* source. Then, in the run script, remove the two separate calls to *srun* and the lines associated with gathering the body and head nodes, and replace them with the following:

.. code:: bash

   # Wait for the launch script to publish the AD command line
   while [ ! -f chimbuko/vars/chimbuko_ad_cmdline.var ]; do sleep 1; done

where **tasks_per_node** is the number of application tasks that you will be launching. It is assumed that the total number of nodes remains one larger than the number of nodes on which the application is to be launched.
As the services are launched here on the *last* node, a simple call to srun for the application suffices to co-locate the application ranks with the AD instances:
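
For example (variable names assumed):

.. code:: bash

   # The application occupies all but the last node, which hosts the services
   srun -N ${n_app_nodes} --ntasks-per-node=${tasks_per_node} ${application_exe}
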
The *chimbuko.sh* script has an optional argument **--core_bind** to bind the AD processes to specific cores, which can be used alongside Slurm's binding options to ensure the AD instances run on resources separate from the application. The format of the argument is a comma-separated list of core indices *per task on any given node*, with those lists themselves separated by colons (:). For example, with **${tasks_per_node}=8** one might pass :code:`--core_bind 7:15:23:31:39:47:55:63`, giving the AD instance attached to each of the eight tasks its own core (an illustrative choice in which each task uses seven cores for the application and the eighth for the AD).