Commit d8f0b04

Documentation:
Non-MPI launching for MPI and non-MPI applications
Building on Spock
Scalability including multiple provDB instances
1 parent 1354b6b commit d8f0b04

3 files changed

Lines changed: 126 additions & 3 deletions

sphinx/source/appendix/appendix_usage.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@ Options for the provenance database:
2020
- **provdb_nshards** : Number of database shards
2121
- **provdb_engine** : The OFI libfabric provider used for the Mochi stack.
2222
- **provdb_port** : The port of the provenance database
23-
- **provdb_nthreads** : Number of worker threads; should be >= the number of shards
23+
- **provdb_ninstances** : Number of server instances (default 1)
2424

2525
Options for the parameter server:
2626

sphinx/source/install_usage/install.rst

Lines changed: 65 additions & 1 deletion
@@ -51,7 +51,7 @@ Details of how to choose the libfabrics provider used by Mercury can be found :r
5151
Integrating with system-installed MPI
5252
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
5353

54-
Chimbuko requires an installation of MPI. While Spack can install MPI automatically as a dependency of Chimbuko, in most cases it is desirable to utilize the system installation. Instructions on configuring Spack to use external dependencies can be found `here <https://spack.readthedocs.io/en/latest/build_settings.html#external-packages>`_ . The simplest approach in general is to edit (create) a **packages.yaml** in one of Spack's search paths, e.g. :code:`~/.spack/packages.yaml`, with the following content:
54+
Chimbuko by default requires an installation of MPI. While Spack can install MPI automatically as a dependency of Chimbuko, in most cases it is desirable to utilize the system installation. Instructions on configuring Spack to use external dependencies can be found `here <https://spack.readthedocs.io/en/latest/build_settings.html#external-packages>`_ . The simplest approach in general is to edit (create) a **packages.yaml** in one of Spack's search paths, e.g. :code:`~/.spack/packages.yaml`, with the following content:
5555

5656
.. code:: yaml
5757
@@ -64,6 +64,17 @@ Chimbuko requires an installation of MPI. While Spack can install MPI automatica
6464
6565
Modified as necessary to point to your installation.
6666

67+
Non-MPI installation (advanced)
68+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
69+
70+
Chimbuko can be built without MPI by disabling the **mpi** Spack variant as follows:
71+
72+
.. code:: bash
73+
74+
spack install chimbuko~mpi ^py-setuptools-scm+toml
75+
76+
When used in this mode the user is responsible for manually assigning a "rank" index to each instance of the online AD module, and also for ensuring that an instance of this module is created alongside each instance or rank of the target application (e.g. using a wrapper script that is launched via mpirun). We discuss how this can be achieved :ref:`here <non_mpi_run>`.
77+
6778
Summit
6879
~~~~~~
6980

@@ -104,7 +115,60 @@ Once installed, simply
104115
spack load tau chimbuko-performance-analysis chimbuko-visualization2
105116
106117
after loading the modules above.
118+
119+
120+
Spock
121+
~~~~~~
122+
123+
In the PerformanceAnalysis source we also provide a Spack environment YAML file for use on Spock, :code:`spack/environments/spock.yaml`. This environment is designed for the AMD compiler suite with ROCm 4.3.0. Installation instructions follow:
124+
125+
First download the Chimbuko and Mochi repositories:
126+
127+
.. code:: bash
128+
129+
git clone https://github.com/mochi-hpc/mochi-spack-packages.git
130+
git clone https://github.com/CODARcode/PerformanceAnalysis.git
131+
132+
Copy the file :code:`spack/environments/spock.yaml` from the PerformanceAnalysis git repository to a convenient location and edit the paths in the :code:`repos` section to point to the paths at which you downloaded the repositories:
133+
134+
.. code:: yaml
135+
136+
repos:
137+
- /autofs/nccs-svm1_home1/ckelly/install/mochi-spack-packages
138+
- /autofs/nccs-svm1_home1/ckelly/src/AD/PerformanceAnalysis/spack/repo/chimbuko
139+
140+
This environment uses the following modules, which must be loaded prior to installation and running:
141+
142+
.. code:: bash
143+
144+
module reset
145+
module load PrgEnv-amd/8.2.0
146+
module load rocm/4.3.0
147+
module load cray-python/3.9.4.1
148+
149+
To install the environment:
150+
151+
.. code:: bash
152+
153+
spack env create my_chimbuko_env spock.yaml
154+
spack env activate my_chimbuko_env
155+
spack install
156+
157+
Unfortunately, at present there are a few issues with Spack on Spock that require workarounds when loading the environment:
158+
159+
.. code:: bash
160+
161+
#Looks like spack doesn't pick up cray-xpmem pkg-config loc, put at end so only use as last resort
162+
export PKG_CONFIG_PATH=${PKG_CONFIG_PATH}:/usr/lib64/pkgconfig
163+
164+
#Looks like spack misses an rpath for Chimbuko
165+
export LD_LIBRARY_PATH=/opt/cray/pe/libsci/21.08.1.2/AMD/4.0/x86_64/lib:${LD_LIBRARY_PATH}
107166
167+
spack env activate my_chimbuko_env
168+
spack load tau chimbuko-performance-analysis chimbuko-visualization2
169+
170+
171+
108172
.. _ADIOS2: https://github.com/ornladios/ADIOS2
109173
.. _ZeroMQ: https://zeromq.org/
110174
.. _CURL: https://curl.haxx.se/

sphinx/source/install_usage/run_chimbuko.rst

Lines changed: 60 additions & 1 deletion
@@ -44,7 +44,8 @@ A number of variables in **chimbuko_config.sh** are marked :code:`<------------
4444
- **TAU_PYTHON** : This specifies how to execute tau_python.
4545
- **TAU_MAKEFILE** : The Tau Makefile. For spack users this variable is set by Spack when loading Tau and this line can be commented out.
4646
- **export EXE_NAME=<name>** : This specifies the name of the executable (without full path). Replace **<name>** with an actual name of the application executable.
47-
47+
- **export FI_UNIVERSE_SIZE=<number>** : Libfabric (used by the provDB) needs to know how many clients to expect. For optimal performance this should be set equal to or larger than the number of ranks.
48+
4849
A full list of variables along with their description is provided in the `Appendix Section <../appendix/appendix_usage.html#chimbuko-config>`_, and more guidance is also provided in the template script.
4950

5051
Next, in the run script, export the config script as follows:
@@ -111,6 +112,64 @@ Chimbuko can be run to perform offline analysis of the application by changing c
111112

112113
------------------------------
113114

115+
Scaling to large job sizes
116+
^^^^^^^^^^^^^^^^^^^^^^^^^^
117+
118+
Chimbuko supports runs with many thousands of MPI ranks. However, achieving optimal performance in this context can require some tuning of the parameters in *chimbuko_config.sh*. First, ensure that:
119+
120+
- **FI_UNIVERSE_SIZE** is set equal to or larger than the number of ranks.
121+
- Communication with the provDB (**provdb_engine** in the config) is performed over the optimal OpenFabrics transport for the system, e.g. *verbs* on Summit.
122+
123+
If the provenance database takes a long time to drain its input buffers at the end of the job, it typically means that the database was overloaded and could not keep up with the volume of data. The provDB can be scaled in two ways:
124+
125+
- **provdb_nshards** increases the number of independent database shards that can be written to in parallel.
126+
- **provdb_ninstances** controls the number of independent instances of the server.
127+
128+
Increasing the number of shards should be the first option that is attempted. Each shard is managed by a separate Argobots execution stream and will run in parallel, provided that enough hardware threads are available to the services.
129+
130+
If increasing the number of shards is not sufficient, more provDB server instances can be run on additional nodes, allowing indefinite scaling. However, at present the built-in Chimbuko **run_services.sh** script can only launch multiple provDB instances in the same resource set; to run servers on different resource sets the user must launch them manually with an appropriate job script. The **provdb_ninstances** variable must also be set so that the other service components coordinate with the multiple server instances.
131+
132+
An example of running two server instances on different nodes of Summit, for a run of our benchmark with 4032 ranks, can be found in the *scripts/summit/provdb_multiinstance* subdirectory of the PerformanceAnalysis repository. The benchmark source can be found in the *benchmark_suite/benchmark_provdb* subdirectory.
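
As a rough illustration of the tuning described above, the relevant settings might look as follows. This is only a sketch: the shard count and the headroom added to **FI_UNIVERSE_SIZE** are hypothetical values chosen for illustration, and it assumes the config script uses plain shell assignments for the **provdb_** options.

```shell
# Sketch of scaling-related settings (illustrative values, not recommendations)
nranks=4032                                 # number of application ranks in the job

# Libfabric should expect at least as many clients as there are ranks;
# the extra 16 is an arbitrary safety margin (assumption)
export FI_UNIVERSE_SIZE=$(( nranks + 16 ))

provdb_nshards=4                            # hypothetical: try raising this first
provdb_ninstances=2                         # only if more shards are not sufficient

echo "FI_UNIVERSE_SIZE=${FI_UNIVERSE_SIZE} shards=${provdb_nshards} instances=${provdb_ninstances}"
```

Raising the shard count is tried first because it needs no change to how the services are launched, whereas extra instances on other resource sets must be started manually.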
133+
134+
135+
136+
.. _non_mpi_run:
137+
138+
Online analysis of an MPI application with a non-MPI installation of Chimbuko (advanced)
139+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
140+
141+
It is possible to use a non-MPI build of Chimbuko to analyze an MPI application. Indeed, this is the only option on systems whose job managers do not allow tasks launched using different calls to mpirun (or equivalent) to occupy the same node.
142+
143+
There are two aspects to this that differ from a normal run of Chimbuko:
144+
145+
- The instances of the online AD 'driver' must be launched alongside the ranks of the application. This can be achieved by creating a wrapper script that instantiates both the driver and the application, and launching this script using mpirun.
146+
- The driver instances must be manually provided with the application rank index to which they are to attach.
147+
148+
The rank can be assigned using the **-rank <rank>** command line option of the driver component. Unfortunately, this prevents the use of the auto-generated AD run command that is output by the services script; instead the user must launch the driver manually in the wrapper script:
149+
150+
.. code:: bash
151+
152+
driver ${TAU_ADIOS2_ENGINE} ${TAU_ADIOS2_PATH} ${TAU_ADIOS2_FILE_PREFIX}-${EXE_NAME} ${ad_opts} -rank ${rank} 2>&1 | tee chimbuko/logs/ad.${rank}.log
153+
154+
Here the first four variables are set by sourcing the *chimbuko_config.sh* script that the user provides. The variable **ad_opts** should be assigned the contents of the *chimbuko/vars/chimbuko_ad_opts.var* file that is generated by the services script (this file contains the various options required for the driver to attach to the services). Finally, the rank must be obtained from the appropriate environment variable set by the mpirun variant, for example:
155+
156+
.. code:: bash
157+
158+
rank=${OMPI_COMM_WORLD_RANK}
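
Since the variable name depends on the launcher, a wrapper script may want to check several candidates. The following is a minimal sketch, assuming bash; the helper name **get_rank** is hypothetical, and the list of variables (Open MPI, PMIx-based launchers, PMI-based launchers such as MPICH, and Slurm's srun) is not exhaustive and should be adapted to the system in use.

```shell
# Hypothetical helper: print the first rank variable that is set.
# Returns non-zero if none of the candidates is defined.
get_rank() {
  local var
  for var in OMPI_COMM_WORLD_RANK PMIX_RANK PMI_RANK SLURM_PROCID; do
    if [ -n "${!var}" ]; then   # bash indirect expansion: value of the variable named by $var
      echo "${!var}"
      return 0
    fi
  done
  return 1
}

if rank=$(get_rank); then
  echo "driver instance will attach to rank ${rank}"
else
  echo "no rank variable found; set rank manually" >&2
fi
```

The resulting **rank** value is what would be passed to the driver through the **-rank** option.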
159+
160+
An example is provided for the **func_multimodal** mini-app in the Chimbuko PerformanceAnalysis repository:
161+
162+
.. code:: bash
163+
164+
benchmark_suite/func_multimodal/run_nompi.sh
165+
benchmark_suite/func_multimodal/wrap_nompi.sh
166+
167+
Online analysis of a non-MPI application with a non-MPI installation of Chimbuko (advanced)
168+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
169+
170+
In the context of a non-MPI application, each instance of the application must still be associated with an index within Chimbuko so that the instances can be distinguished. This proceeds much as in the previous section, but with a catch: by default Chimbuko assumes that the instance index passed in by the **-rank <rank>** option matches the rank index in the trace data and in the ADIOS trace filename produced by Tau. However, for a non-MPI application, Tau assigns rank 0 to **all instances**. To communicate this to Chimbuko a second command line option must be used: **-override_rank 0**. Here the 0 tells Chimbuko that the input data is labeled as 0 in both the filename and the trace data. Chimbuko will then overwrite the rank index in the trace data with its internal rank index so that the new label is passed through the analysis. Note that the user must make sure that each application instance is assigned either a different **TAU_ADIOS2_PATH** or a different **TAU_ADIOS2_FILE_PREFIX**, otherwise the trace data files will overwrite each other.
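
To make this concrete, the sketch below prints the driver command for several instances rather than executing it. The helper **build_ad_command**, the per-instance prefix scheme and all default values are hypothetical; in a real run the variables come from *chimbuko_config.sh* and *chimbuko/vars/chimbuko_ad_opts.var*, and each application instance must be launched with the same prefix that its driver is given.

```shell
# Placeholder values for illustration only (assumptions, not Chimbuko defaults)
TAU_ADIOS2_ENGINE=${TAU_ADIOS2_ENGINE:-SST}
TAU_ADIOS2_PATH=${TAU_ADIOS2_PATH:-chimbuko/adios2}
TAU_ADIOS2_FILE_PREFIX=${TAU_ADIOS2_FILE_PREFIX:-tau-metrics}
EXE_NAME=${EXE_NAME:-my_app}
ad_opts=${ad_opts:-}

# Hypothetical helper: print the driver command for one application instance.
# A distinct file prefix per instance keeps the trace files from colliding,
# and -override_rank 0 reflects that Tau labels every non-MPI instance as rank 0.
build_ad_command() {
  local instance=$1
  local prefix="${TAU_ADIOS2_FILE_PREFIX}-inst${instance}"
  echo "driver ${TAU_ADIOS2_ENGINE} ${TAU_ADIOS2_PATH} ${prefix}-${EXE_NAME} ${ad_opts} -rank ${instance} -override_rank 0"
}

for i in 0 1 2; do
  build_ad_command "$i"
done
```

Varying **TAU_ADIOS2_PATH** per instance instead of the prefix would serve the same purpose.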
171+
172+
114173
.. _benchmark_suite:
115174

116175
Examples
