**sphinx/source/install_usage/install.rst**

Integrating with system-installed MPI
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Chimbuko by default requires an installation of MPI. While Spack can install MPI automatically as a dependency of Chimbuko, in most cases it is desirable to use the system installation. Instructions on configuring Spack to use external dependencies can be found `here <https://spack.readthedocs.io/en/latest/build_settings.html#external-packages>`_. In general the simplest approach is to edit (or create) a **packages.yaml** in one of Spack's search paths, e.g. :code:`~/.spack/packages.yaml`, with the following content:

.. code:: yaml

Modify the entries as necessary to point to your installation.

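
For illustration, an entry registering a system OpenMPI might look like the following (the version and install prefix are placeholders for your system):

.. code:: yaml

   packages:
     openmpi:
       externals:
       - spec: openmpi@4.1.1
         prefix: /usr/local/openmpi-4.1.1
       buildable: false

Setting :code:`buildable: false` forbids Spack from building its own copy of the package, forcing it to use the external installation.
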
Non-MPI installation (advanced)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Chimbuko can be built without MPI by disabling the **mpi** Spack variant as follows:
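
For example (a sketch; the exact variant name can be confirmed with :code:`spack info chimbuko-performance-analysis`):

.. code:: bash

   spack install chimbuko-performance-analysis~mpi
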
When used in this mode the user is responsible for manually assigning a "rank" index to each instance of the online AD module, and also for ensuring that an instance of this module is created alongside each instance or rank of the target application (e.g. using a wrapper script that is launched via mpirun). We discuss how this can be achieved :ref:`here <non_mpi_run>`.

Summit
~~~~~~

Once installed, simply

.. code:: bash

   spack load tau chimbuko-performance-analysis chimbuko-visualization2

after loading the modules above.

Spock
~~~~~~

In the PerformanceAnalysis source we also provide a Spack environment YAML for use on Spock, :code:`spack/environments/spock.yaml`. This environment is designed for the AMD compiler suite with ROCm 4.3.0. Installation instructions follow:

First download the Chimbuko and Mochi repositories:
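
For example (the GitHub URLs shown here are assumptions and should be verified):

.. code:: bash

   git clone https://github.com/CODARcode/PerformanceAnalysis.git
   git clone https://github.com/mochi-hpc/mochi-spack-packages.git
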

Copy the file :code:`spack/environments/spock.yaml` from the PerformanceAnalysis git repository to a convenient location and edit the paths in the :code:`repos` section to point to the locations where you downloaded the repositories:
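
For example, if the repositories were downloaded to your home directory (the paths are illustrative):

.. code:: yaml

   repos:
   - /home/user/mochi-spack-packages
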

**sphinx/source/install_usage/run_chimbuko.rst**

A number of variables in **chimbuko_config.sh** are marked with :code:`<------------` and must be set by the user:

- **TAU_PYTHON** : This specifies how to execute tau_python.
- **TAU_MAKEFILE** : The Tau Makefile. For Spack users this variable is set by Spack when loading Tau, and this line can be commented out.
- **export EXE_NAME=<name>** : This specifies the name of the executable (without the full path). Replace **<name>** with the actual name of the application executable.
- **export FI_UNIVERSE_SIZE=<number>** : Libfabric (used by the provDB) requires knowledge of how many clients to expect. For optimal performance this should be set equal to or larger than the number of ranks.

A full list of variables along with their descriptions is provided in the `Appendix Section <../appendix/appendix_usage.html#chimbuko-config>`_, and more guidance is also provided in the template script.

Next, in the run script, export the config script as follows:
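
A minimal sketch, assuming the run scripts read the config path from a **CHIMBUKO_CONFIG** environment variable (here we also create a stand-in config so the snippet is self-contained):

.. code:: bash

   # stand-in for the user-prepared config script
   printf 'export EXE_NAME=main_app\nexport FI_UNIVERSE_SIZE=128\n' > chimbuko_config.sh

   # export the path to the config and source it so its variables are visible
   export CHIMBUKO_CONFIG=$(pwd)/chimbuko_config.sh
   source ${CHIMBUKO_CONFIG}
   echo ${EXE_NAME}   # prints: main_app
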

------------------------------

Scaling to large job sizes
^^^^^^^^^^^^^^^^^^^^^^^^^^

Chimbuko supports runs with many thousands of MPI ranks. However, achieving optimal performance at this scale can require some tuning of the parameters in *chimbuko_config.sh*. Firstly, ensure that:

- **FI_UNIVERSE_SIZE** is set larger than the number of ranks.
- Communication with the provDB (**provdb_engine** in the config) is performed over the optimal OpenFabrics transport, e.g. *verbs* on Summit.
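
For instance, a run script might derive **FI_UNIVERSE_SIZE** from the rank count with some headroom (the margin of 64 is an arbitrary choice):

.. code:: bash

   ranks=4032                                  # total MPI ranks in the job
   export FI_UNIVERSE_SIZE=$(( ranks + 64 ))   # headroom above the rank count
   echo ${FI_UNIVERSE_SIZE}                    # prints: 4096
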

If the provenance database is taking a long time to drain its input buffers at the end of the job, it typically means the database was overloaded and unable to keep up with the volume of data. The provDB can be scaled in two ways:

- **provdb_nshards** increases the number of independent database shards that can be written to in parallel.
- **provdb_ninstances** controls the number of independent instances of the server.

Increasing the number of shards should be the first option attempted. Each shard is managed by a separate Argobots execution stream and will run in parallel provided enough hardware threads are available to the services.


If increasing the number of shards is not sufficient, more provDB server instances can be run on further nodes, allowing indefinite scaling. However, at present the built-in Chimbuko **run_services.sh** script only supports launching multiple provDB instances in the same resource set; to run servers on different resource sets the user must launch them manually with an appropriate job script. The **provdb_ninstances** variable must also be set to inform the other service components that they must coordinate with multiple server instances.

An example of running two server instances on different nodes of Summit, for a run of our benchmark with 4032 ranks, can be found in the *scripts/summit/provdb_multiinstance* subdirectory of the PerformanceAnalysis repository. The benchmark source can be found in the *benchmark_suite/benchmark_provdb* subdirectory.

.. _non_mpi_run:

Online analysis of an MPI application with a non-MPI installation of Chimbuko (advanced)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is possible to use a non-MPI build of Chimbuko to analyze an MPI application. Indeed, this is the only option on systems whose job managers do not allow tasks launched using different calls to mpirun (or equivalent) to occupy the same node.

There are two aspects to this that differ from a normal run of Chimbuko:

- The instances of the online AD 'driver' must be launched alongside the ranks of the application. This can be achieved by creating a wrapper script that instantiates both the driver and the application, and launching this script using mpirun.
- The driver instances must be manually provided with the application rank index to which they are to attach.

The assignment of a rank can be achieved using the **-rank <rank>** command line option of the driver component. Unfortunately, this prevents the use of the auto-generated AD run command output by the services script; instead, the user must launch the driver manually in the wrapper script:

.. code:: bash

   driver ${TAU_ADIOS2_ENGINE} ${TAU_ADIOS2_PATH} ${TAU_ADIOS2_FILE_PREFIX}-${EXE_NAME} ${ad_opts} -rank ${rank} 2>&1 | tee chimbuko/logs/ad.${rank}.log


Here the first four variables are set by sourcing the *chimbuko_config.sh* script that the user provides. The variable **ad_opts** should be assigned the contents of the *chimbuko/vars/chimbuko_ad_opts.var* file generated by the services script (this variable contains the various options required for the driver to attach to the services). Finally, the rank must be obtained from the appropriate environment variable set by the mpirun variant, for example:

.. code:: bash

   rank=${OMPI_COMM_WORLD_RANK}

An example is provided for the **func_multimodal** mini-app in the Chimbuko PerformanceAnalysis repository:
.. code:: bash

   benchmark_suite/func_multimodal/run_nompi.sh
   benchmark_suite/func_multimodal/wrap_nompi.sh

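
A condensed sketch of such a wrapper (the file layout and the use of **OMPI_COMM_WORLD_RANK** are illustrative; other MPI stacks expose the rank through different variables):

.. code:: bash

   #!/bin/bash
   # launched as: mpirun -n <nranks> ./wrapper.sh
   source ./chimbuko_config.sh
   ad_opts=$(cat chimbuko/vars/chimbuko_ad_opts.var)
   rank=${OMPI_COMM_WORLD_RANK}

   # start the AD driver for this rank in the background, then run the application
   driver ${TAU_ADIOS2_ENGINE} ${TAU_ADIOS2_PATH} ${TAU_ADIOS2_FILE_PREFIX}-${EXE_NAME} ${ad_opts} -rank ${rank} 2>&1 | tee chimbuko/logs/ad.${rank}.log &
   ${EXE_NAME}
   wait
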

Online analysis of a non-MPI application with a non-MPI installation of Chimbuko (advanced)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the context of a non-MPI application, instances of the application must still be associated with an index within Chimbuko that allows them to be distinguished. This proceeds much as in the previous section, but with a catch: by default Chimbuko assumes that the instance index passed in via the **-rank <rank>** option matches the rank index reflected in the trace data and in the ADIOS trace filename produced by Tau. However, for a non-MPI application Tau assigns rank 0 to **all instances**. To communicate this to Chimbuko a second command line option must be used: **-override_rank 0**. Here the 0 tells Chimbuko that the input data is labeled as rank 0 in both the filename and the trace data; Chimbuko will then overwrite the rank index in the trace data with its internal rank index, ensuring that the new label is passed through the analysis. Note that the user must ensure that each application instance is assigned either a different **TAU_ADIOS2_PATH** or a different **TAU_ADIOS2_FILE_PREFIX**, otherwise the trace data files will overwrite each other.
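
For example, each wrapper instance might derive its index from a scheduler environment variable and use it to keep the trace outputs distinct (the use of **SLURM_PROCID** is illustrative):

.. code:: bash

   # unique index per application instance (falls back to 0 outside a job)
   inst=${SLURM_PROCID:-0}

   # give each instance its own trace file prefix so outputs do not collide
   export TAU_ADIOS2_FILE_PREFIX=tau-metrics-${inst}
   echo ${TAU_ADIOS2_FILE_PREFIX}

The driver for this instance would then be launched with :code:`-rank ${inst} -override_rank 0`.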