Henceforth we assign the variable **${VIZ_ADDRESS}=${HEAD_NODE_IP}:${VIZ_PORT}**.
For details on the installation and usage of the visualization module, please refer to the `readme <https://github.com/CODARcode/ChimbukoVisualizationII>`_.
Where the variables are as follows:
- **PSERVER_LOGDIR** : A directory for logging output
- **VIZ_ADDRESS** : Address of the visualization module (see above).
- **PROVDB_ADDR**: The address of the provenance database (see above). This option enables the storing of the final globally-aggregated function profile information into the provenance database.
- **PSERVER_ALG** : The AD algorithm to use for online analysis: "sstd" or "hbos". The default is "hbos".
Note that all the above are optional arguments, although if the **VIZ_ADDRESS** is not provided, no information will be sent to the webserver.
The parameter server opens communications on TCP port 5559. For use below we define the variable **PSERVER_ADDR=${HEAD_NODE_IP}:5559**.
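As a concrete sketch, the address variables used throughout this section can be assembled as follows; the IP address and visualization port are placeholder values, not prescribed by Chimbuko:

```shell
# Placeholder head-node IP and viz port; substitute your own values.
HEAD_NODE_IP=192.168.0.10
VIZ_PORT=5002

# Address of the visualization module (see above).
VIZ_ADDRESS=${HEAD_NODE_IP}:${VIZ_PORT}

# The parameter server listens on TCP port 5559.
PSERVER_ADDR=${HEAD_NODE_IP}:5559

echo "${VIZ_ADDRESS} ${PSERVER_ADDR}"
```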
----------------------------------
The fourth step is to instantiate the AD modules:
Where the variables are as follows:
- **ADIOS2_ENGINE** : The ADIOS2 communications engine. For online analysis this should be **SST** by default (an alternative, **BP4**, is discussed below)
- **ADIOS2_FILE_DIR** : The directory in which the ADIOS2 file is written (see below)
- **ADIOS2_FILE_PREFIX** : The ADIOS2 file prefix (see below)
- **PSERVER_ADDR**: The address of the parameter server from above.
- **PROVDB_ADDR**: The address of the provenance database from above.
- **NSHARDS**: The number of provenance database shards
The **ADIOS2_FILE_DIR** and **ADIOS2_FILE_PREFIX** arguments can be obtained by combining the **${TAU_ADIOS2_FILENAME}** environment variable with the name of the application. For example, for an application "main" and "TAU_ADIOS2_FILENAME=/path/to/tau-metrics", **ADIOS2_FILE_DIR=/path/to** and **ADIOS2_FILE_PREFIX=tau-metrics-main**. Note that if the environment variable is not set, the prefix will default to "tau-metrics" and the output will be placed in the current directory.
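The derivation described above can be scripted with standard shell utilities; this is only a sketch using the example values from the text:

```shell
# Example values from the text: application "main" and
# TAU_ADIOS2_FILENAME=/path/to/tau-metrics
TAU_ADIOS2_FILENAME=/path/to/tau-metrics
APPLICATION=main

# Directory part and prefix part of the trace filename
ADIOS2_FILE_DIR=$(dirname "${TAU_ADIOS2_FILENAME}")
ADIOS2_FILE_PREFIX=$(basename "${TAU_ADIOS2_FILENAME}")-${APPLICATION}

echo "${ADIOS2_FILE_DIR} ${ADIOS2_FILE_PREFIX}"
# → /path/to tau-metrics-main
```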
The **ADIOS2_ENGINE** can be chosen as either **SST** or **BP4**. The former uses RDMA and should be the default choice. However, we have observed that in some cases the **BP4** option (available in ADIOS2 2.6+), which writes the traces to disk rather than to memory, can reduce the overhead of running Chimbuko alongside the application. Note, however, that BP4 mode can interfere with disk I/O-heavy components of the main application, and so local burst buffers (e.g. Summit's NVMe) should be used if necessary.
In the above we have assumed that the provenance database is being used. However, if this component is not in use, the AD will automatically output the provenance data as JSON documents "${STEP}.anomalies.json", "${STEP}.normalexecs.json" and "${STEP}.metadata.json" placed in the directory "${PROV_DIR}/${PROGRAM_IDX}/${RANK}", where STEP is the I/O step; PROGRAM_IDX is the program index; RANK is the rank of the AD instance; and PROV_DIR is set by default to the working directory but can be specified manually using the optional argument -prov_outputpath (cf. below).
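As a sketch, the output location described above can be composed as follows; the program index, rank and step values are arbitrary examples:

```shell
# Default PROV_DIR is the working directory (overridable with
# -prov_outputpath); program index, rank and step are example values.
PROV_DIR=${PWD}
PROGRAM_IDX=0
RANK=3
STEP=12

ANOM_FILE=${PROV_DIR}/${PROGRAM_IDX}/${RANK}/${STEP}.anomalies.json
echo "${ANOM_FILE}"
```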
The AD module has a number of additional options that can be used to tune its behavior. The full list can be obtained by running **driver** without any arguments. However a few useful options are described below:
- **-anom_win_size** : The number of events around an anomalous function execution that are captured as contextual information and placed in the provenance database and displayed in the visualization (default 10)
- **-program_idx** : For workflows with multiple component programs, a "program index" must be supplied to the AD instances attached to those processes.
- **-rank** : By default the data rank assigned to an AD instance is taken from its MPI rank in MPI_COMM_WORLD. This rank is used to verify the incoming trace data. This option allows the user to manually set the rank index.
- **-override_rank** : This option disables the data rank verification and instead overwrites the data rank of the incoming trace data with the data rank stored in the AD instance. The value supplied must be the original data rank (this is used to generate the correct trace filename).
- **-ad_algorithm** : Sets the AD algorithm to use for online analysis: "sstd" or "hbos". The default is "hbos".
- **-hbos_threshold** : The threshold controlling the density of anomalies detected by the HBOS algorithm. Its value ranges between 0 and 1; the default is 0.99.
For debug purposes, the AD module can be made more verbose by setting the environment variable **CHIMBUKO_VERBOSE=1**.
**Note**: For workflows with multiple different component executables, the AD instances must be provided with a program index such that the data is appropriately tagged.
**Note**: If a program is executed multiple times but without MPI, the 'rank' index of the data must be set manually by the AD. In this case the 'rank' becomes a way of indexing the different instances of the program. This can be achieved by setting **-rank ${DESIRED_RANK} -override_rank 0**, which will set the data rank to **${DESIRED_RANK}**. (The 0 provided to -override_rank is because for non-MPI applications the rank assigned by Tau is always 0.)
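For example, the flag string for the second instance of such a program could be constructed as in the following sketch; only the two rank-related flags from the text are shown, the remaining **driver** arguments are omitted:

```shell
# Build the AD flags for the second instance (data rank 1) of a non-MPI
# program; the 0 passed to -override_rank is Tau's rank for non-MPI runs.
DESIRED_RANK=1
AD_RANK_FLAGS="-rank ${DESIRED_RANK} -override_rank 0"
echo "${AD_RANK_FLAGS}"
# → -rank 1 -override_rank 0
```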
----------------------------------
The final step is to instantiate the application
Aside from interacting with the visualization module, once the run is complete the user can also interact directly with the provenance database using the **provdb_query** tool as described below: :ref:`install_usage/run_chimbuko:Interacting with the Provenance Database`.
Offline Analysis
~~~~~~~~~~~~~~~~
Note that the first argument of **driver**, which specifies the ADIOS2 engine, has been set to **BPFile**, and the process is not run in the background.
Examples
~~~~~~~~
For GPU workflows we presently have examples only for Nvidia GPUs.
For convenience we provide Docker images in which these examples can be run alongside the full Chimbuko stack. The CPU examples can be run as:
.. code:: bash

   docker pull chimbuko/run_examples:latest
   docker run --rm -it -p 5002:5002 --cap-add=SYS_PTRACE --security-opt seccomp=unconfined chimbuko/run_examples:latest
And connect to this visualization server at **localhost:5002**.
For the GPU examples the user must have access to a system with an installation of the NVidia CUDA driver and runtime compatible with CUDA 10.1 as well as a Docker installation configured to support the GPU. Internally we use the `nvidia-docker <https://github.com/NVIDIA/nvidia-docker>`_ tool to start the Docker images. To run,
.. code:: bash

   docker pull chimbuko/run_examples:latest-gpu
   nvidia-docker run -p 5002:5002 --cap-add=SYS_PTRACE --security-opt seccomp=unconfined chimbuko/run_examples:latest-gpu
And connect to this visualization server at **localhost:5002**.
We also provide Dockerfiles and run scripts for two real-world scientific applications described below:
NWChem
------
`NWChem <https://www.nwchem-sw.org/>`_ (Northwest Computational Chemistry Package) is the US DOE's premier massively parallel computational chemistry package, largely written in Fortran. We provide a `Docker image <https://hub.docker.com/r/chimbuko/run_nwchem>`_ demonstrating the coupling of an NWChem molecular dynamics simulation of the ethanol molecule with Chimbuko. To run the image:
.. code:: bash

   docker pull chimbuko/run_nwchem:latest
   docker run -p 5002:5002 --cap-add=SYS_PTRACE --security-opt seccomp=unconfined chimbuko/run_nwchem:latest
The MOCU application is part of the `ExaLearn <https://github.com/exalearn>`_ project.
To run the image the user must have access to a system with an installation of the NVidia CUDA driver and runtime compatible with CUDA 10.1 as well as a Docker installation configured to support the GPU. Internally we use the `nvidia-docker <https://github.com/NVIDIA/nvidia-docker>`_ tool to start the Docker images. To run:
.. code:: bash

   docker pull chimbuko/run_mocu:latest
   nvidia-docker run -p 5002:5002 --cap-add=SYS_PTRACE --security-opt seccomp=unconfined chimbuko/run_mocu:latest
For the provenance database (provdb_admin) we recommend using the OFI "verbs" transport. A suitable provider can be found by running

.. code:: bash

   jsrun -n 1 fi_info

within an interactive session, and searching for one that supports verbs. However, the following setup has been verified:
.. code:: bash
276
279
@@ -303,7 +306,7 @@ Where the variables are as follows:
303
306
304
307
- **COLLECTION** : One of the three collections in the database, **anomalies**, **normalexecs**, **metadata** (cf :ref:`introduction/provdb:Provenance Database`).
- **QUERY**: The query, format described below.
The **QUERY** argument should be a jx9 function returning a bool and enclosed in quotation marks. It should be of the format
.. code:: bash

   "function(\$entry) { return \$entry['some_field'] == ${SOME_VALUE}; }"
Alternatively the query can be set to "DUMP", which will output all entries.
The function is applied sequentially to each element of the collection. Inside the function the entry is described by the variable **$entry**. Note that the backslash-dollar (\\$) is necessary to prevent the shell from trying to expand the variable. Fields of **$entry** can be queried using the square-bracket notation with the field name inside. In the sketch above the field "some_field" is compared to a value **${SOME_VALUE}** (here representing a numerical value or a value expanded by the shell, *not* a jx9 variable!).
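The escaping rule described above can be checked directly in the shell; the field name "some_field" and the value 42 below are illustrative only:

```shell
# With backslash-dollar, the literal text $entry survives shell expansion
# inside double quotes and reaches the query tool intact.
QUERY="function(\$entry) { return \$entry['some_field'] == 42; }"
echo "${QUERY}"
# → function($entry) { return $entry['some_field'] == 42; }
```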
Some examples:
Where the variables are as follows:
- **COLLECTION** : One of the two collections in the database, **func_stats**, **counter_stats**.
- **QUERY**: The query, format described below.
The formatting of the **QUERY** argument is described above.
Execute mode
------------
Where the variables are as follows:
- **CODE** : The jx9 script
- **VARIABLES** : A comma-separated list (without spaces) of the variables assigned by the script
The **CODE** argument is a complete jx9 script. As above, backslashes ('\') must be placed before internal '$' and '"' characters to prevent shell expansion.
If the option **-from_file** is specified the **${CODE}** variable above will be treated as a filename from which to obtain the script. Note that in this case the backslashes before the special characters are not necessary.
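For instance, a script saved for use with **-from_file** can contain bare '$' characters; the filename and script body in this sketch are illustrative only:

```shell
# Write a trivial jx9 script to a file; inside a quoted heredoc the shell
# performs no expansion, so no escaping backslashes are needed.
cat > /tmp/example_query.jx9 <<'EOF'
$result = 5 + 5;
EOF
cat /tmp/example_query.jx9
# → $result = 5 + 5;
```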