Commit 4d291ef

Updated sphinx docs with HBOS

Author: sandeepmittal
Parent: 3382904

3 files changed

Lines changed: 57 additions & 41 deletions

sphinx/source/api/api_code.rst

Lines changed: 15 additions & 8 deletions
@@ -92,7 +92,7 @@ ADNormalEventProvenance
 .. doxygenfile:: ADNormalEventProvenance.hpp
 :project: api
 :path: ../../include/chimbuko/ad//ADNormalEventProvenance.hp
-
+
 ADOutlier
 ---------

@@ -128,7 +128,7 @@ AnomalyData
 :project: api
 :path: ../../include/chimbuko/ad//AnomalyData.hpp

-
+
 ExecData
 --------

@@ -156,6 +156,13 @@ ParamInterface
 :project: api
 :path: ../../../include/chimbuko/param.hpp

+HbosParam
+---------
+
+.. doxygenfile:: hbos_param.hpp
+:project: api
+:path: ../../../include/chimbuko/param/hbos_param.hpp
+
 SstdParam
 ---------

@@ -204,15 +211,15 @@ PSProvenanceDBclient
 .. doxygenfile:: PSProvenanceDBclient.hpp
 :project: api
 :path: ../../include/chimbuko/pserver/PSProvenanceDBclient.hpp
-
+
 PSstatSender
 ------------

 .. doxygenfile:: PSstatSender.hpp
 :project: api
 :path: ../../include/chimbuko/pserver/PSstatSender.hpp

-
+
 Network
 ~~~~~~~

@@ -246,7 +253,7 @@ ZMQMENet
 :project: api
 :path: ../../include/chimbuko/net/zmqme_net.hpp

-
+
 Message
 ~~~~~~~

@@ -300,7 +307,7 @@ error
 .. doxygenfile:: error.hpp
 :project: api
 :path: ../../include/chimbuko/util//error.hpp
-
+
 hash
 ----

@@ -321,7 +328,7 @@ memutils
 .. doxygenfile:: memutils.hpp
 :project: api
 :path: ../../include/chimbuko/util//memutils.hpp
-
+
 mtQueue
 -------

@@ -370,7 +377,7 @@ time
 .. doxygenfile:: time.hpp
 :project: api
 :path: ../../include/chimbuko/util//time.hpp
-
+
 verbose
 -------

sphinx/source/install_usage/run_chimbuko.rst

Lines changed: 27 additions & 24 deletions
@@ -90,14 +90,14 @@ Where the variables are as follows:
 Henceforth we assign the variable **${VIZ_ADDRESS}=${HEAD_NODE_IP}:${VIZ_PORT}**.

 For details on the installation and usage of the visualization module, please refer to the `readme <https://github.com/CODARcode/ChimbukoVisualizationII>`_.
-
+
 ----------------------------------

 The third step is to start the parameter server:

 .. code:: bash

-pserver -nt ${PSERVER_NT} -logdir ${PSERVER_LOGDIR} -ws_addr ${VIZ_ADDRESS} -provdb_addr ${PROVDB_ADDR} &
+pserver -nt ${PSERVER_NT} -logdir ${PSERVER_LOGDIR} -ws_addr ${VIZ_ADDRESS} -provdb_addr ${PROVDB_ADDR} -ad ${PSERVER_ALG} &
 sleep 2

 Where the variables are as follows:
@@ -106,12 +106,13 @@ Where the variables are as follows:
 - **PSERVER_LOGDIR** : A directory for logging output
 - **VIZ_ADDRESS** : Address of the visualization module (see above).
 - **PROVDB_ADDR**: The address of the provenance database (see above). This option enables the storing of the final globally-aggregated function profile information into the provenance database.
-
+- **PSERVER_ALG** : Set AD algorithm to use for online analysis: "sstd" or "hbos". Default value is "hbos".
+
 Note that all the above are optional arguments, although if the **VIZ_ADDRESS** is not provided, no information will be sent to the webserver.

 The parameter server opens communications on TCP port 5559. For use below we define the variable **PSERVER_ADDR=${HEAD_NODE_IP}:5559**.
-
------------------------------------
+
+----------------------------------

 The fourth step is to instantiate the AD modules:
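Editorial sketch (not part of the commit): the new **-ad** option documented in the hunk above could be exercised roughly as follows. The helper name ``build_pserver_cmd`` and the placeholder values are assumptions made for illustration; only the flag names and the "sstd"/"hbos" choice come from the documentation. The helper prints the command line rather than launching ``pserver``.

```shell
# Hypothetical helper: assemble (but do not run) the pserver command line.
build_pserver_cmd() {
    alg="${1:-hbos}"   # "hbos" is the documented default
    # Accept only the two documented algorithm names.
    case "$alg" in
        sstd|hbos) ;;
        *) echo "unknown AD algorithm: $alg" >&2; return 1 ;;
    esac
    cmd="pserver -nt ${PSERVER_NT} -logdir ${PSERVER_LOGDIR}"
    # The arguments are optional, so add the addresses only when they are set.
    if [ -n "${VIZ_ADDRESS:-}" ]; then cmd="$cmd -ws_addr ${VIZ_ADDRESS}"; fi
    if [ -n "${PROVDB_ADDR:-}" ]; then cmd="$cmd -provdb_addr ${PROVDB_ADDR}"; fi
    echo "$cmd -ad $alg"
}

# Placeholder values, for illustration only.
PSERVER_NT=2
PSERVER_LOGDIR=./pserver_logs
build_pserver_cmd sstd
```

Calling the helper with no argument falls back to "hbos", matching the documented default; any other name is rejected before a command is produced.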

@@ -126,15 +127,15 @@ Where the variables are as follows:
 - **ADIOS2_ENGINE** : The ADIOS2 communications engine. For online analysis this should be **SST** by default (an alternative, **BP4** is discussed below)
 - **ADIOS2_FILE_DIR** : The directory in which the ADIOS2 file is written (see below)
 - **ADIOS2_FILE_PREFIX** : The ADIOS2 file prefix (see below)
-- **PSERVER_ADDR**: The address of the parameter server from above.
+- **PSERVER_ADDR**: The address of the parameter server from above.
 - **PROVDB_ADDR**: The address of the provenance database from above.
 - **NSHARDS**: The number of provenance database shards

 The **ADIOS2_FILE_DIR** and **ADIOS2_FILE_PREFIX** arguments can be obtained by combining the **${TAU_ADIOS2_FILENAME}** environment variable with the name of the application. For example, for an application "main" and "TAU_ADIOS2_FILENAME=/path/to/tau-metrics", **ADIOS2_FILE_DIR=/path/to** and **ADIOS2_FILE_PREFIX=tau-metrics-main**. Note that if the environment variable is not set, the prefix will default to "tau-metrics" and the output placed in the current directory.

 The **ADIOS2_ENGINE** can be chosen as either **SST** or **BP4**. The former uses RDMA and should be the default choice. However we have observed that in some cases the **BP4** option (available in ADIOS2 2.6+), which writes the traces to disk rather than to memory, can reduce the overhead of running Chimbuko alongside the application. Note however that BP4 mode can interfere with disk I/O-heavy components of the main application and so local burst buffers (e.g. Summit's NVME) should be used if necessary.

-In the above we have assumed that the provenance database is being used. However if this component is not in use, the AD will automatically output the provenance data as JSON documents "${STEP}.anomalies.json", "${STEP}.normalexecs.json" and "${STEP}.metadata.json" placed in the directory "${PROV_DIR}/${PROGRAM_IDX}/${RANK}", where STEP is the i/o step; PROGRAM_IDX is the program index; RANK is the rank of the AD instance; and PROV_DIR is set by default to the working directory but can specified manually using the optional argument -prov_outputpath (cf. below).
+In the above we have assumed that the provenance database is being used. However if this component is not in use, the AD will automatically output the provenance data as JSON documents "${STEP}.anomalies.json", "${STEP}.normalexecs.json" and "${STEP}.metadata.json" placed in the directory "${PROV_DIR}/${PROGRAM_IDX}/${RANK}", where STEP is the i/o step; PROGRAM_IDX is the program index; RANK is the rank of the AD instance; and PROV_DIR is set by default to the working directory but can specified manually using the optional argument -prov_outputpath (cf. below).

 The AD module has a number of additional options that can be used to tune its behavior. The full list can be obtained by running **driver** without any arguments. However a few useful options are described below:
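Editorial sketch (not part of the commit): the rule in the hunk above for deriving **ADIOS2_FILE_DIR** and **ADIOS2_FILE_PREFIX** from **TAU_ADIOS2_FILENAME** and the application name can be written out in shell. The helper name ``adios2_file_args`` is hypothetical; the splitting uses the standard ``dirname`` and ``basename`` utilities, and the unset case falls back to the documented "tau-metrics" prefix in the current directory.

```shell
# Hypothetical helper: print "<ADIOS2_FILE_DIR> <ADIOS2_FILE_PREFIX>" for an application.
adios2_file_args() {
    app_name="$1"
    # Documented fallback: prefix "tau-metrics", output in the current directory.
    tau_file="${TAU_ADIOS2_FILENAME:-tau-metrics}"
    ADIOS2_FILE_DIR="$(dirname "$tau_file")"     # "." when the name has no directory part
    ADIOS2_FILE_PREFIX="$(basename "$tau_file")-${app_name}"
    echo "${ADIOS2_FILE_DIR} ${ADIOS2_FILE_PREFIX}"
}

# The example from the documentation: application "main".
TAU_ADIOS2_FILENAME=/path/to/tau-metrics
adios2_file_args main   # prints "/path/to tau-metrics-main"
```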

@@ -143,15 +144,17 @@ The AD module has a number of additional options that can be used to tune its be
 - **-anom_win_size** : The number of events around an anomalous function execution that are captured as contextual information and placed in the provenance database and displayed in the visualization (default 10)
 - **-program_idx** : For workflows with multiple component programs, a "program index" must be supplied to the AD instances attached to those processes.
 - **-rank** : By default the data rank assigned to an AD instance is taken from its MPI rank in MPI_COMM_WORLD. This rank is used to verify the incoming trace data. This option allows the user to manually set the rank index.
-- **-override_rank** : This option disables the data rank verification and instead overwrites the data rank of the incoming trace data with the data rank stored in the AD instance. The value supplied must be the original data rank (this is used to generate the correct trace filename).
-
+- **-override_rank** : This option disables the data rank verification and instead overwrites the data rank of the incoming trace data with the data rank stored in the AD instance. The value supplied must be the original data rank (this is used to generate the correct trace filename).
+- **-ad_algorithm** : This is an option which sets AD algorithm to use for online analysis: "sstd" or "hbos". Default value is "hbos".
+- **-hbos_threshold** : This is the threshold to control density of detected anomalies used by HBOS algorithm. Its value ranges between 0 and 1. Default value is 0.99
+
 For debug purposes, the AD module can be made more verbose by setting the environment variable **CHIMBUKO_VERBOSE=1**.

 **Note**: For workflows with multiple different component executables, the AD instances must be provided with a program index such that the data is appropriately tagged.

 **Note**: If a program is executed multiple times but without MPI, the 'rank' index of the data must be set manually by the AD. In this case the 'rank' becomes a way of indexing the different instances of the program. This can be achieved setting ***-rank ${DESIRED_RANK} -override_rank 0**, which will set the data rank to **${DESIRED_RANK}**. (The 0 provided to -override rank is because for non-MPI applications the rank assigned by Tau is always 0.)

------------------------------------
+----------------------------------

 The final step is to instantiate the application
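Editorial sketch (not part of the commit): a wrapper script might validate the two options added in the hunk above before appending them to a **driver** invocation. The helper name ``ad_algorithm_args`` is hypothetical. Treating the 0 to 1 range as inclusive, and omitting **-hbos_threshold** when "sstd" is selected, are both assumptions; the documentation only says the threshold is used by the HBOS algorithm.

```shell
# Hypothetical helper: validate and print the algorithm-related driver options.
ad_algorithm_args() {
    alg="${1:-hbos}"   # documented default algorithm
    thr="${2:-0.99}"   # documented default threshold
    case "$alg" in
        sstd|hbos) ;;
        *) echo "-ad_algorithm must be 'sstd' or 'hbos'" >&2; return 1 ;;
    esac
    # awk performs the floating-point range check that plain sh cannot.
    if ! awk -v t="$thr" 'BEGIN { exit !(t ~ /^[0-9]*\.?[0-9]+$/ && t >= 0 && t <= 1) }'; then
        echo "-hbos_threshold must lie between 0 and 1" >&2
        return 1
    fi
    if [ "$alg" = hbos ]; then
        echo "-ad_algorithm $alg -hbos_threshold $thr"
    else
        # Assumption: the threshold is only meaningful for HBOS, so it is omitted here.
        echo "-ad_algorithm $alg"
    fi
}

ad_algorithm_args hbos 0.95
```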

@@ -160,7 +163,7 @@ The final step is to instantiate the application
 mpirun -n ${RANKS} ${APPLICATION} ${APPLICATION_ARGS}

 Aside from interacting with the visualization module, once complete the user can also interact directly with the provenance database using the **provdb_query** tool as described below: :ref:`install_usage/run_chimbuko:Interacting with the Provenance Database`.
-
+
 Offline Analysis
 ~~~~~~~~~~~~~~~~

@@ -180,7 +183,7 @@ On the analysis machine, the provenance database and parameter server should be

 mpirun -n ${RANKS} driver BPFile ${ADIOS2_FILE_DIR} ${ADIOS2_FILE_PREFIX} ${OUTPUT_LOC} -pserver_addr ${PSERVER_ADDR} -provdb_addr ${PROVDB_ADDR} ...

-Note that the first argument of **driver**, which specifies the ADIOS2 engine, has been set to **BPFile**, and the process is not run in the background.
+Note that the first argument of **driver**, which specifies the ADIOS2 engine, has been set to **BPFile**, and the process is not run in the background.

 Examples
 ~~~~~~~~
@@ -203,7 +206,7 @@ For GPU workflows we presently have examples only for Nvidia GPUS:
 For convenience we provide docker images in which these examples can be run alongside the full Chimbuko stack. The CPU examples can be run as:

 .. code:: bash
-
+
 docker pull chimbuko/run_examples:latest
 docker run --rm -it -p 5002:5002 --cap-add=SYS_PTRACE --security-opt seccomp=unconfined chimbuko/run_examples:latest

@@ -212,11 +215,11 @@ And connect to this visualization server at **localhost:5002**.
 For the GPU examples the user must have access to a system with an installation of the NVidia CUDA driver and runtime compatible with CUDA 10.1 as well as a Docker installation configured to support the GPU. Internally we use the `nvidia-docker <https://github.com/NVIDIA/nvidia-docker>`_ tool to start the Docker images. To run,

 .. code:: bash
-
+
 docker pull chimbuko/run_examples:latest-gpu
 nvidia-docker run -p 5002:5002 --cap-add=SYS_PTRACE --security-opt seccomp=unconfined chimbuko/run_examples:latest-gpu

-And connect to this visualization server at **localhost:5002**.
+And connect to this visualization server at **localhost:5002**.

 We also provide DockerFiles and run scripts for two real-world scientific applications described below:

@@ -226,7 +229,7 @@ NWChem
 `NWChem <https://www.nwchem-sw.org/>`_ (Northwest Computational Chemistry Package) is the US DOE's premier massively parallel computational chemistry package, largely written in Fortran. We provide a `Docker image <https://hub.docker.com/r/chimbuko/run_nwchem>`_ demonstrating the coupling of an NWChem molecular dynamics simulation of the ethanol molecule with Chimbuko. To run the image:

 .. code:: bash
-
+
 docker pull chimbuko/run_nwchem:latest
 docker run -p 5002:5002 --cap-add=SYS_PTRACE --security-opt seccomp=unconfined chimbuko/run_nwchem:latest

@@ -240,7 +243,7 @@ The MOCU application is part of the `ExaLearn <https://github.com/exalearn>`_ pr
 To run the image the user must have access to a system with an installation of the NVidia CUDA driver and runtime compatible with CUDA 10.1 as well as a Docker installation configured to support the GPU. Internally we use the `nvidia-docker <https://github.com/NVIDIA/nvidia-docker>`_ tool to start the Docker images. To run:

 .. code:: bash
-
+
 docker pull chimbuko/run_mocu:latest
 nvidia-docker run -p 5002:5002 --cap-add=SYS_PTRACE --security-opt seccomp=unconfined chimbuko/run_mocu:latest
246249
@@ -270,7 +273,7 @@ For the provenance database (provdb_admin) we recommend using the OFI "verbs" tr

 jsrun -n 1 fi_info

-within an interactive session, and searching for one that supports verbs. However the following setup has been verified:
+within an interactive session, and searching for one that supports verbs. However the following setup has been verified:

 .. code:: bash

@@ -303,7 +306,7 @@ Where the variables are as follows:

 - **COLLECTION** : One of the three collections in the database, **anomalies**, **normalexecs**, **metadata** (cf :ref:`introduction/provdb:Provenance Database`).
 - **QUERY**: The query, format described below.
-
+
 The **QUERY** argument should be a jx9 function returning a bool and enclosed in quotation marks. It should be of the format

 .. code:: bash
@@ -312,8 +315,8 @@ The **QUERY** argument should be a jx9 function returning a bool and enclosed in


 Alternatively the query can be set to "DUMP", which will output all entries.
-
-The function is applied sequentially to each element of the collection. Inside the function the entry is described by the variable **$entry**. Note that the backslash-dollar (\\$) is necessary to prevent the shell from trying to expand the variable. Fields of **$entry** can be queried using the square-bracket notation with the field name inside. In the sketch above the field "some_field" is compared to a value **${SOME_VALUE}** (here representing a numerical value or a value expanded by the shell, *not* a jx9 variable!).
+
+The function is applied sequentially to each element of the collection. Inside the function the entry is described by the variable **$entry**. Note that the backslash-dollar (\\$) is necessary to prevent the shell from trying to expand the variable. Fields of **$entry** can be queried using the square-bracket notation with the field name inside. In the sketch above the field "some_field" is compared to a value **${SOME_VALUE}** (here representing a numerical value or a value expanded by the shell, *not* a jx9 variable!).

 Some examples:

@@ -342,9 +345,9 @@ Where the variables are as follows:

 - **COLLECTION** : One of the two collections in the database, **func_stats**, **counter_stats**.
 - **QUERY**: The query, format described below.
-
+
 The formatting of the **QUERY** argument is described above.
-
+
 Execute mode
 ------------

@@ -359,6 +362,6 @@ Where the variables are as follows:
 - **CODE** : The jx9 script
 - **VARIABLES** : a comma-separated list (without spaces) of the variables assigned by the script

-The **CODE** argument is a complete jx9 script. As above, backslashes ('\') must be placed before internal '$' and '"' characters to prevent shell expansion.
+The **CODE** argument is a complete jx9 script. As above, backslashes ('\') must be placed before internal '$' and '"' characters to prevent shell expansion.

 If the option **-from_file** is specified the **${CODE}** variable above will be treated as a filename from which to obtain the script. Note that in this case the backslashes before the special characters are not necessary.
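
Editorial sketch (not part of the commit): the escaping rule described in the hunks above is ordinary shell quoting and can be checked without running **provdb_query**. The strings below are fragments chosen only to show what the shell does to '$' and '"' inside double quotes; they are not complete jx9 queries, and ``SOME_VALUE=5`` is a placeholder.

```shell
SOME_VALUE=5   # placeholder; this one is meant to be expanded by the shell
unset entry    # make sure no shell variable named "entry" exists

# Escaped: \$ keeps the dollar sign literal, so jx9 would see $entry.
escaped="\$entry['some_field'] ${SOME_VALUE}"

# Unescaped: the shell expands $entry itself (to nothing here) before jx9 sees it.
unescaped="$entry['some_field'] ${SOME_VALUE}"

# Escaped double quotes survive inside a double-quoted argument.
quoted="\"text\""

printf '%s\n' "$escaped"     # prints: $entry['some_field'] 5
printf '%s\n' "$unescaped"   # prints: ['some_field'] 5
printf '%s\n' "$quoted"      # prints: "text"
```

In every case **${SOME_VALUE}** is still expanded, which is the behavior the documentation relies on for shell-supplied values.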
