Commit 77ce3dd

author
sandeepmittal
committed
Updated docs with command description and appendix
1 parent e1d3012 commit 77ce3dd

2 files changed

Lines changed: 54 additions & 23 deletions

sphinx/source/appendix/appendix_usage.rst

Lines changed: 28 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -2,9 +2,17 @@
22
Usage
33
*********
44

5+
Additional ProvDB Variables
6+
~~~~~~~~~~~~~~~~~~~~~~~~~~~
7+
8+
- **-db_write_dir** : Specifies the path on disk to which the provenance database is written.
9+
- **-engine** : This is the OFI libfabric provider used for the Mochi stack. Its value can be set to "ofi+tcp;ofi_rxm".
10+
511
Visualization Variables
612
~~~~~~~~~~~~~~~~~~~~~~~
713

14+
- **${provdb_writedir}** : The directory in which the provenance database is stored.
15+
- **${provdb_nshards}** : The number of shards used between the provenance database and the visualization module.
816
- **${VIZ_PORT}** : The port to assign to the visualization module
917
- **${VIZ_DATA_DIR}**: A directory for storing logs and temporary data (assumed to exist)
1018
- **${VIZ_INSTALL_DIR}**: The directory where the visualization module is installed
@@ -14,22 +22,25 @@ Parameter Server Variables
1422

1523
- **PSERVER_NT** : The number of threads used to handle incoming communications from the AD modules
1624
- **PSERVER_LOGDIR** : A directory for logging output
17-
- **VIZ_ADDRESS** : Address of the visualization module (see above).
18-
- **PROVDB_ADDR**: The address of the provenance database (see above). This option enables the storing of the final globally-aggregated function profile information into the provenance database.
1925
- **PSERVER_ALG** : Set AD algorithm to use for online analysis: "sstd" or "hbos". Default value is "hbos".
2026

2127
Note that all the above are optional arguments, although if the **VIZ_ADDRESS** is not provided, no information will be sent to the webserver.
2228

29+
Additional pserver Variables
30+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
31+
32+
- **-ws_addr** : Address of the visualization module.
33+
- **-provdb_addr** : The address of the provenance database (see above). This option enables the storing of the final globally-aggregated function profile information into the provenance database.
34+
- **-prov_outputpath** : This is the path to the provenance database on disk.
35+
2336
AD Variables
2437
~~~~~~~~~~~~
2538

26-
- **RANKS** : The number of MPI ranks that the application will be run on
27-
- **ADIOS2_ENGINE** : The ADIOS2 communications engine. For online analysis this should be **SST** by default (an alternative, **BP4** is discussed below)
28-
- **ADIOS2_FILE_DIR** : The directory in which the ADIOS2 file is written (see below)
29-
- **ADIOS2_FILE_PREFIX** : The ADIOS2 file prefix (see below)
30-
- **PSERVER_ADDR**: The address of the parameter server from above.
31-
- **PROVDB_ADDR**: The address of the provenance database from above.
32-
- **NSHARDS**: The number of provenance database shards
39+
- **${ADIOS2_ENGINE}** : The ADIOS2 communications engine. For online analysis this should be **SST** by default (an alternative, **BP4** is discussed below)
40+
- **${ADIOS2_PATH}** : The directory in which the ADIOS2 file is written (see below)
41+
- **${ADIOS2_FILE_PREFIX}** : The ADIOS2 file prefix.
42+
- **${EXE_NAME}** : The name of the application executable (see examples).
43+
- **${ad_opts}** : The collection of all other `arguments <./appendix_usage.html#additional-ad-variables>`_ required by the AD module for its instantiation.
3344

3445
Additional AD Variables
3546
~~~~~~~~~~~~~~~~~~~~~~~
@@ -40,5 +51,11 @@ Additional AD Variables
4051
- **-program_idx** : For workflows with multiple component programs, a "program index" must be supplied to the AD instances attached to those processes.
4152
- **-rank** : By default the data rank assigned to an AD instance is taken from its MPI rank in MPI_COMM_WORLD. This rank is used to verify the incoming trace data. This option allows the user to manually set the rank index.
4253
- **-override_rank** : This option disables the data rank verification and instead overwrites the data rank of the incoming trace data with the data rank stored in the AD instance. The value supplied must be the original data rank (this is used to generate the correct trace filename).
43-
- **-ad_algorithm** : This is an option which sets AD algorithm to use for online analysis: "sstd" or "hbos". Default value is "hbos".
44-
- **-hbos_threshold** : This is the threshold to control density of detected anomalies used by HBOS algorithm. Its value ranges between 0 and 1. Default value is 0.99
54+
- **-ad_algorithm** : This sets the AD algorithm to use for online analysis: "sstd" or "hbos". Default value is "hbos".
55+
- **-hbos_threshold** : This sets the threshold used by the HBOS algorithm to control the density of detected anomalies. Its value ranges between 0 and 1. Default value is 0.99.
56+
57+
Application Variables
58+
~~~~~~~~~~~~~~~~~~~~~
59+
60+
- **${APPLICATION}** : Application executable.
61+
- **${APPLICATION_ARGS}** : List of arguments specific to the application.

sphinx/source/install_usage/run_chimbuko.rst

Lines changed: 26 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -37,28 +37,30 @@ The user should also choose a port for the provenance database, to which we assi
3737

3838
.. code:: bash
3939
40-
provdb_admin ${HEAD_NODE_IP}:${PROVDB_PORT} -nshards ${NSHARDS} -nthreads ${NTHREADS} &
40+
provdb_admin ${HEAD_NODE_IP}:${PROVDB_PORT} -nshards ${NSHARDS} -nthreads ${NTHREADS} ${provdb_extra_args} &
4141
sleep 2
4242
4343
For performance reasons the provenance database is **sharded** and the writing is threaded, allowing parallel writes to different shards. By default there is only a single shard and thread, but for larger jobs the user should specify more shards and threads using the **-nshards ${NSHARDS}** and **-nthreads ${NTHREADS}** options respectively. The number of shards must also be communicated to the AD module below. The optimal number of shards and threads will depend on the system characteristics; however, the number of threads should be at least as large as the number of shards. We recommend that the user run our provided benchmark application and increase the number of each to minimize the round-trip latency.
4444

45+
**${provdb_extra_args}** is a variable that is used to provide any additional arguments to provdb_admin. A list of such additional variables can be found `here <../appendix/appendix_usage.html#additional-provdb-variables>`_.
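As a minimal sketch, the extra arguments could be collected as shown here. The flag names come from the appendix entries added in this commit; the write directory is a hypothetical value chosen for illustration only.

```bash
# Hypothetical directory for the on-disk database; adjust for your system.
provdb_writedir="/tmp/chimbuko/provdb"

# -db_write_dir and -engine are the flags listed in the appendix.
# The value must be quoted here, otherwise the shell would treat the
# semicolon in "ofi+tcp;ofi_rxm" as a command separator.
provdb_extra_args="-db_write_dir ${provdb_writedir} -engine ofi+tcp;ofi_rxm"

# When the variable is later expanded unquoted on the provdb_admin command
# line, it is split on whitespace only, so the engine string stays intact.
printf '%s\n' ${provdb_extra_args}
```

The `printf` line simply lists the individual arguments that **provdb_admin** would receive.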
46+
4547
The database will be written to disk into the directory from which the **provdb_admin** application is called, under the filename **provdb.${SHARD}.unqlite** where **${SHARD}** is the index of the database shard.
4648

4749
For use below we define the variable **PROVDB_ADDR=tcp://${HEAD_NODE_IP}:${PROVDB_PORT}**. For convenience, the **provdb_admin** application will write out a file **provider.address**, the contents of which can be used in place of manually defining this variable.
4850

4951
----------------------------------
5052

51-
The second step is to instantiate the visualization module:
53+
The second step is to instantiate the visualization module.
5254

5355
.. code:: bash
5456
5557
export DATABASE_URL="sqlite:///${VIZ_DATA_DIR}/main.sqlite"
5658
export ANOMALY_STATS_URL="sqlite:///${VIZ_DATA_DIR}/anomaly_stats.sqlite"
5759
export ANOMALY_DATA_URL="sqlite:///${VIZ_DATA_DIR}/anomaly_data.sqlite"
5860
export FUNC_STATS_URL="sqlite:///${VIZ_DATA_DIR}/func_stats.sqlite"
59-
export PROVENANCE_DB="${PWD}/"
61+
export PROVENANCE_DB="${provdb_writedir}"
6062
export PROVDB_ADDR=$(cat provider.address)
61-
export SHARDED_NUM=1
63+
export SHARDED_NUM=${provdb_nshards}
6264
export C_FORCE_ROOT=1 #REQUIRED FOR DOCKER IMAGES ONLY
6365
6466
cd ${VIZ_INSTALL_DIR}
@@ -89,25 +91,36 @@ For details on the installation and usage of the visualization module, please re
8991

9092
----------------------------------
9193

92-
The third step is to start the parameter server:
94+
The third step is to start the parameter server. This can be achieved by running the following **pserver** command.
9395

9496
.. code:: bash
9597
96-
pserver -nt ${PSERVER_NT} -logdir ${PSERVER_LOGDIR} -ws_addr ${VIZ_ADDRESS} -provdb_addr ${PROVDB_ADDR} -ad ${PSERVER_ALG} &
98+
pserver -nt ${PSERVER_NT} -logdir ${PSERVER_LOGDIR} -ad ${PSERVER_ALG} ${ps_extra_args} &
9799
sleep 2
98100
99-
Description of the variables can be found `here <../appendix/appendix_usage.html#parameter-server-variables>`_.
101+
A description of the variables can be found `here <../appendix/appendix_usage.html#parameter-server-variables>`_. **${ps_extra_args}** can be used to pass additional arguments to the pserver command, as described `here <../appendix/appendix_usage.html#additional-pserver-variables>`_.
100102

101103
The parameter server opens communications on TCP port 5559. For use below we define the variable **PSERVER_ADDR=${HEAD_NODE_IP}:5559**.
102104

103105
----------------------------------
104106

105-
The fourth step is to instantiate the AD modules:
107+
The provenance database, visualization module and parameter server are launched using the following **jsrun** command:
106108

107109
.. code:: bash
108110
109-
mpirun -n ${RANKS} driver ${ADIOS2_ENGINE} ${ADIOS2_FILE_DIR} ${ADIOS2_FILE_PREFIX} -pserver_addr ${PSERVER_ADDR} -provdb_addr ${PROVDB_ADDR} -nprovdb_shards ${NSHARDS} &
110-
sleep 2
111+
#Run the services
112+
jsrun ${SERVICES}
113+
114+
**${SERVICES}** is the path to a script containing the commands from the first, second and third steps described above, in that order. Running it should launch the provenance database, the visualization module, and the parameter server.
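One possible layout for such a script is sketched here; it is not the project's own script. The file name `services.sh`, the trailing `wait`, and the use of **${ps_extra_args}** on the pserver line are assumptions. The commands that actually start the visualization module are left out because this section does not show them in full. The script is written through a here-document so the snippet can be pasted into a shell without starting anything.

```bash
# Hypothetical file name; point ${SERVICES} at wherever the script should live.
SERVICES="${SERVICES:-./services.sh}"

# The quoted 'EOF' keeps the variables unexpanded until the script itself runs.
cat > "${SERVICES}" <<'EOF'
#!/bin/bash
# First step: provenance database.
provdb_admin ${HEAD_NODE_IP}:${PROVDB_PORT} -nshards ${NSHARDS} -nthreads ${NTHREADS} ${provdb_extra_args} &
sleep 2

# Second step: environment for the visualization module.
export DATABASE_URL="sqlite:///${VIZ_DATA_DIR}/main.sqlite"
export ANOMALY_STATS_URL="sqlite:///${VIZ_DATA_DIR}/anomaly_stats.sqlite"
export ANOMALY_DATA_URL="sqlite:///${VIZ_DATA_DIR}/anomaly_data.sqlite"
export FUNC_STATS_URL="sqlite:///${VIZ_DATA_DIR}/func_stats.sqlite"
export PROVENANCE_DB="${provdb_writedir}"
export PROVDB_ADDR=$(cat provider.address)
export SHARDED_NUM=${provdb_nshards}
# The commands that launch the visualization module itself belong here;
# they are omitted from this sketch.

# Third step: parameter server.
pserver -nt ${PSERVER_NT} -logdir ${PSERVER_LOGDIR} -ad ${PSERVER_ALG} ${ps_extra_args} &
sleep 2

# Assumption: keep the script alive while the background services run.
wait
EOF

chmod +x "${SERVICES}"
```

All variables used inside the generated script are expected to be exported before **jsrun ${SERVICES}** is called.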
115+
116+
----------------------------------
117+
118+
Next, the AD module can be instantiated using the **jsrun** command as follows:
119+
120+
.. code:: bash
121+
122+
jsrun -e prepended driver ${ADIOS2_ENGINE} ${ADIOS2_PATH} ${ADIOS2_FILE_PREFIX}-${EXE_NAME} ${ad_opts}
123+
sleep 2
111124
112125
A description of the variables can be found `here <../appendix/appendix_usage.html#ad-variables>`_.
113126

@@ -117,7 +130,7 @@ The **ADIOS2_ENGINE** can be chosen as either **SST** or **BP4**. The former use
117130

118131
In the above we have assumed that the provenance database is being used. However, if this component is not in use, the AD will automatically output the provenance data as JSON documents "${STEP}.anomalies.json", "${STEP}.normalexecs.json" and "${STEP}.metadata.json" placed in the directory "${PROV_DIR}/${PROGRAM_IDX}/${RANK}", where STEP is the i/o step; PROGRAM_IDX is the program index; RANK is the rank of the AD instance; and PROV_DIR is set by default to the working directory but can be specified manually using the optional argument -prov_outputpath (cf. below).
119132

120-
The AD module has a number of additional options that can be used to tune its behavior. The full list can be obtained by running **driver** without any arguments. However a few useful options are described `here <../appendix/appendix_usage.html#additional-ad-variables>`_.
133+
The AD module has a number of additional options that can be used to tune its behavior. The full list can be obtained by running **driver** without any arguments. However, a few useful options are described `here <../appendix/appendix_usage.html#additional-ad-variables>`_. These are passed as part of **${ad_opts}** in the command above.
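As an illustration, **${ad_opts}** might be assembled as follows. The address, port and shard count are hypothetical values. The `-pserver_addr`, `-provdb_addr` and `-nprovdb_shards` flags are taken from the earlier **mpirun** form of the driver command, on the assumption that they remain valid; `-ad_algorithm` and `-hbos_threshold` are shown with their documented defaults.

```bash
# Hypothetical values for illustration only.
HEAD_NODE_IP="192.0.2.1"
PROVDB_PORT=5000
NSHARDS=4

# Addresses defined as in the earlier steps of this guide.
PSERVER_ADDR="${HEAD_NODE_IP}:5559"
PROVDB_ADDR="tcp://${HEAD_NODE_IP}:${PROVDB_PORT}"

# Collect the remaining driver arguments into a single variable.
ad_opts="-pserver_addr ${PSERVER_ADDR} -provdb_addr ${PROVDB_ADDR} -nprovdb_shards ${NSHARDS}"
ad_opts="${ad_opts} -ad_algorithm hbos -hbos_threshold 0.99"

echo "${ad_opts}"
```

In practice **PROVDB_ADDR** can instead be read from the **provider.address** file written by **provdb_admin**.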
121134

122135
For debug purposes, the AD module can be made more verbose by setting the environment variable **CHIMBUKO_VERBOSE=1**.
123136

@@ -131,9 +144,10 @@ The final step is to instantiate the application
131144

132145
.. code:: bash
133146
134-
mpirun -n ${RANKS} ${APPLICATION} ${APPLICATION_ARGS}
147+
jsrun -e prepended ${APPLICATION} ${APPLICATION_ARGS}
135148
136149
Aside from interacting with the visualization module, once complete the user can also interact directly with the provenance database using the **provdb_query** tool as described below: :ref:`install_usage/run_chimbuko:Interacting with the Provenance Database`.
150+
A description of the application variables is provided `here <../appendix/appendix_usage.html#application-variables>`_.
137151

138152
Offline Analysis
139153
~~~~~~~~~~~~~~~~
