Commit ea53d6e

Merge branch 'sm_release' into ckelly_develop
2 parents 0cc046b + d0c41d8 commit ea53d6e

2 files changed

Lines changed: 209 additions & 153 deletions

sphinx/source/appendix/appendix_usage.rst

Lines changed: 172 additions & 14 deletions
@@ -2,34 +2,166 @@
22
Usage
33
*********
44

5+
Chimbuko Config
6+
~~~~~~~~~~~~~~~
7+
8+
Options for visualization module:
9+
10+
- **viz_root** : Path to the visualization module.
11+
- **viz_worker_port** : The port on which to run the redis server for the visualization backend.
12+
- **viz_port** : The port on which to run the webserver.
13+
14+
General options for Chimbuko backend:
15+
16+
- **backend_root** : The root install directory of the PerformanceAnalysis libraries. If set to "infer", it is inferred from the path of the executables.
17+
18+
Options for the provenance database:
19+
20+
- **provdb_nshards** : Number of database shards
21+
- **provdb_engine** : The OFI libfabric provider used for the Mochi stack.
22+
- **provdb_port** : The port of the provenance database
23+
- **provdb_nthreads** : Number of worker threads; should be >= the number of shards
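
The constraint that the worker thread count should be at least the shard count can be checked before launching the database. The following is a minimal sketch, not part of the Chimbuko scripts; the values are illustrative assumptions, and raising the thread count automatically is one possible policy among several.

.. code:: bash

   # Illustrative values only; these are not Chimbuko defaults
   provdb_nshards=4
   provdb_nthreads=2

   # Raise the thread count to the shard count if it is too small
   if [ "${provdb_nthreads}" -lt "${provdb_nshards}" ]; then
       echo "provdb_nthreads (${provdb_nthreads}) < provdb_nshards (${provdb_nshards}); raising it" >&2
       provdb_nthreads=${provdb_nshards}
   fi
   echo "nshards=${provdb_nshards} nthreads=${provdb_nthreads}"
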
24+
25+
Options for the parameter server:
26+
27+
- **ad_win_size** : Number of events around an anomaly to store; provDB entry size is proportional to this
28+
- **ad_alg** : AD algorithm to use. "sstd" or "hbos"
29+
- **ad_outlier_sstd_sigma** : The number of standard deviations from the mean beyond which an event is considered an outlier by the SSTD algorithm.
30+
- **ad_outlier_hbos_threshold** : The percentile threshold used by the HBOS algorithm; events falling outside this percentile are considered anomalies.
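
Only one of the two outlier settings applies for a given **ad_alg**. The sketch below, which is not taken from the Chimbuko scripts, shows one way a config fragment might report the setting in effect; the numeric values and the helper variable ``active_setting`` are illustrative assumptions.

.. code:: bash

   # Illustrative values only; these are not documented defaults
   ad_alg="hbos"
   ad_outlier_sstd_sigma=6
   ad_outlier_hbos_threshold=0.99

   # Report which outlier setting the selected algorithm uses
   case "${ad_alg}" in
       sstd) active_setting="ad_outlier_sstd_sigma=${ad_outlier_sstd_sigma}" ;;
       hbos) active_setting="ad_outlier_hbos_threshold=${ad_outlier_hbos_threshold}" ;;
       *)    echo "Unknown ad_alg '${ad_alg}' (expected sstd or hbos)" >&2
             active_setting="" ;;
   esac
   echo "ad_alg=${ad_alg} uses ${active_setting}"
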
31+
32+
Options for TAU:
33+
34+
- **export TAU_ADIOS2_ENGINE=${value}** : Online communication engine. SST is recommended; BP4 is an alternative, but it goes through the disk system and may be slower unless the BP files are stored on a burst disk.
35+
- **export TAU_ADIOS2_ONE_FILE=FALSE** : Use a different connection file for each rank
36+
- **export TAU_ADIOS2_PERIODIC=1** : enable/disable ADIOS2 periodic output
37+
- **export TAU_ADIOS2_PERIOD=1000000** : period in microseconds between ADIOS2 I/O steps
38+
- **export TAU_THREAD_PER_GPU_STREAM=1** : force GPU streams to appear as different TAU virtual threads
39+
- **export TAU_THROTTLE=1** : enable/disable throttling of short-running functions
40+
- **TAU_ADIOS2_PATH** : Path where the ADIOS2 files are to be stored. The Chimbuko services create the directory chimbuko/adios2 in the working directory, and this should be used by default.
41+
- **TAU_ADIOS2_FILE_PREFIX** : The prefix of the TAU ADIOS2 files; the full filename is ${TAU_ADIOS2_FILE_PREFIX}-${EXE_NAME}-${RANK}.bp
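
As a sketch of how these settings combine, the following exports the TAU variables listed above and prints the file that rank 0 would be expected to write. It assumes that the prefix in the filename pattern is the value of **TAU_ADIOS2_FILE_PREFIX**; the prefix ``tau-metrics``, the executable name ``my_app`` and the rank are hypothetical values chosen for illustration.

.. code:: bash

   export TAU_ADIOS2_ENGINE=SST
   export TAU_ADIOS2_ONE_FILE=FALSE
   export TAU_ADIOS2_PERIODIC=1
   export TAU_ADIOS2_PERIOD=1000000
   # Directory created by the Chimbuko services in the working directory
   export TAU_ADIOS2_PATH="$(pwd)/chimbuko/adios2"
   export TAU_ADIOS2_FILE_PREFIX="tau-metrics"   # hypothetical prefix

   EXE_NAME="my_app"   # hypothetical executable name
   RANK=0

   # Expected file for this rank, following the pattern described above
   tau_file="${TAU_ADIOS2_PATH}/${TAU_ADIOS2_FILE_PREFIX}-${EXE_NAME}-${RANK}.bp"
   echo "${tau_file}"
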
42+
43+
Launch Services
44+
~~~~~~~~~~~~~~~
45+
46+
The Chimbuko head node services are launched by a script that proceeds as follows.
47+
First, the script sources the Chimbuko config script, whose variables are defined in the `previous section <./appendix_usage.html#chimbuko-config>`_.
48+
49+
Next, it instantiates the provenance database using the following command:
50+
51+
.. code:: bash
52+
53+
provdb_admin "${provdb_addr}" -engine ${provdb_engine} -nshards ${provdb_nshards} -nthreads ${provdb_nthreads} -db_write_dir ${provdb_writedir}
54+
55+
where **${provdb_addr}** is the address of the provenance database and the other variables are defined `here <../appendix/appendix_usage.html#additional-provdb-variables>`_.
56+
57+
Next, the following commands instantiate the visualization module:
58+
59+
.. code:: bash
60+
61+
export SHARDED_NUM=${provdb_nshards}
62+
export PROVDB_ADDR=${prov_add}
63+
64+
export SERVER_CONFIG="production"
65+
export DATABASE_URL="sqlite:///${viz_dir}/main.sqlite"
66+
export ANOMALY_STATS_URL="sqlite:///${viz_dir}/anomaly_stats.sqlite"
67+
export ANOMALY_DATA_URL="sqlite:///${viz_dir}/anomaly_data.sqlite"
68+
export FUNC_STATS_URL="sqlite:///${viz_dir}/func_stats.sqlite"
69+
export PROVENANCE_DB=${provdb_writedir}
70+
export CELERY_BROKER_URL="redis://${HOST}:${viz_worker_port}"
71+
72+
#Setup redis
73+
cp -r $viz_root/redis-stable/redis.conf .
74+
sed -i "s|^dir ./|dir ${viz_dir}/|" redis.conf
75+
sed -i "s|^bind 127.0.0.1|bind 0.0.0.0|" redis.conf
76+
sed -i "s|^daemonize no|daemonize yes|" redis.conf
77+
sed -i "s|^pidfile /var/run/redis_6379.pid|pidfile ${viz_dir}/redis.pid|" redis.conf
78+
sed -i "s|^logfile "\"\""|logfile ${log_dir}/redis.log|" redis.conf
79+
sed -i "s|.*syslog-enabled no|syslog-enabled yes|" redis.conf
80+
81+
echo "==========================================="
82+
echo "Chimbuko Services: Launch Chimbuko visualization server"
83+
echo "==========================================="
84+
cd ${viz_root}
85+
86+
echo "Chimbuko Services: create db ..."
87+
python3 manager.py createdb
88+
89+
echo "Chimbuko Services: run redis ..."
90+
redis-server ${viz_dir}/redis.conf
91+
sleep 5
92+
93+
echo "Chimbuko Services: run celery ..."
94+
CELERY_ARGS="--loglevel=info --concurrency=1"
95+
python3 manager.py celery ${CELERY_ARGS} 2>&1 | tee "${log_dir}/celery.log" &
96+
sleep 10
97+
98+
echo "Chimbuko Services: run webserver ..."
99+
python3 run_server.py $HOST $viz_port 2>&1 | tee "${log_dir}/webserver.log" &
100+
sleep 2
101+
102+
echo "Chimbuko Services: redis ping-pong ..."
103+
redis-cli -h $HOST -p ${viz_worker_port} ping
104+
105+
cd ${base}
106+
107+
ws_addr="http://${HOST}:${viz_port}/api/anomalydata"
108+
ps_extra_args+=" -ws_addr ${ws_addr}"
109+
110+
echo $HOST > ${var_dir}/chimbuko_webserver.host
111+
echo $viz_port > ${var_dir}/chimbuko_webserver.port
112+
113+
114+
After the visualization module (its variables are described `here <./appendix_usage.html#visualization-variables>`_) is successfully instantiated, the parameter server is launched as part of the Chimbuko services:
115+
116+
.. code:: bash
117+
118+
pserver -ad ${pserver_alg} -nt ${pserver_nt} -logdir ${log_dir} -port ${pserver_port} ${ps_extra_args}
119+
120+
The command line variables used as input for the **pserver** command are described `here <../appendix/appendix_usage.html#parameter-server-variables>`_.
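
To see how **${ps_extra_args}** feeds into the launch command, the following dry run assembles the **pserver** command line and prints it instead of executing it. This is a sketch under stated assumptions: the host, both port numbers, the thread count and the log directory are hypothetical values, and the helper variable ``pserver_cmd`` is not part of the Chimbuko scripts.

.. code:: bash

   # Hypothetical values for illustration
   pserver_alg="hbos"
   pserver_nt=2
   pserver_port=5559
   log_dir="./logs"
   HOST="localhost"
   viz_port=5002

   # Append the webserver address, as the services script does after
   # launching the visualization module
   ps_extra_args=""
   ws_addr="http://${HOST}:${viz_port}/api/anomalydata"
   ps_extra_args="${ps_extra_args} -ws_addr ${ws_addr}"

   # Dry run: build and print the command rather than launching pserver
   pserver_cmd="pserver -ad ${pserver_alg} -nt ${pserver_nt} -logdir ${log_dir} -port ${pserver_port}${ps_extra_args}"
   echo "${pserver_cmd}"
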
121+
122+
Additional ProvDB Variables
123+
~~~~~~~~~~~~~~~~~~~~~~~~~~~
124+
125+
- **-nthreads** : Number of threads used by provenance database
126+
- **-nshards** : Number of shards used by provenance database
127+
- **-db_write_dir** : The directory in which the provenance database is written to disk.
128+
- **-engine** : This is the OFI libfabric provider used for the Mochi stack. Its value can be set to "ofi+tcp;ofi_rxm".
129+
5130
Visualization Variables
6131
~~~~~~~~~~~~~~~~~~~~~~~
7132

133+
- **${provdb_writedir}** : The directory in which the provenance database is stored
134+
- **${provdb_nshards}** : Number of shards used between provenance database and visualization module.
8135
- **${VIZ_PORT}** : The port to assign to the visualization module
9136
- **${VIZ_DATA_DIR}**: A directory for storing logs and temporary data (assumed to exist)
10137
- **${VIZ_INSTALL_DIR}**: The directory where the visualization module is installed
11138

12139
Parameter Server Variables
13140
~~~~~~~~~~~~~~~~~~~~~~~~~~
14141

15-
- **PSERVER_NT** : The number of threads used to handle incoming communications from the AD modules
16-
- **PSERVER_LOGDIR** : A directory for logging output
17-
- **VIZ_ADDRESS** : Address of the visualization module (see above).
18-
- **PROVDB_ADDR**: The address of the provenance database (see above). This option enables the storing of the final globally-aggregated function profile information into the provenance database.
19-
- **PSERVER_ALG** : Set AD algorithm to use for online analysis: "sstd" or "hbos". Default value is "hbos".
142+
- **-port ${pserver_port}** : The port used by the parameter server
143+
- **-nt ${pserver_nt}** : The number of threads used to handle incoming communications from the AD modules
144+
- **-logdir ${log_dir}** : A directory for logging output
145+
- **-ad ${pserver_alg}** : Set AD algorithm to use for online analysis: "sstd" or "hbos". Default value is "hbos".
146+
- **${ps_extra_args}** : Extra arguments passed to the parameter server.
20147

21148
Note that all of the above arguments are optional, although if the address of the visualization module (**-ws_addr**) is not provided, no information will be sent to the webserver.
22149

150+
Additional pserver Variables
151+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
152+
153+
- **-ws_addr** : Address of the visualization module.
154+
- **-provdb_addr** : The address of the provenance database (see above). This option enables the storing of the final globally-aggregated function profile information into the provenance database.
155+
- **-prov_outputpath** : The path to the provenance database on disk.
156+
23157
AD Variables
24158
~~~~~~~~~~~~
25159

26-
- **RANKS** : The number of MPI ranks that the application will be run on
27-
- **ADIOS2_ENGINE** : The ADIOS2 communications engine. For online analysis this should be **SST** by default (an alternative, **BP4** is discussed below)
28-
- **ADIOS2_FILE_DIR** : The directory in which the ADIOS2 file is written (see below)
29-
- **ADIOS2_FILE_PREFIX** : The ADIOS2 file prefix (see below)
30-
- **PSERVER_ADDR**: The address of the parameter server from above.
31-
- **PROVDB_ADDR**: The address of the provenance database from above.
32-
- **NSHARDS**: The number of provenance database shards
160+
- **${ADIOS2_ENGINE}** : The ADIOS2 communications engine. For online analysis this should be **SST** by default (an alternative, **BP4** is discussed below)
161+
- **${ADIOS2_PATH}** : The directory in which the ADIOS2 file is written (see below)
162+
- **${ADIOS2_FILE_PREFIX}** : The ADIOS2 file prefix.
163+
- **${EXE_NAME}** : Name of the application executable (see examples).
164+
- **${ad_opts}** : A collection of all other `arguments <./appendix_usage.html#additional-ad-variables>`_ required by the AD module for its instantiation.
33165

34166
Additional AD Variables
35167
~~~~~~~~~~~~~~~~~~~~~~~
@@ -40,5 +172,31 @@ Additional AD Variables
40172
- **-program_idx** : For workflows with multiple component programs, a "program index" must be supplied to the AD instances attached to those processes.
41173
- **-rank** : By default the data rank assigned to an AD instance is taken from its MPI rank in MPI_COMM_WORLD. This rank is used to verify the incoming trace data. This option allows the user to manually set the rank index.
42174
- **-override_rank** : This option disables the data rank verification and instead overwrites the data rank of the incoming trace data with the data rank stored in the AD instance. The value supplied must be the original data rank (this is used to generate the correct trace filename).
43-
- **-ad_algorithm** : This is an option which sets AD algorithm to use for online analysis: "sstd" or "hbos". Default value is "hbos".
44-
- **-hbos_threshold** : This is the threshold to control density of detected anomalies used by HBOS algorithm. Its value ranges between 0 and 1. Default value is 0.99
175+
- **-ad_algorithm** : This sets the AD algorithm to use for online analysis: "sstd" or "hbos". Default value is "hbos".
176+
- **-hbos_threshold** : This sets the threshold that controls the density of anomalies detected by the HBOS algorithm. Its value ranges between 0 and 1. Default value is 0.99
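
Because **-hbos_threshold** must lie between 0 and 1, it can be worth checking the value before adding it to the AD options. The following is a minimal sketch, not part of Chimbuko; it assumes the range is inclusive, uses ``awk`` for the floating-point comparison, and falls back to the algorithm's default when the value is out of range.

.. code:: bash

   hbos_threshold=0.99

   # awk exits with status 0 only if the value lies within [0, 1]
   if awk -v t="${hbos_threshold}" 'BEGIN { exit !(t >= 0 && t <= 1) }'; then
       ad_opts="-ad_algorithm hbos -hbos_threshold ${hbos_threshold}"
   else
       echo "hbos_threshold=${hbos_threshold} is outside [0, 1]; using the default" >&2
       ad_opts="-ad_algorithm hbos"
   fi
   echo "${ad_opts}"
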
177+
178+
179+
Offline Analysis
180+
~~~~~~~~~~~~~~~~
181+
182+
For an offline analysis the user runs the application on its own, with TAU's ADIOS2 plugin configured to use the **BPFile** engine (**TAU_ADIOS2_ENGINE=BPFile** environment option; `see previous section <./appendix_usage.html#chimbuko-config>`_). Once complete, TAU will generate a file with a **.bp** extension and a filename chosen according to the user-specified **TAU_ADIOS2_FILENAME** environment option. The user can then copy this file to a location accessible to the Chimbuko application, for example on a local machine.
183+
184+
The first step is to run the application:
185+
186+
.. code:: bash
187+
188+
mpirun -n ${RANKS} ${APPLICATION} ${APPLICATION_ARGS}
189+
190+
Once complete, the user should locate the **.bp** file and copy it to a location accessible to Chimbuko.
191+
192+
- **${RANKS}** : Number of MPI ranks.
193+
- **${APPLICATION}** : Path to the application executable.
194+
- **${APPLICATION_ARGS}** : Input arguments required by the application.
195+
196+
On the analysis machine, the provenance database and parameter server should be instantiated as in the previous section. The AD modules must still be spawned under MPI with one AD instance per rank of the original job:
197+
198+
.. code:: bash
199+
200+
mpirun -n ${RANKS} driver BPFile ${ADIOS2_FILE_DIR} ${ADIOS2_FILE_PREFIX} ${OUTPUT_LOC} -pserver_addr ${PSERVER_ADDR} -provdb_addr ${PROVDB_ADDR} ...
201+
202+
Note that the first argument of **driver**, which specifies the ADIOS2 engine, has been set to **BPFile**, and the process is not run in the background.