Commit b7115ca

Document async and MC QC operations (#1594)
* Document async and MC QC operations
* elaborate on inspecting intermediate QC files
1 parent 983e0ab commit b7115ca

2 files changed

Lines changed: 88 additions & 3 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -77,6 +77,7 @@ For a general overview of our (O2) software, organization and processes, please
 * [Custom metadata](doc/Advanced.md#custom-metadata)
 * [Details on the data storage format in the CCDB](doc/Advanced.md#details-on-the-data-storage-format-in-the-ccdb)
 * [Local CCDB setup](doc/Advanced.md#local-ccdb-setup)
+* [Asynchronous Data and Monte Carlo QC operations](doc/Advanced.md#asynchronous-data-and-monte-carlo-qc-operations)
 * [QCG](doc/Advanced.md#qcg)
 * [Display a non-standard ROOT object in QCG](doc/Advanced.md#display-a-non-standard-root-object-in-qcg)
 * [Canvas options](doc/Advanced.md#canvas-options)

doc/Advanced.md

Lines changed: 87 additions & 3 deletions
@@ -30,6 +30,7 @@ Advanced topics
 * [Details on the data storage format in the CCDB](#details-on-the-data-storage-format-in-the-ccdb)
 * [Data storage format before v0.14 and ROOT 6.18](#data-storage-format-before-v014-and-root-618)
 * [Local CCDB setup](#local-ccdb-setup)
+* [Asynchronous Data and Monte Carlo QC operations](#asynchronous-data-and-monte-carlo-qc-operations)
 * [QCG](#qcg)
 * [Display a non-standard ROOT object in QCG](#display-a-non-standard-root-object-in-qcg)
 * [Canvas options](#canvas-options)
@@ -313,9 +314,53 @@ o2-qc --config json:/${QUALITYCONTROL_ROOT}/etc/basic.json --remote-batch result
 Please note that local batch QC workflows should not work on the same file at the same time.
 A semaphore mechanism is required if there is a risk they might be executed in parallel.
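
One possible way to provide such a semaphore is an advisory file lock held around each workflow invocation. The following is a minimal sketch, not part of the QC framework: the helper names `exclusive_lock` and `try_lock` and the lock file path are hypothetical, and it assumes a POSIX system where all workflows see the same local file system.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Blocks until no other open file description holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield fd
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def try_lock(lock_path):
    """Return True if the lock can be taken right now (non-blocking probe)."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return False
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


# Hypothetical lock file placed next to the shared results file.
lock_file = os.path.join(tempfile.mkdtemp(), "results.root.lock")
with exclusive_lock(lock_file):
    # A local batch workflow writing to the shared file would be launched here.
    busy = not try_lock(lock_file)
free_again = try_lock(lock_file)
```

Each process would wrap its workflow invocation in `exclusive_lock`, so that only one of them touches the shared file at a time. Advisory locks only protect against processes that use the same convention.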
 
-To be done:
-- merging multiple files into one, to allow for cases, when local batch workflows cannot access the same file.
-- support for Post-Processing.
+The file is organized into directories named after 3-letter detector codes and sub-directories representing Monitor Object Collections for specific tasks.
+To browse the file, one needs the associated Quality Control environment loaded, since it contains QC-specific data structures.
+It is worth remembering that this file is considered intermediate storage; thus, Monitor Objects do not have Checks applied and cannot be considered the final results.
+A quick and easy way to inspect the contents of the file is to load a recent environment (e.g. on lxplus) and open it with ROOT's `TBrowser`:
+```shell
+alienv enter O2PDPSuite/nightly-20221219-1
+root
+TBrowser t; // a browser window will pop-up
+```
+...or by browsing the file manually:
+```shell
+alienv enter O2PDPSuite/nightly-20221219-1
+root
+root [0] auto f = new TFile("QC_fullrun.root")
+(TFile *) @0x7ffe84833dc8
+root [1] f->ls()
+TFile** QC_fullrun.root
+TFile* QC_fullrun.root
+KEY: TDirectoryFile CPV;1 CPV
+KEY: TDirectoryFile EMC;1 EMC
+KEY: TDirectoryFile FDD;1 FDD
+KEY: TDirectoryFile FT0;1 FT0
+KEY: TDirectoryFile FV0;1 FV0
+KEY: TDirectoryFile GLO;1 GLO
+KEY: TDirectoryFile ITS;1 ITS
+...
+root [2] f->cd("GLO")
+(bool) true
+root [3] f->ls()
+TFile** QC_fullrun.root
+TFile* QC_fullrun.root
+TDirectoryFile* GLO GLO
+KEY: o2::quality_control::core::MonitorObjectCollection MTCITSTPC;1
+KEY: o2::quality_control::core::MonitorObjectCollection Vertexing;1
+KEY: TDirectoryFile CPV;1 CPV
+...
+root [4] auto vtx = dynamic_cast<o2::quality_control::core::MonitorObjectCollection*>(f->Get("GLO/Vertexing"))
+(o2::quality_control::core::MonitorObjectCollection *) @0x7ffe84833dc8
+root [5] auto vtx_x = dynamic_cast<o2::quality_control::core::MonitorObject*>(vtx->FindObject("vertex_X"))
+(o2::quality_control::core::MonitorObject *) @0x7ffe84833dc8
+root [6] vtx_x->getObject()->ClassName()
+(const char *) "TH1F"
+```
+To merge several incomplete QC files, one can use the `o2-qc-file-merger` executable.
+It takes a list of input files, which may or may not reside on AliEn, and produces a merged file.
+One can select whether the executable should fail upon any error or continue for as long as possible.
+Please see its `--help` output for usage details.
 
 ## Moving window
 
@@ -666,6 +711,45 @@ The script `o2-qc-repo-move-objects` lets the user move an object, and thus all
 python3 o2-qc-repo-move-objects --url http://ccdb-test.cern.ch:8080 --path qc/TST/MO/Bob --new-path qc/TST/MO/Bob2 --log-level 10
 ```
 
+# Asynchronous Data and Monte Carlo QC operations
+
+QC can accompany workflows reconstructing real and simulated data asynchronously.
+Usually these are distributed among thousands of nodes which might not have access to each other, thus partial results are stored and merged in the form of files, using the mechanism explained in [Batch processing](#batch-processing).
+
QC workflows for asynchronous data reconstructions are listed in [O2DPG/Data/production/qc-workflow.sh](https://github.com/AliceO2Group/O2DPG/blob/master/DATA/production/qc-workflow.sh).
720+
The script includes paths to corresponding QC configuration files for subsystems which take part in the reconstruction.
721+
All the enabled files are merged into a combined QC workflow.
722+
Thus, it is crucial that unique keys are used in `tasks`, `checks` and `aggregators` structures, as explained in [Merging multiple configuration files into one](#merging-multiple-configuration-files-into-one).
723+
Post-processing tasks can be added in the script [O2DPG/DATA/production/o2dpg_qc_postproc_workflow.py](https://github.com/AliceO2Group/O2DPG/blob/master/DATA/production/o2dpg_qc_postproc_workflow.py).
724+
Please see the included example and the in-code documentation for further guidelines in this matter.
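
The uniqueness requirement on `tasks`, `checks` and `aggregators` keys can be checked before the configuration files are combined. The sketch below is illustrative only: `find_duplicate_keys` is a hypothetical helper, the file names and key names are made up, and it assumes each configuration keeps these structures under a top-level `qc` object.

```python
from collections import defaultdict


def find_duplicate_keys(configs, sections=("tasks", "checks", "aggregators")):
    """Return {section: {key: [config names]}} for keys defined in more than one config."""
    seen = {section: defaultdict(list) for section in sections}
    for name, config in configs.items():
        qc = config.get("qc", {})  # assumed top-level object holding the structures
        for section in sections:
            for key in qc.get(section, {}):
                seen[section][key].append(name)
    return {
        section: {key: names for key, names in keys.items() if len(names) > 1}
        for section, keys in seen.items()
    }


# Hypothetical, already-parsed configurations of two subsystems.
configs = {
    "its.json": {"qc": {"tasks": {"Clusters": {}}, "checks": {"ClusterCheck": {}}}},
    "tpc.json": {"qc": {"tasks": {"Clusters": {}, "Tracks": {}}}},
}
duplicates = find_duplicate_keys(configs)
for section, keys in duplicates.items():
    for key, names in keys.items():
        print(f"duplicate key '{key}' in '{section}': {', '.join(names)}")
```

In this made-up input, the task key `Clusters` appears in both files and would clash once the files are merged, so one of them would need a subsystem-specific name.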
+
+Generating and reconstructing simulated data is run by a framework which organizes specific workflows in a directed acyclic graph and executes them in an order which satisfies all the dependencies and respects the allocated computing resources.
+In contrast to data reconstruction, here, QC workflows are executed separately and pick up corresponding input files.
+For further details, please refer to [Adding QC Tasks to the simulation script](https://github.com/AliceO2Group/O2DPG/tree/master/MC#adding-qc-tasks-to-the-simulation-script).
+
+Data and simulation productions are typically executed on the Grid and EPNs, and the outcomes can be inspected in [MonALISA](http://alimonitor.cern.ch/).
+In both cases, QC runs alongside each subjob and incomplete QC results are stored in files.
+For asynchronous data reconstruction, one file `QC.root` is created.
+Simulation subjobs contain a `QC` directory with separate files for each QC workflow.
+Relevant logs can be found in files like `stdout`, `stderr` as well as archives `debug_log_archive.zip` and `log_archive.zip`.
+
+Once an expected percentage of subjobs completes, several QC merging stages are executed, each producing a merged file for a certain range of subjobs.
+The last stage produces the complete file for the given masterjob.
+This file is read by the `o2-qc --remote-batch` executable to run Checks on the complete objects and upload all the results to the QCDB.
+Post-Processing Tasks and associated Checks are executed right after.
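
The staged merging described above can be pictured as repeatedly grouping files until a single one remains. The following sketch only plans such stages and does not merge anything. The function `plan_merge_stages`, the group size and the output file names are hypothetical; the grouping actually used in productions is not specified here.

```python
def plan_merge_stages(files, group_size):
    """Return a list of stages; each stage is a list of (inputs, output) pairs."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    stages = []
    current = list(files)
    stage_index = 0
    while len(current) > 1:
        stage = []
        for i in range(0, len(current), group_size):
            inputs = current[i:i + group_size]
            # Hypothetical naming scheme for intermediate merged files.
            output = f"stage{stage_index}_{i // group_size:03d}.root"
            stage.append((inputs, output))
        stages.append(stage)
        # Outputs of this stage become inputs of the next one.
        current = [output for _, output in stage]
        stage_index += 1
    return stages


subjob_files = [f"{n:03d}/QC.root" for n in range(1, 8)]  # 7 hypothetical subjobs
stages = plan_merge_stages(subjob_files, group_size=3)
for index, stage in enumerate(stages):
    for inputs, output in stage:
        print(f"stage {index}: {inputs} -> {output}")
```

With seven inputs and groups of three, the first stage yields three intermediate files and the second stage yields the single complete file, which corresponds to the file that the Checks are then run on.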
+
+Some runs contain too much data to be processed with one masterjob.
+In such a case, several masterjobs are run in parallel.
+Each produces a `QC.root` file which contains all the statistics for a masterjob.
+The last masterjob to complete recognizes this fact and merges all `QC.root` files into `QC_fullrun.root` and only then uploads the results to QCDB.
+To find this file, one can use `alien_find`:
+```
+> alien_find /alice/data/2022/LHC22m/523897/apass1_epn QC_fullrun.root
+/alice/data/2022/LHC22m/523897/apass1_epn/0750/QC/001/QC_fullrun.root
+```
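
When post-processing many such search results, it can help to split the returned path into its components. This is a minimal sketch with a hypothetical helper, `parse_fullrun_path`. It assumes the layout `/alice/data/<year>/<period>/<run>/<pass>/...`, which is inferred only from the single example above and may not hold for every production.

```python
from pathlib import PurePosixPath


def parse_fullrun_path(path):
    """Split a path like the alien_find result above into its assumed components."""
    parts = PurePosixPath(path).parts
    # parts[0] is "/", followed by "alice", "data", year, period, run, pass, ...
    if len(parts) < 8 or parts[1:3] != ("alice", "data"):
        raise ValueError(f"unexpected path layout: {path}")
    return {
        "year": int(parts[3]),
        "period": parts[4],
        "run": int(parts[5]),
        "pass": parts[6],
        "file": parts[-1],
    }


info = parse_fullrun_path(
    "/alice/data/2022/LHC22m/523897/apass1_epn/0750/QC/001/QC_fullrun.root"
)
print(info)
```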
+
+TODO explain how a connection to QCDB is made from Grid sites.
+
 # QCG
 
 ## Display a non-standard ROOT object in QCG
