
Commit 7296948

franzpoeschel and ax3l authored
Add documentation for typical use cases of openpmd-pipe (#1578)

* Add documentation for use cases of openpmd-pipe
* Update docs/source/analysis/pipe.rst
* Move this documentation to cli.rst
* Revert "Update docs/source/analysis/pipe.rst" (reverts commit 993b225)
* Revert "Add documentation for use cases of openpmd-pipe" (reverts commit e3e4336)
* Headers --> paragraphs

Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja>

1 parent e668e86

1 file changed: docs/source/utilities/cli.rst (134 additions, 13 deletions)
Redirect openPMD data from any source to any sink.

Any Python-enabled openPMD-api installation with enabled CLI tools comes with a command-line tool named ``openpmd-pipe``.
Naming and use are inspired by the `piping concept <https://en.wikipedia.org/wiki/Pipeline_(Unix)>`__ known from UNIX shells.

With some ``pip``-based Python installations, you might have to run this as a module:

.. code-block:: bash

   python3 -m openpmd_api.pipe --help

The fundamental idea is to redirect data from an openPMD data source to another openPMD data sink.
This concept becomes useful through the openPMD-api's ability to use different backends in different configurations; ``openpmd-pipe`` can hence be understood as a translation from one I/O configuration to another.
.. note::

   ``openpmd-pipe`` is (currently) optimized for streaming workflows in order to minimize the number of back-and-forth communications between writer and reader.
   All data load operations are issued in a single ``flush()`` per iteration.
   Data is loaded directly into backend-provided buffers of the writer (if supported by the writer), where again only one ``flush()`` per iteration is used to put data to disk.
   This means that the peak memory usage will be roughly equivalent to the data size of each single iteration.
The reader Series is configured by the parameters ``--infile`` and ``--inconfig``, which are forwarded to the ``filepath`` and ``options`` parameters of the ``Series`` constructor, respectively.
The writer Series is likewise controlled by ``--outfile`` and ``--outconfig``.

Use of MPI is controlled by the ``--mpi`` and ``--no-mpi`` switches.
If left unspecified, MPI will be used automatically if the MPI size is greater than 1.
.. note::

   The required parameters are ``--infile`` and ``--outfile``. For all further options, refer to the output of ``openpmd-pipe --help``.

When using MPI, each dataset will be sliced into roughly equally-sized hyperslabs along the dimension with the highest item count, for load distribution across worker ranks.

If you are interested in further chunk distribution strategies (e.g. node-aware distribution, chunking-aware distribution) that are used/tested on development branches, feel free to contact us, e.g. on GitHub.
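The slicing strategy described above can be illustrated with a short, self-contained sketch. This is *not* the actual ``openpmd-pipe`` implementation, only an illustration of the idea: pick the dimension with the highest item count and split it into roughly equal hyperslabs, one per MPI rank.

```python
# Illustrative sketch (not the actual openpmd-pipe source): split a
# dataset extent into roughly equally-sized hyperslabs along the
# dimension with the highest item count, one slab per MPI rank.

def hyperslabs(extent, n_ranks):
    """Return (offset, extent) pairs, one per rank."""
    # pick the dimension with the highest item count
    dim = max(range(len(extent)), key=lambda d: extent[d])
    base, rest = divmod(extent[dim], n_ranks)
    slabs = []
    start = 0
    for rank in range(n_ranks):
        # the first `rest` ranks take one extra item each
        count = base + (1 if rank < rest else 0)
        offset = [0] * len(extent)
        chunk = list(extent)
        offset[dim] = start
        chunk[dim] = count
        slabs.append((offset, chunk))
        start += count
    return slabs

# a 512x10x10 dataset split across 4 ranks: each rank loads a
# 128-row slab of the first (largest) dimension
print(hyperslabs([512, 10, 10], 4))
```

When the largest dimension is not evenly divisible by the rank count, the remainder is spread across the first ranks, so slab sizes differ by at most one.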
The remainder of this page discusses a select number of use cases and examples for the ``openpmd-pipe`` tool.

Conversion between backends
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Converting from ADIOS2 to HDF5:

.. code:: bash

   $ openpmd-pipe --infile simData_%T.bp --outfile simData_%T.h5

Converting from the ADIOS2 BP3 engine to the (newer) ADIOS2 BP5 engine:

.. code:: bash

   $ openpmd-pipe --infile simData_%T.bp --outfile simData_%T.bp5

   # or e.g. via inline TOML specification (also possible: JSON)
   $ openpmd-pipe --infile simData_%T.bp --outfile output_folder/simData_%T.bp \
       --outconfig 'adios2.engine.type = "bp5"'
   # the config can also be read from a file, e.g. --outconfig @cfg.toml
   # or --outconfig @cfg.json
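As a sketch, such a ``cfg.toml`` file could simply contain the file-based equivalent of the inline TOML specification shown above (the key mirrors the inline example; no further keys are implied):

```toml
# cfg.toml -- passed via --outconfig @cfg.toml
# (sketch; equivalent to the inline 'adios2.engine.type = "bp5"')
[adios2.engine]
type = "bp5"
```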
Converting between iteration encodings
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Converting to group-based iteration encoding:

.. code:: bash

   $ openpmd-pipe --infile simData_%T.h5 --outfile simData.h5

Converting to variable-based iteration encoding (not yet feature-complete):

.. code:: bash

   # e.g. specified via inline JSON
   $ openpmd-pipe --infile simData_%T.bp --outfile simData.bp \
       --outconfig '{"iteration_encoding": "variable_based"}'
Capturing a stream
^^^^^^^^^^^^^^^^^^

Since the openPMD-api also supports streaming/staging I/O transports from ADIOS2, ``openpmd-pipe`` can be used to capture a stream in order to write it to disk.
In the ADIOS2 `SST engine <https://adios2.readthedocs.io/en/latest/engines/engines.html#sst-sustainable-staging-transport>`_, a stream can have any number of readers.
This makes it possible to intercept a stream in a data processing pipeline.

.. code:: bash

   $ cat << EOF > streamParams.toml
   [adios2.engine.parameters]
   DataTransport = "fabric"
   OpenTimeoutSecs = 600
   EOF

   $ openpmd-pipe --infile streamContactFile.sst --inconfig @streamParams.toml \
       --outfile capturedStreamData_%06T.bp

   # Just loading and discarding streaming data, e.g. for performance benchmarking:
   $ openpmd-pipe --infile streamContactFile.sst --inconfig @streamParams.toml \
       --outfile null.bp --outconfig 'adios2.engine.type = "nullcore"'
Defragmenting a file
^^^^^^^^^^^^^^^^^^^^

Due to the file layout of ADIOS2, simulation codes (especially mesh-refinement-enabled ones) can create file output that is very strongly fragmented.
Since only one ``load_chunk()`` and one ``store_chunk()`` call is issued per MPI rank, per dataset and per iteration, the file is implicitly defragmented by the backend when passed through ``openpmd-pipe``:

.. code:: bash

   $ openpmd-pipe --infile strongly_fragmented_%T.bp --outfile defragmented_%T.bp
Post-hoc compression
^^^^^^^^^^^^^^^^^^^^

The openPMD-api can be used to compress data directly when it is originally created.
To compress data that was written without compression enabled, ``openpmd-pipe`` can help:

.. code:: bash

   $ cat << EOF > compression_cfg.json
   {
     "adios2": {
       "dataset": {
         "operators": [
           {
             "type": "blosc",
             "parameters": {
               "clevel": 1,
               "doshuffle": "BLOSC_BITSHUFFLE"
             }
           }
         ]
       }
     }
   }
   EOF

   $ openpmd-pipe --infile not_compressed_%T.bp --outfile compressed_%T.bp \
       --outconfig @compression_cfg.json
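Since configuration files can also be given in TOML, the same operator configuration can be expressed as follows (a sketch whose key names simply mirror the JSON example above):

```toml
# compression_cfg.toml -- equivalent to the JSON config above,
# passed via --outconfig @compression_cfg.toml
[[adios2.dataset.operators]]
type = "blosc"

[adios2.dataset.operators.parameters]
clevel = 1
doshuffle = "BLOSC_BITSHUFFLE"
```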
Starting point for custom transformation and analysis
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``openpmd-pipe`` is a Python script that can serve as a basis for custom extensions, e.g. for adding, modifying, transforming or reducing data.
The typical use case would be as a building block in a domain-specific data processing pipeline.
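To make the extension idea concrete, here is a hypothetical, self-contained sketch of the kind of hook such a custom pipeline might add: a chain of transformations applied to each loaded chunk of data before it is stored again. The function names (``downsample``, ``pipe_chunk``) are illustrative inventions, not part of the actual ``openpmd-pipe`` source.

```python
# Hypothetical sketch of a transformation hook for a custom pipeline
# built on the openpmd-pipe idea; names are illustrative only.

def downsample(chunk, stride=2):
    """Reduce a 1D chunk of data by keeping every `stride`-th element."""
    return chunk[::stride]

def pipe_chunk(chunk, transforms):
    """Apply a sequence of transformations between load and store."""
    for transform in transforms:
        chunk = transform(chunk)
    return chunk

loaded = list(range(10))            # stands in for a chunk loaded from the reader
stored = pipe_chunk(loaded, [downsample])
print(stored)                       # [0, 2, 4, 6, 8]
```

In a real extension, the load and store steps would use the openPMD-api's chunk-based read and write operations, with the transformation chain inserted between them.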