Commit 767fbcd

V2.1.01 (#249)

* incorrect max diffusion with resistivity (#244)
  Fix a bug that could result in too restrictive timesteps when resistivity is enabled
  fix #242
* fix documentation for reflective boundary conditions (#246)
  fix #228
* Per proc normalisation (#247)
  - show performance per sub-domain during integration
  - add performance measures in documentation
  - update link to method paper
  - update acknowledgements
* Documentation fixes (#248)
  * directly ask kokkos for its execution space
  * remove replace source files, as this doesn't work with header files (.hpp)
  * add proper readme
  * clean up hdf5 mess in readme (is already in the full doc)
  * add Async malloc option to JZ configuration
1 parent 158f2aa commit 767fbcd

12 files changed

Lines changed: 131 additions & 49 deletions

CHANGELOG.md

Lines changed: 13 additions & 0 deletions
@@ -4,6 +4,19 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [2.1.01] 2024-06-20
+### Changed
+- Fix a bug that could result in too restrictive timesteps when resistivity is enabled (#244)
+- Fix documentation for reflective boundary conditions (#246)
+- Changed performance metric: the performance is now measured per MPI process (and not globally) (#249)
+- Remove documentation for replace_idefix_source, as this can't work for .hpp file (#248)
+
+### Added
+- Kokkos execution space configuration is now shown on startup (#248)
+- Add CUDA_MALLOC_ASYNC flags in Jean Zay documentation to deal with MPI issues when using Kokkos 4.3 (#248)
+- Add a description and link to documentation in readme (#248)
+- Add indicative expected performances in documentation (#249)
+
 ## [2.1.0] 2024-05-10
 ### Changed
 - VTK slices are automatically produced along with standard VTK when an emergency abort is triggered.

CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ set (CMAKE_CXX_STANDARD 17)

 set(Idefix_VERSION_MAJOR 2)
 set(Idefix_VERSION_MINOR 1)
-set(Idefix_VERSION_PATCH 00)
+set(Idefix_VERSION_PATCH 01)

 project (idefix VERSION 2.1.00)
 option(Idefix_MHD "enable MHD" OFF)

README.md

Lines changed: 30 additions & 3 deletions
@@ -3,6 +3,8 @@

 <!-- toc -->

+- [What is Idefix?](#what-is-idefix)
+- [Documentation](#documentation)
 - [Download:](#download)
 - [Installation:](#installation)
 - [Compile an example:](#compile-an-example)
@@ -17,6 +19,33 @@

 <!-- tocstop -->

+What is Idefix?
+---------------
+Idefix is a computational fluid dynamics code based on a finite-volume high-order Godunov method, originally designed for astrophysical fluid dynamics applications. Idefix is designed to be performance-portable, and uses the [Kokkos](https://github.com/kokkos/kokkos) framework to achieve this goal. This means that it can run both on your laptop's cpu and on the largest GPU Exascale clusters. More technically, Idefix can run in serial, use OpenMP and/or MPI (message passing interface) for parallelization, and use GPU acceleration when available (based on Nvidia Cuda, AMD HIP, etc...). All these capabilities are embedded within one single code, so the code relies on relatively abstracted classes and objects available in C++17, which are not necessarily
+familiar to astrophysicists. A large effort has been devoted to simplify this level of abstraction so that the code can be modified by researchers and students familiar with C and who are aware of basic object-oriented concepts.
+
+
+Idefix currently supports the following physics:
+
+* Compressible hydrodynamics in 1D, 2D, 3D
+* Compressible magnetohydrodynamics using constrained transport in 1D, 2D, 3D
+* Multiple geometry (cartesian, polar, spherical)
+* Variable mesh spacing
+* Multiple parallelisation strategies (OpenMP, MPI, GPU offloading, etc...)
+* Full non-ideal MHD (Ohmic, ambipolar, Hall)
+* Viscosity and thermal diffusion
+* Super-timestepping for all parabolic terms
+* Orbital advection (Fargo-like)
+* Self-gravity
+* Multi dust species modelled as pressureless fluids
+* Multiple planets interraction
+
+Documentation
+-------------
+
+A full online documentation is available on [readTheDoc](https://idefix.readthedocs.io/latest/).
+
+
 Download:
 ---------

@@ -56,10 +85,8 @@ Configure the code launching cmake (version >= 3.16) in the example directory:
 cmake $IDEFIX_DIR
 ```

-Several options can be enabled from the command line (a complete list is available with `cmake $IDEFIX_DIR -LH`). For instance: `-DIdefix_RECONSTRUCTION=Parabolic` (enable PPM reconstruction), `-DIdefix_MPI=ON` (enable mpi), `-DKokkos_ENABLE_OPENMP=ON` (enable openmp parallelisation), etc... For more complex target architectures, it is recommended to use cmake GUI launching `ccmake $IDEFIX_DIR` in place of `cmake` and then switching on the required options.
+Several options can be enabled from the command line (a complete list is available with `cmake $IDEFIX_DIR -LH`). For instance: `-DIdefix_RECONSTRUCTION=Parabolic` (enable PPM reconstruction), `-DIdefix_MPI=ON` (enable mpi), `-DKokkos_ENABLE_OPENMP=ON` (enable openmp parallelisation), etc... For more complex target architectures, it is recommended to use cmake GUI launching `ccmake $IDEFIX_DIR` in place of `cmake` and then switching on the required options. See the [online documentation](https://idefix.readthedocs.io/latest/) for details.

-Optional xdmf(hdf5+xmf) file dumping feature has been added to `Idefix`. This uses either serial or parallel implementation of `hdf5` library which needs to be made available. These xdmf file pairs can be easily visualized in `ParaView` or `VisIt` by loading the `xmf` files. The `hdf5` files can also be loaded easily in `python` (using `h5py`) for post-processing and post-run analysis. One can turn on `xdmf` data dumps by using `-DIdefix_HDF5=ON`. The `[Output]` block of `.ini` file is checked during runtime for a `xdmf` entry whih controls the frequency of xdmf file dumps during code execution.
-<!-- TODO: HDF5 Chunking and Compression filters -->

 One can then compile the code:

doc/source/conf.py

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@
 author = 'Geoffroy Lesur'

 # The full version, including alpha/beta/rc tags
-release = '2.1.00'
+release = '2.1.01'

doc/source/index.rst

Lines changed: 11 additions & 3 deletions
@@ -55,7 +55,7 @@ Terms and condition of Use
 ===========================
 *Idefix* is distributed freely under the `CeCILL license <https://en.wikipedia.org/wiki/CeCILL>`_, a free software license adapted to both international and French legal matters, in the spirit of and retaining
 compatibility with the GNU General Public License (GPL). We expect *Idefix* to be referenced and acknowledeged by authors in their publications. At the minimum, the authors
-should cite the *Idefix* `method paper <https://ui.adsabs.harvard.edu/abs/2023arXiv230413746L/abstract>`_.
+should cite the *Idefix* `method paper <https://ui.adsabs.harvard.edu/abs/2023A%26A...677A...9L/abstract>`_.

 *Idefix* data structure and algorithm are derived from Andrea Mignone's `PLUTO code <http://plutocode.ph.unito.it/>`_, released under the GPL license.
 *Idefix* also relies on the `Kokkos <https://github.com/kokkos/kokkos>`_ performance portability programming ecosystem released under the terms
@@ -74,6 +74,9 @@ Soufiane Baghdadi
 Gaylor Wafflard-Fernandez
     planet-disc interaction

+Jonah Mauxion
+    self-gravity module
+
 Clément Robert
     gitlab integration, linter

@@ -96,8 +99,12 @@ This documentation has automatically been generated on |today| from the followin
 Acknowledgements
 ===================

-The developement of *Idefix* is supported by the European Research Council (ERC)
-under the European Union Horizon 2020 research and innovation programme (Grant agreement No. 815559 (MHDiscs))
+The developement of *Idefix* was supported by the European Research Council (ERC)
+under the European Union Horizon 2020 research and innovation programme (Grant agreement No. 815559 (MHDiscs)).
+Idefix developement team is partly funded by the `PEPR Origins <https://pepr-origins.fr>`_ through the project "MHD@Exascale".
+The Idefix collaboration benefited from funding from the “Programme National de Physique Stellaire” (PNPS),
+“Programme National Soleil-Terre” (PNST), “Programme National de Hautes Energies” (PNHE) and
+“Programme National de Planétologie” (PNP) of CNRS/INSU co-funded by CEA and CNES.


 .. toctree::
@@ -108,6 +115,7 @@ under the European Union Horizon 2020 research and innovation programme (Grant a
    reference
    modules
    programmingguide
+   performances
    kokkos
    contributing
    faq

doc/source/performances.rst

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+======================
+Performances
+======================
+
+We report below the performances obtained on various architectures using Idefix. The reference test
+is the 3D MHD Orszag-Tang test problem with 2nd order reconstruction and uct_contact EMFS bundled in
+Idefix test suite, computed with a 128\ :sup:`3` resolution per MPI sub-domain on GPUs or 32\ :sup:`3`
+per MPI sub-domain on CPUs. All of the performances measures have been obtained enabling MPI on
+*one full node*, but we report here the performance *per GPU*
+(i.e. with 2 GCDs on AMD Mi250) or *per core* (on CPU), i.e. dividing the node performance by the number of GPU/core
+to simplify the comparison with other clusters.
+
+The complete scalability tests are available in Idefix `method paper <https://ui.adsabs.harvard.edu/abs/2023A%26A...677A...9L/abstract>`_.
+The performances mentionned below are updated for each major revision of Idefix, so they might slightly differ from the method paper.
+
+.. note::
+
+    You might expect
+    slower performances with lower resolution when using GPUs. The overall performances also depends on
+    the physical modules activated, the reconstruction scheme, and the efficiency of the parallel network
+    on which you are running. The performances reported below are therefore purely indicative. We encourage
+    you to use the embedded profiler (see :ref:`commandLine` ) when performances are smaller than expected.
+
+
+CPU performances
+================
+
++---------------------+--------------------+----------------------------------------------------+
+| Cluster name        | Processor          | Performances (in 10\ :sup:`6` cell/s/core)         |
++=====================+====================+====================================================+
+| TGCC/Irene Rome     | AMD EPYC Rome      | 0.29                                               |
++---------------------+--------------------+----------------------------------------------------+
+| IDRIS/Jean Zay      | Intel Cascade Lake | 0.62                                               |
++---------------------+--------------------+----------------------------------------------------+
+
+
+GPU performances
+================
+
++----------------------+--------------------+----------------------------------------------------+
+| Cluster name         | GPU                | Performances (in 10\ :sup:`6` cell/s/GPU)          |
++======================+====================+====================================================+
+| IDRIS/Jean Zay       | NVIDIA V100        | 110                                                |
++----------------------+--------------------+----------------------------------------------------+
+| IDRIS/Jean Zay       | NVIDIA A100        | 194                                                |
++----------------------+--------------------+----------------------------------------------------+
+| CINES/Adastra        | AMD Mi250          | 250                                                |
++----------------------+--------------------+----------------------------------------------------+
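
The normalisation described in this new page (node performance divided by the number of GPUs or cores) can be sketched as below. This is only an illustration: the helper names `NodeCellUpdatesPerSecond` and `PerUnitPerformance` are hypothetical and are not part of Idefix, and the timing inputs are assumed to come from the user's own run.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an Idefix function): cell updates per second
// accumulated over every MPI sub-domain running on one node.
//   cellsPerSubdomain : cells in one sub-domain (e.g. 128*128*128 on a GPU)
//   nSubdomains       : number of MPI sub-domains on the node
//   nCycles           : number of integration cycles that were timed
//   wallSeconds       : wall-clock time spent on those cycles
double NodeCellUpdatesPerSecond(std::int64_t cellsPerSubdomain, int nSubdomains,
                                std::int64_t nCycles, double wallSeconds) {
  assert(wallSeconds > 0.0);
  return static_cast<double>(cellsPerSubdomain) * nSubdomains * nCycles / wallSeconds;
}

// Hypothetical helper: divide the node performance by the number of units
// (GPUs or CPU cores) to obtain a figure in cell/s/GPU or cell/s/core.
double PerUnitPerformance(double nodeCellUpdatesPerSecond, int nUnits) {
  assert(nUnits > 0);
  return nodeCellUpdatesPerSecond / nUnits;
}
```

For a node where each GPU hosts two sub-domains (as with the two GCDs of a Mi250), `nSubdomains` would be twice `nUnits`, so the per-GPU figure counts the work of both GCDs.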

doc/source/reference/idefix.ini.rst

Lines changed: 2 additions & 1 deletion
@@ -343,7 +343,8 @@ and ``X1-end``, ``X2-end``, ``X3-end`` for the right boundaries. Each boundary c
 +----------------+------------------------------------------------------------------------------------------------------------------+
 | periodic | Periodic boundary conditions. Each field is copied between beg and end sides of the boundary. |
 +----------------+------------------------------------------------------------------------------------------------------------------+
-| reflective | The normal component of the velocity is systematically reversed. Otherwise identical to ``outflow``. |
+| reflective | | Mirror the normal component of the velocity field and the tangential components of the magnetic field. |
+| | | Zero gradient on the other components (tangential velocity and normal field). |
 +----------------+------------------------------------------------------------------------------------------------------------------+
 | shearingbox | Shearing-box boudary conditions. |
 +----------------+------------------------------------------------------------------------------------------------------------------+

doc/source/reference/makefile.rst

Lines changed: 16 additions & 28 deletions
@@ -108,15 +108,17 @@ We recommend the following modules and environement variables on AdAstra:

 .. code-block:: bash

-    module load PrgEnv-cray-amd
-    module load cray-mpich
-    module load craype-network-ofi
-    module load cce
-    module load cpe
-    module load rocm/5.2.0
-    export LDFLAGS="-L${ROCM_PATH}/lib -lamdhip64 -lstdc++fs"
-
-The last line being there to guarantee the link to the HIP library and the access to specific
+    module load cpe/23.12
+    module load craype-accel-amd-gfx90a craype-x86-trento
+    module load PrgEnv-cray
+    module load amd-mixed/5.7.1
+    module load rocm/5.7.1 # needed because of a path bug that has not been fixed yet
+    export HIPCC_COMPILE_FLAGS_APPEND="-isystem ${CRAY_MPICH_PREFIX}/include"
+    export HIPCC_LINK_FLAGS_APPEND="-L${CRAY_MPICH_PREFIX}/lib -lmpi ${PE_MPICH_GTL_DIR_amd_gfx90a} ${PE_MPICH_GTL_LIBS_amd_gfx90a} -lstdc++fs"
+    export CXX=hipcc
+    export CC=hipcc
+
+The `-lstdc++fs` option being there to guarantee the link to the HIP library and the access to specific
 C++17 <filesystem> functions.

 Finally, *Idefix* can be configured to run on Mi250 by enabling HIP and the desired architecture with the following options to ccmake:
@@ -144,15 +146,16 @@ We recommend the following modules and environement variables on Jean Zay:

 .. code-block:: bash

-    -DKokkos_ENABLE_CUDA=ON -DKokkos_ENABLE_VOLTA70=ON
+    -DKokkos_ENABLE_CUDA=ON -DKokkos_ENABLE_VOLTA70=ON -DKokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC=OFF

 While Ampere A100 GPUs are enabled with

 .. code-block:: bash

-    -DKokkos_ENABLE_CUDA=ON -DKokkos_ENABLE_AMPERE80=ON
+    -DKokkos_ENABLE_CUDA=ON -DKokkos_ENABLE_AMPERE80=ON -DKokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC=OFF

-MPI (multi-GPU) can be enabled by adding ``-DIdefix_MPI=ON`` as usual.
+MPI (multi-GPU) can be enabled by adding ``-DIdefix_MPI=ON`` as usual. The malloc async option is here to prevent a bug when using PSM2 with async
+cuda malloc possibly leading to openmpi crash or hangs on the Jean Zay machine.

 .. _setupSpecificOptions:

@@ -174,7 +177,7 @@ explicitely the options as they are required, using the functions ``set_idefix_p

 .. _customSourceFiles:

-Add/replace custom source files
+Add custom source files
 +++++++++++++++++++++++++++++++

 It is possible to add custom source files to be compiled and linked against *Idefix*. This can be useful
@@ -189,21 +192,6 @@ say you want to add source files for an analysis, your ``CMakeLists.txt`` should
     add_idefix_source(analysis.hpp)


-*Idefix* also allows one to replace a source file in `$IDEFIX_DIR` by your own implementation. This is useful when developping new functionnalities without touching
-the main directory of your *Idefix* repository. For instance, say one wants to replace the implementation of viscosity in `$IDEFIX_SRC/src/hydro/viscosity.cpp`,
-with a customised `myviscosity.cpp` in the problem directory, one should add a ``CMakeLists.txt`` in the problem directory reading
-
-.. code-block::
-    :caption: CMakeLists.txt
-
-    replace_idefix_source(hydro/viscosity.cpp myviscosity.cpp)
-
-
-Note that the first parameter of ``replace_idefix_source`` is used as a search pattern in `$IDEFIX_DIR`. Hence it is possible to ommit the parent directory
-of the file being replaced if there is only one file with that name in the *Idefix* source directory, which is not guaranteed (some classes may implement
-methods with the same name). It is therefore recommended to add the parent directory in the first argument of ``replace_idefix_source``.
-
-
 .. tip::

     Don't forget to delete `CMakeCache.txt` before attempting to reconfigure the code when adding a problem-specific

src/fluid/addNonIdealMHDFlux.hpp

Lines changed: 1 addition & 1 deletion
@@ -258,7 +258,7 @@ void Fluid<Phys>::AddNonIdealMHDFlux(const real t) {
     #if HAVE_ENERGY
       Flux(ENG,k,j,i) += - Bx1 * eta * Jx2 + Bx2 * eta * Jx1;
     #endif
-    dMax(k,j,i) += eta;
+    locdmax += eta;
   }

   if(haveAmbipolar) {
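
To see why an over-counted diffusion coefficient makes timesteps too restrictive, consider a generic explicit stability limit for a parabolic term. The sketch below assumes the textbook form dt = cfl * dx^2 / (2 * dMax); this expression and the helper name `ParabolicTimestep` are illustrative assumptions, not taken from the Idefix source, and the diff above does not show how `locdmax` is later combined into `dMax`.

```cpp
#include <cassert>

// Generic explicit stability limit for a diffusion term (assumed textbook
// form, NOT copied from Idefix): dt = cfl * dx^2 / (2 * dMax).
// dMax is the largest diffusion coefficient seen in the cell.
double ParabolicTimestep(double dx, double dMax, double cfl) {
  assert(dMax > 0.0);
  return cfl * dx * dx / (2.0 * dMax);
}
```

With this form, `ParabolicTimestep(0.5, 2.0, 0.5)` gives 0.03125, while doubling the coefficient to 4.0 gives 0.015625: any resistivity that is counted twice halves the allowed timestep.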

src/input.cpp

Lines changed: 6 additions & 9 deletions
@@ -226,6 +226,12 @@ void Input::ShowConfig() {
   idfx::cout << "-----------------------------------------------------------------------------"
              << std::endl;

+  std::stringstream os;
+  Kokkos::DefaultExecutionSpace().print_configuration(os, true);
+  idfx::cout << "Input: Kokkos configuration" << std::endl << os.str();
+  idfx::cout << "-----------------------------------------------------------------------------"
+             << std::endl;
+
   #ifdef SINGLE_PRECISION
     idfx::cout << "Input: Compiled with SINGLE PRECISION arithmetic." << std::endl;
   #else
@@ -237,15 +243,6 @@ void Input::ShowConfig() {
   #ifdef WITH_MPI
     idfx::cout << "Input: MPI ENABLED." << std::endl;
   #endif
-  #ifdef KOKKOS_ENABLE_HIP
-    idfx::cout << "Input: Kokkos HIP target ENABLED." << std::endl;
-  #endif
-  #ifdef KOKKOS_ENABLE_CUDA
-    idfx::cout << "Input: Kokkos CUDA target ENABLED." << std::endl;
-  #endif
-  #ifdef KOKKOS_ENABLE_OPENMP
-    idfx::cout << "Input: Kokkos OpenMP ENABLED." << std::endl;
-  #endif
 }

 // This routine is called whenever a specific OS signal is caught
