AQUA cannot take this decision on its own, since it depends on several aspects of the wider workflow. We nevertheless need a common solution to track and document provenance. PROV is the generic standard and has Python libraries available, while METACLIP defines a domain-specific ontology. Unfortunately, METACLIP is available only in R. An incomplete implementation for xarray was discussed in xarray-contrib/cf-xarray#228, and an incomplete draft PR for xarray exists, but there has been no activity since Sept. 2021.
How is provenance treated in the CDS? What level of provenance granularity are we talking about: individual operations within a diagnostic, or something much coarser?
Our interpretation is to provide the minimal possible solution, i.e. attaching provenance information such as "Post-processed by -name- AQUA diagnostic" to the netCDF history attribute or to any upstream provenance information.
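A minimal sketch of what this could look like, assuming the provenance line is prepended to a `history` global attribute held in a plain dictionary. The function name `add_provenance`, the timestamp format, and the newest-first ordering are assumptions made for illustration, not an agreed AQUA interface.

```python
from datetime import datetime, timezone


def add_provenance(attrs, diagnostic, when=None):
    """Return a copy of `attrs` with a provenance line added to 'history'.

    `attrs` is a plain dict of global attributes; `diagnostic` is the
    (hypothetical) name of the AQUA diagnostic that produced the output.
    """
    # Use the current UTC time unless a timestamp is supplied explicitly.
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = when.strftime("%Y-%m-%dT%H:%M:%SZ")
    entry = f"{stamp}: Post-processed by {diagnostic} AQUA diagnostic"

    previous = attrs.get("history", "")
    updated = dict(attrs)  # leave the caller's dictionary untouched
    # Assumed ordering: newest entry first, upstream history kept below it.
    updated["history"] = f"{entry}\n{previous}" if previous else entry
    return updated
```

With xarray, the same helper could presumably be applied as `ds.attrs = add_provenance(ds.attrs, "my_diag")` before writing the dataset, where `my_diag` is a placeholder name. Any upstream history is preserved, so coarser or finer granularity could be layered on later.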