
Data provenance #10

@oloapinivad

This decision cannot be made autonomously within AQUA, since it depends on several aspects of the workflow. However, we need a common solution to track and document provenance. PROV is the generic standard (Python libraries are available), while METACLIP defines a domain-specific ontology. Unfortunately, METACLIP is available only in R. An implementation for xarray (not complete) was discussed in xarray-contrib/cf-xarray#228, and a draft, incomplete PR for xarray exists, but there has been no activity since Sept. 2021.
How is provenance treated in the CDS? What level of provenance granularity are we talking about: individual operations within a diagnostic, or something much coarser?
Our current proposal is the minimal solution: attach a short provenance note such as "Post-processed by -name- AQUA diagnostic" to the NetCDF history attribute, or to any upstream provenance information already present.
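
To make the minimal option concrete, here is a rough sketch of how such a note could be appended to the global `history` attribute. It is only an illustration, not an agreed AQUA design: the function name `append_history`, the timestamp format, and the diagnostic name `teleconnections` are all assumptions. It operates on a plain dict of attributes, so it could be applied to the `attrs` of an xarray Dataset before writing to NetCDF.

```python
from datetime import datetime, timezone


def append_history(attrs, diagnostic, when=None):
    """Append an AQUA provenance line to a ``history`` attribute.

    ``attrs`` is a dict of global attributes (hypothetically, ``ds.attrs``
    of an xarray Dataset). Any upstream history is preserved and the new
    entry is added as the last line.
    """
    # Use the current UTC time unless the caller supplies one (handy for tests).
    when = when or datetime.now(timezone.utc)
    stamp = when.strftime("%Y-%m-%dT%H:%M:%SZ")

    # The wording follows the note proposed in this issue.
    entry = f"{stamp}: Post-processed by {diagnostic} AQUA diagnostic"

    previous = attrs.get("history", "")
    attrs["history"] = f"{previous}\n{entry}" if previous else entry
    return attrs


# Example with a made-up upstream history and a hypothetical diagnostic name.
attrs = {"history": "2021-09-01T00:00:00Z: upstream processing step"}
append_history(
    attrs,
    "teleconnections",
    when=datetime(2023, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
print(attrs["history"])
```

This only covers the coarsest granularity (one line per diagnostic); tracking individual operations would need something richer, such as PROV.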

Labels: discussion (Discuss future upgrade)
