
Add metadata and modify pubmed parser. #50

Merged
ClaireHzl merged 11 commits into main from 49-add-missing-metadata
May 5, 2026

Conversation

Collaborator

@ClaireHzl ClaireHzl commented Apr 10, 2026

  • Fix PubMedParser
  • Add UnpaywallParser: the best source for PDF locations
  • Add metadata
  • Add unit tests

Final JSON metadata:

{
  "id": doi_str,
  "found": bool,
  "sources": [list of parsers],
  "article name": str,
  "authors": [list of dict {"name", "orcid"}],
  "journal": {"name": str, "issn": str},
  "publish date": str,
  "status": str,
  "doi": str,
  "link": str,
  "document type": str,
  "document subtypes": [list of str],
  "cited by count": int,
  "open access": bool,
  "language": str,
  "abstract": str,
  "keywords": [list of str],
  "cited articles": [list of DOI]
}

@ClaireHzl ClaireHzl linked an issue Apr 10, 2026 that may be closed by this pull request
@ClaireHzl ClaireHzl marked this pull request as ready for review April 20, 2026 18:19
Collaborator

@cgoudet cgoudet left a comment

Thanks for this work.

There are a few minor style comments, and the tests may need some refinement. At a minimum, try to skip them before merging.

Comment thread eu_fact_force/ingestion/data_collection/parsers/__init__.py Outdated
Comment thread eu_fact_force/ingestion/data_collection/parsers/crossref.py Outdated
Comment thread tests/ingestion/test_metadata_collection.py Outdated
Comment thread tests/ingestion/test_metadata_collection.py Outdated
Comment thread tests/ingestion/test_metadata_collection.py
Comment thread tests/ingestion/test_metadata_collection.py Outdated
Comment thread tests/ingestion/test_metadata_collection.py
Comment thread tests/ingestion/test_metadata_collection.py Outdated
@ClaireHzl
Collaborator Author

I reworked the tests by creating VCR (video cassette recorder) cassettes that record the responses to HTTP requests so they can be replayed without making a real API call each time. If they need to be re-recorded, run pytest --record-mode=new_episodes; in CI, the APIs are no longer called.
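
To make the record-and-replay idea concrete, here is a toy, standard-library-only sketch. It is not how the cassette library used in this PR is implemented, and `Cassette`, `fake_api` and `demo` are hypothetical names. The first call goes to the "API" and is saved to a file; later calls are served from that file.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


class Cassette:
    """Toy record/replay store keyed by URL (hypothetical, not a real library API)."""

    def __init__(self, path: Path, fetch: Callable[[str], str]) -> None:
        self.path = path
        self.fetch = fetch
        # Load previously recorded responses if the cassette file exists.
        self.records: dict[str, str] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def get(self, url: str) -> str:
        # Replay a recorded response when available; otherwise call the
        # real fetcher once and persist the result for later runs.
        if url not in self.records:
            self.records[url] = self.fetch(url)
            self.path.write_text(json.dumps(self.records, indent=2))
        return self.records[url]


def demo() -> tuple[str, str, int]:
    calls: list[str] = []

    def fake_api(url: str) -> str:
        # Stand-in for a real HTTP call; records how often it is hit.
        calls.append(url)
        return f"response for {url}"

    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "cassette.json"
        url = "https://example.org/works/10.1000/x"
        first = Cassette(path, fake_api).get(url)
        # A fresh Cassette reads the file, so the API is not called again.
        second = Cassette(path, fake_api).get(url)
    return first, second, len(calls)
```

Running `demo()` shows both reads returning the same body while the stand-in API is hit only once, which is the property that lets CI run without network access.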

@ClaireHzl ClaireHzl merged commit 6b3249d into main May 5, 2026
1 check passed
@ClaireHzl ClaireHzl deleted the 49-add-missing-metadata branch May 5, 2026 12:10
Development

Successfully merging this pull request may close these issues.

Add missing metadata

2 participants