Add PGP ingestion pipeline (Excel -> PostgreSQL)#82

Open
Assitan-h wants to merge 2 commits into main from feature/pgp-ingestion

Conversation

@Assitan-h
Collaborator

This PR adds the Python ingestion script that loads the Excel data into PostgreSQL (raw schema).

Collaborator

@cgoudet left a comment

Thanks a lot!
Good start.

Many of my comments make me think you don't have much experience with Django. I'm available if you need more explanation.

from datetime import date
import re

# =========================

Dead code, please remove.

RAW_SCHEMA = "raw"


# =========================

Dead code, please remove.


# =========================

if __name__ == "__main__":

Could you look at the documentation on creating Django commands?

print("Ingestion complete.")


# =========================

Dead code.

cursor.close()
conn.close()

print("Ingestion complete.")

Use a log call rather than a print.
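
A minimal sketch of what that could look like with the standard `logging` module. The body of `ingest_excel` is a placeholder, and the log format is only one possible choice.

```python
import logging

# One logger per module, named after the module itself.
logger = logging.getLogger(__name__)


def ingest_excel() -> None:
    """Placeholder for the ingestion steps; only the logging calls matter here."""
    logger.info("Reading Excel file...")
    # ... read the sheets and write them to PostgreSQL ...
    logger.info("Ingestion complete.")


if __name__ == "__main__":
    # Configure output once, at the entry point, not inside library code.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    ingest_excel()
```

Unlike `print`, the messages can then be filtered by level or redirected through the Django `LOGGING` setting without touching the ingestion code.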


def ingest_excel():

print("Reading Excel file...")

In principle the file will be a Google Sheet.
So there should be a step that either reads it directly into pandas from its URL, or downloads it before reading it.
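
Both options could be sketched as follows. This assumes the sheet is shared so that anyone with the link can view it (a private sheet would need authenticated access, which is not covered here) and that `openpyxl` is installed for `pd.read_excel`. `SHEET_ID` and the function names are hypothetical.

```python
from pathlib import Path
from urllib.request import urlretrieve

import pandas as pd

# Hypothetical placeholder: the real spreadsheet ID is not given in this PR.
SHEET_ID = "<spreadsheet-id>"


def export_url(sheet_id: str, fmt: str = "xlsx") -> str:
    """Build the export URL of a Google Sheet readable by anyone with the link."""
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format={fmt}"


def read_sheet_from_url(sheet_id: str, sheet_name: str) -> pd.DataFrame:
    """Option 1: let pandas fetch the workbook directly from the URL."""
    return pd.read_excel(export_url(sheet_id), sheet_name=sheet_name)


def download_then_read(sheet_id: str, sheet_name: str, target: Path) -> pd.DataFrame:
    """Option 2: download the workbook first, then read the local copy."""
    urlretrieve(export_url(sheet_id), target)
    return pd.read_excel(target, sheet_name=sheet_name)
```

Option 2 keeps a local copy of exactly what was ingested, which can help when debugging a bad load.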

conn = psycopg2.connect(**DB_CONFIG)
cursor = conn.cursor()

create_schema(cursor)

Django lets you define the models for your tables directly in models.py and work with Python objects that represent rows in the database.

df = pd.read_excel(FILE_PATH, sheet_name=sheet)

# clean column names
df.columns = [clean_name(c) for c in df.columns]

You can chain rename onto the end of the previous line.
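
A small sketch of the chained form. `clean_name` below is a hypothetical stand-in, since the PR's own implementation is not shown here, and an in-memory frame replaces the Excel read so the snippet runs on its own.

```python
import re

import pandas as pd


def clean_name(name: str) -> str:
    """Hypothetical stand-in for the PR's clean_name: lowercase snake_case."""
    return re.sub(r"\W+", "_", str(name).strip().lower()).strip("_")


# In-memory frame standing in for pd.read_excel(FILE_PATH, sheet_name=sheet).
raw = pd.DataFrame({"First Name": ["Ada"], "Total (EUR)": [10]})

# rename accepts a callable and applies it to every column label.
df = raw.rename(columns=clean_name)
```

In the script this would become `df = pd.read_excel(FILE_PATH, sheet_name=sheet).rename(columns=clean_name)`.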


df = pd.read_excel(FILE_PATH, sheet_name=sheet)

# clean column names

Unnecessary comment.

It would be better to turn this file into a fixture for a unit test, with a simpler name.