
🦟 Dengue Monitor

Dengue Monitor is a data engineering and analytics project focused on the epidemiological analysis of dengue cases in Brazil, using real-world data at scale (hundreds of thousands of records) and interactive dashboards.

The goal of the project is to demonstrate technical capability, professional best practices, and data-driven decision making, with a strong emphasis on performance, data engineering, and analytical visualization.


🎯 Objective

The Dengue Monitor aims to consolidate large volumes of dengue notification data, apply efficient database-level aggregations, and expose dashboards and analytical endpoints for:

  • Temporal analysis (epidemiological year)
  • Geographic analysis (state and municipality)
  • Demographic analysis (age range and gender)

The project was designed to:

  • Scale to hundreds of thousands of records
  • Minimize repeated database queries
  • Demonstrate proficiency in analytical SQL, data modeling, caching, performance optimization, and analytical APIs

πŸ—οΈ Overall Architecture

┌──────────────┐
│   CSV / API  │
└──────┬───────┘
       │
       ▼
┌──────────────────────────┐
│ Database (PostgreSQL)    │
│ - dengue_cases           │
│ - materialized views     │
└──────┬───────────────────┘
       │
       ▼
┌──────────────────────────┐
│        Data Layer        │
│ Repositories + Cache     │
└──────┬───────────────────┘
       │
       ├──────────────┐
       ▼              ▼
┌──────────────┐ ┌──────────────────────┐
│   FastAPI    │ │   Dashboard (UI)     │
│ Analytics API│ │ Streamlit + Plotly   │
└──────────────┘ └──────────────────────┘

🧱 Technology Stack

Backend / Data

  • Python 3.12+
  • SQLAlchemy
  • Alembic (migrations)
  • PostgreSQL (analytical queries and aggregations)

API

  • FastAPI
  • Pydantic (schemas)
  • Uvicorn

The API exposes analytical endpoints consumed by the dashboard and allows future integration with other services.

Visualization

  • Streamlit
  • Plotly

Data Engineering

  • Materialized Views
  • Optimized indexes
  • In-memory caching
  • Clear separation between Data Layer, API, and UI

🗂️ Project Structure

dengue-monitor/
│
├── dashboard/
│   ├── app.py          # Streamlit app
│   └── utils.py        # UI helpers
│
├── data/
│   ├── lookups/
│   │   ├── loader.py
│   │   ├── municipios.json
│   │   └── ufs.json
│   │
│   ├── transformers/
│   │   └── age.py
│   │
│   ├── analysis.py
│   ├── enums.py
│   ├── process_data.py
│   │
│   └── visualization/
│       ├── matplotlib.py
│       ├── plotly.py
│       └── seaborn.py
│
├── core/
│   ├── repositories/
│   │   └── dengue_repository.py
│   │
│   ├── database.py
│   └── models.py
│
├── api/
│   ├── services/
│   │   └── location_service.py
│   ├── routes.py
│   └── schemas.py
│
├── alembic/
│   └── versions/       # Migrations
│
├── main.py             # FastAPI entrypoint
├── .env.example
├── requirements.txt
└── README.md

🗄️ Database

Main Table

dengue_cases

Stores raw dengue notification records.

Key columns:

  • nu_ano – epidemiological year
  • sg_uf_not – reporting state (UF)
  • id_municip – municipality
  • idade – patient age
  • cs_sexo – gender (M, F, I)

Materialized Views

The project relies on materialized views to avoid expensive real-time aggregations.

Example:

CREATE MATERIALIZED VIEW dengue_by_age_gender AS
SELECT
    sg_uf_not,
    nu_ano,
    FLOOR(idade / 10) * 10 AS faixa_inicio,
    cs_sexo,
    COUNT(*) AS casos
FROM dengue_cases
WHERE idade IS NOT NULL
GROUP BY sg_uf_not, nu_ano, faixa_inicio, cs_sexo;

βœ”οΈ Fast queries βœ”οΈ Reduced database load βœ”οΈ Ideal for dashboards and analytical APIs


⚡ Performance & Best Practices

  • Streamlit data caching
  • Database-level aggregations
  • Strategic indexing
  • Avoids excessive database calls
  • Clear separation between UI, API, and Data Layer

πŸ” Migrations (Alembic)

Adopted standard:

  • revision: random ID generated by Alembic
  • down_revision: explicit dependency reference
  • Descriptive migration file names

Example:

alembic revision -m "create materialized view dengue_by_age_gender"

This approach ensures:

  • Linear migration history
  • Schema reproducibility
  • Easy rollback

🚀 How to Run

1️⃣ Create virtual environment

python -m venv venv
source venv/bin/activate  # Linux/macOS
venv\Scripts\Activate.ps1 # Windows

2️⃣ Install dependencies

pip install -r requirements.txt

3️⃣ Configure environment

cp .env.example .env

Set the database environment variables.

4️⃣ Run migrations

alembic upgrade head

5️⃣ Download the dengue CSV data

The project expects raw dengue CSV files to be placed in the following directory:

data/raw/

This directory is intentionally excluded from version control.

You can obtain the CSV files using one of the following options:


🔹 Option 1 – Google Drive (recommended for quick setup)

A curated set of CSV files is available on Google Drive:

👉 https://drive.google.com/drive/folders/1GY_LRvW4pQ0isSVTN_LyWAji-ixbTdw6?usp=sharing

Steps:

  1. Download one or more CSV files from the Drive folder

  2. Create the directory if it does not exist:

    data/raw/
    
  3. Place the downloaded CSV files inside data/raw/


🔹 Option 2 – Official Brazilian Health Open Data Portal (DATASUS)

You can also download the data directly from the official source:

👉 https://dadosabertos.saude.gov.br/dataset/arboviroses-dengue

Steps:

  1. Access the dataset page

  2. Download the desired CSV files (by year or period)

  3. Place the CSV files inside:

    data/raw/
    

📌 Notes:

  • The ingestion pipeline supports multiple CSV files inside data/raw/
  • Files are processed sequentially
  • Only the required columns are loaded into the database
  • The data is normalized before insertion
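The column-selection and normalization steps can be sketched with pandas as below. The `REQUIRED_COLUMNS` list is a guess based on the schema described earlier, and `load_raw_csv` is a hypothetical helper; the authoritative logic lives in data/process_data.py.

```python
import pandas as pd

# Hypothetical column subset, inferred from the dengue_cases schema above.
REQUIRED_COLUMNS = ["nu_ano", "sg_uf_not", "id_municip", "idade", "cs_sexo"]

def load_raw_csv(path_or_buffer) -> pd.DataFrame:
    """Read one raw CSV, keeping only the columns the database needs."""
    # Note: real DATASUS exports may require sep=";" or a specific encoding.
    df = pd.read_csv(path_or_buffer, usecols=REQUIRED_COLUMNS)
    # Minimal normalization: uppercase the UF, coerce age to a nullable int.
    df["sg_uf_not"] = df["sg_uf_not"].astype(str).str.upper()
    df["idade"] = pd.to_numeric(df["idade"], errors="coerce").astype("Int64")
    return df
```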

6️⃣ Process and load the data

From the project root:

python -m data.process_data

This script:

  • Reads the raw CSV files from data/raw/
  • Applies sampling logic (100 records per month per UF)
  • Normalizes fields
  • Inserts the data into PostgreSQL
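The "100 records / month / UF" sampling step can be expressed with a pandas group-by, as in the hedged sketch below. It assumes a parsed `dt_notific` datetime column and an `sg_uf_not` column; the real implementation in data/process_data.py may differ in detail.

```python
import pandas as pd

def sample_per_month_uf(df: pd.DataFrame, n: int = 100, seed: int = 42) -> pd.DataFrame:
    """Keep at most n records per (notification month, UF) group."""
    # Group by notification month (as a period) and reporting state.
    month = df["dt_notific"].dt.to_period("M")
    return (
        df.groupby([month, "sg_uf_not"], group_keys=False)
          .apply(lambda g: g.sample(n=min(n, len(g)), random_state=seed))
    )
```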

6️⃣.1️⃣ Refresh Materialized Views (required)

After loading the data into PostgreSQL, you must refresh the materialized views used by the analytical queries and dashboards.

You can do this in one of the following ways:


🔹 Option 1 – Run refresh commands manually

Connect to your PostgreSQL database and execute:

REFRESH MATERIALIZED VIEW mv_top_municipios;
REFRESH MATERIALIZED VIEW mv_cases_heatmap_month_age;
REFRESH MATERIALIZED VIEW mv_cases_by_age_group;
REFRESH MATERIALIZED VIEW mv_cases_by_gender_age_group;

🔹 Option 2 – Run the provided SQL script (recommended)

The project includes a helper script that refreshes all materialized views at once.

From the project root, run:

psql -d <your_database_name> -f scripts/database/refresh_materialized_views.sql

📌 Notes:

  • This step is required every time new data is loaded
  • Materialized views significantly improve dashboard performance
  • The script is safe to run multiple times
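If you prefer to automate the refresh from Python instead of psql, a small SQLAlchemy helper along these lines would work. The view names are taken from the refresh commands above; `refresh_all` assumes an engine pointing at the project's PostgreSQL database (configured via .env) and is a sketch, not part of the project.

```python
from sqlalchemy import text

# View names taken from the manual refresh commands above.
MATERIALIZED_VIEWS = [
    "mv_top_municipios",
    "mv_cases_heatmap_month_age",
    "mv_cases_by_age_group",
    "mv_cases_by_gender_age_group",
]

def refresh_statements(views=tuple(MATERIALIZED_VIEWS)) -> list[str]:
    """Build one REFRESH statement per materialized view."""
    return [f"REFRESH MATERIALIZED VIEW {view};" for view in views]

def refresh_all(engine) -> None:
    """Run every refresh inside a single transaction."""
    with engine.begin() as conn:
        for stmt in refresh_statements():
            conn.execute(text(stmt))
```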

7️⃣ Start the FastAPI server

uvicorn main:app --reload

The API exposes analytical endpoints consumed by the dashboard.

8️⃣ Start the dashboard

streamlit run dashboard/app.py

🤝 Contribution

Contributions are welcome!

  1. Fork the project
  2. Create a branch (feat/my-feature)
  3. Commit following Conventional Commits
  4. Open a Pull Request

📄 License

This project is distributed under the MIT License.


πŸ‘¨β€πŸ’» Author

Project developed by Jefferson as an advanced study and technical portfolio in:

  • Data Engineering
  • Epidemiological Analysis
  • Analytical Visualization
