🦟 Dengue Monitor

Dengue Monitor is a data engineering and analytics project focused on the epidemiological analysis of dengue cases in Brazil, using real-scale data (hundreds of thousands of records) and interactive dashboards.

The goal of the project is to demonstrate technical capability, professional best practices, and data-driven decision making, with a strong emphasis on performance, data engineering, and analytical visualization.

🎯 Objective

Dengue Monitor consolidates large volumes of dengue notification data, applies efficient database-level aggregations, and exposes dashboards and analytical endpoints for:

Temporal analysis (epidemiological year)
Geographic analysis (state and municipality)
Demographic analysis (age range and gender)

The project was designed to:

Scale to hundreds of thousands of records
Minimize repeated database queries
Demonstrate proficiency in analytical SQL, data modeling, caching, performance optimization, and analytical APIs

🏗️ Overall Architecture

┌──────────────┐
│   CSV / API  │
└──────┬───────┘
       │
       ▼
┌──────────────────────────┐
│ Database (PostgreSQL)    │
│ - dengue_cases           │
│ - materialized views     │
└──────┬───────────────────┘
       │
       ▼
┌──────────────────────────┐
│        Data Layer        │
│ Repositories + Cache     │
└──────┬───────────────────┘
       │
       ├──────────────┐
       ▼              ▼
┌──────────────┐ ┌──────────────────────┐
│   FastAPI    │ │   Dashboard (UI)     │
│ Analytics API│ │ Streamlit + Plotly   │
└──────────────┘ └──────────────────────┘

🧱 Technology Stack

Backend / Data

Python 3.12+
SQLAlchemy
Alembic (migrations)
PostgreSQL (analytical queries and aggregations)

API

FastAPI
Pydantic (schemas)
Uvicorn

The API exposes analytical endpoints consumed by the dashboard and allows future integration with other services.

Visualization

Streamlit
Plotly

Data Engineering

Materialized Views
Optimized indexes
In-memory caching
Clear separation between Data Layer, API, and UI

🗂️ Project Structure

dengue-monitor/
│
├── dashboard/
│   ├── app.py          # Streamlit app
│   └── utils.py        # UI helpers
│
├── data/
│   ├── lookups/
│   │   ├── loader.py
│   │   ├── municipios.json
│   │   └── ufs.json
│   │
│   ├── transformers/
│   │   └── age.py
│   │
│   ├── analysis.py
│   ├── enums.py
│   ├── process_data.py
│   │
│   └── visualization/
│       ├── matplotlib.py
│       ├── plotly.py
│       └── seaborn.py
│
├── core/
│   ├── repositories/
│   │   └── dengue_repository.py
│   │
│   ├── database.py
│   └── models.py
│
├── api/
│   ├── services/
│   │   └── location_service.py
│   ├── routes.py
│   └── schemas.py
│
├── alembic/
│   └── versions/       # Migrations
│
├── scripts/
│   └── database/
│       └── refresh_materialized_views.sql
│
├── main.py             # FastAPI entrypoint
├── .env.example
├── requirements.txt
└── README.md

🗄️ Database

Main Table

dengue_cases

Stores raw dengue notification records.

Key columns:

nu_ano — epidemiological year
sg_uf_not — reporting state (UF)
id_municip — municipality
idade — patient age
cs_sexo — gender (M, F, I)
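
As a rough illustration of the columns above, the sketch below models one row as a Python dataclass. The class name, the Python types, and the sample values are assumptions made for this example; the project's actual model lives in core/models.py and may differ.

```python
from dataclasses import dataclass
from typing import Optional

VALID_GENDERS = {"M", "F", "I"}  # codes listed in the column description


@dataclass(frozen=True)
class DengueCaseRecord:
    """Hypothetical in-memory mirror of one dengue_cases row."""

    nu_ano: int            # epidemiological year
    sg_uf_not: str         # reporting state (UF); type is an assumption
    id_municip: str        # municipality code; type is an assumption
    idade: Optional[int]   # patient age, None when unknown
    cs_sexo: str           # gender code: M, F or I

    def __post_init__(self) -> None:
        # Reject values outside the documented gender codes.
        if self.cs_sexo not in VALID_GENDERS:
            raise ValueError(f"unexpected cs_sexo: {self.cs_sexo!r}")
        if self.idade is not None and self.idade < 0:
            raise ValueError("idade must be non-negative")


# Illustrative values only, not taken from the dataset.
record = DengueCaseRecord(
    nu_ano=2024, sg_uf_not="SP", id_municip="355030", idade=34, cs_sexo="F"
)
```

Validating at construction time keeps malformed rows out of later aggregation steps.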

Materialized Views

The project relies on materialized views to avoid expensive real-time aggregations.

Example:

CREATE MATERIALIZED VIEW dengue_by_age_gender AS
SELECT
    sg_uf_not,
    nu_ano,
    FLOOR(idade / 10) * 10 AS faixa_inicio,
    cs_sexo,
    COUNT(*) AS casos
FROM dengue_cases
WHERE idade IS NOT NULL
GROUP BY sg_uf_not, nu_ano, faixa_inicio, cs_sexo;

✔️ Fast queries
✔️ Reduced database load
✔️ Ideal for dashboards and analytical APIs
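
To make the view's logic easier to follow (or to unit test without a database), here is a minimal pure-Python sketch of the same aggregation. It assumes non-negative integer ages; the function names and sample rows are invented for this example and are not part of the project.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# (sg_uf_not, nu_ano, idade, cs_sexo)
Row = Tuple[str, int, Optional[int], str]


def age_band_start(idade: int) -> int:
    """Lower bound of the 10-year band, mirroring FLOOR(idade / 10) * 10."""
    return (idade // 10) * 10


def cases_by_age_gender(rows: Iterable[Row]) -> Counter:
    """Count cases per (UF, year, age band, gender)."""
    counts: Counter = Counter()
    for uf, ano, idade, sexo in rows:
        if idade is None:  # mirrors WHERE idade IS NOT NULL
            continue
        counts[(uf, ano, age_band_start(idade), sexo)] += 1
    return counts


# Invented sample rows.
rows = [
    ("SP", 2024, 34, "F"),
    ("SP", 2024, 38, "F"),
    ("SP", 2024, 7, "M"),
    ("SP", 2024, None, "M"),
]
result = cases_by_age_gender(rows)
```

Here the two women aged 34 and 38 fall into the band starting at 30, the 7-year-old falls into the band starting at 0, and the row without an age is skipped.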

⚡ Performance & Best Practices

Streamlit data caching
Database-level aggregations
Strategic indexing
Minimal repeated database calls
Clear separation between UI, API, and Data Layer
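
The caching idea can be sketched with a small time-to-live cache. This is an illustration under assumptions, not the project's implementation: the class name, the five-minute TTL, and the fake query are invented here. On the dashboard side, Streamlit's st.cache_data decorator serves a comparable purpose.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]      # fresh entry: skip the database
        value = loader()       # miss or stale entry: run the query once
        self._entries[key] = (now, value)
        return value


calls = []


def fake_query() -> list:
    # Stand-in for a repository call that would hit PostgreSQL.
    calls.append(1)
    return [("SP", 2024, 120)]


cache = TTLCache(ttl_seconds=300)
first = cache.get_or_load(("cases_by_uf", 2024), fake_query)
second = cache.get_or_load(("cases_by_uf", 2024), fake_query)
```

The second lookup is served from memory, so fake_query runs only once within the TTL window. The injectable clock makes expiry easy to test.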

🔁 Migrations (Alembic)

Adopted standard:

revision: random ID generated by Alembic
down_revision: explicit dependency reference
Descriptive migration file names

Example:

alembic revision -m "create materialized view dengue_by_age_gender"

This approach ensures:

Linear migration history
Schema reproducibility
Easy rollback
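
Alembic has no dedicated operation for materialized views, so a migration like the one above typically wraps raw SQL in op.execute. The file below is a hedged sketch of what such a revision could contain: the revision identifiers are placeholders, and the SQL simply reuses the example view from the Database section.

```python
"""create materialized view dengue_by_age_gender"""

# Placeholder identifiers: Alembic generates the real revision ID, and
# down_revision must point at the previous migration in your history.
revision = "<generated_by_alembic>"
down_revision = "<previous_revision_id>"

CREATE_VIEW_SQL = """
CREATE MATERIALIZED VIEW dengue_by_age_gender AS
SELECT
    sg_uf_not,
    nu_ano,
    FLOOR(idade / 10) * 10 AS faixa_inicio,
    cs_sexo,
    COUNT(*) AS casos
FROM dengue_cases
WHERE idade IS NOT NULL
GROUP BY sg_uf_not, nu_ano, faixa_inicio, cs_sexo
"""

DROP_VIEW_SQL = "DROP MATERIALIZED VIEW IF EXISTS dengue_by_age_gender"


def upgrade() -> None:
    # Imported here so this sketch loads without Alembic installed;
    # generated migration files import `op` at module level.
    from alembic import op

    op.execute(CREATE_VIEW_SQL)


def downgrade() -> None:
    from alembic import op

    op.execute(DROP_VIEW_SQL)
```

Pairing every upgrade with a matching downgrade is what keeps rollback straightforward.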

🚀 How to Run

1️⃣ Create virtual environment

python -m venv venv
source venv/bin/activate  # Linux/macOS
venv\Scripts\Activate.ps1 # Windows (PowerShell)

2️⃣ Install dependencies

pip install -r requirements.txt

3️⃣ Configure environment

cp .env.example .env

Then set the database connection variables in the new .env file.
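
For orientation, the sketch below shows one way such variables could be combined into a PostgreSQL connection URL. The variable names (DB_USER, DB_PASSWORD, DB_HOST, DB_PORT, DB_NAME) and the sample values are assumptions made for this example; check .env.example for the names the project actually uses.

```python
import os
from typing import Mapping
from urllib.parse import quote_plus


def build_database_url(env: Mapping[str, str] = os.environ) -> str:
    """Assemble a PostgreSQL URL from hypothetical DB_* variables."""
    user = env["DB_USER"]
    password = quote_plus(env["DB_PASSWORD"])  # escape characters such as '@'
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    name = env["DB_NAME"]
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


# Invented values, passed explicitly instead of reading the real environment.
example_env = {"DB_USER": "dengue", "DB_PASSWORD": "p@ss", "DB_NAME": "dengue_monitor"}
url = build_database_url(example_env)
```

Escaping the password matters because characters like @ would otherwise be read as part of the URL structure.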

4️⃣ Run migrations

alembic upgrade head

5️⃣ Download the dengue CSV data

The project expects raw dengue CSV files to be placed in the following directory:

data/raw/

This directory is intentionally excluded from version control.

You can obtain the CSV files using one of the following options:

🔹 Option 1 — Google Drive (recommended for quick setup)

A curated set of CSV files is available on Google Drive:

👉 https://drive.google.com/drive/folders/1GY_LRvW4pQ0isSVTN_LyWAji-ixbTdw6?usp=sharing

Steps:

Download one or more CSV files from the Drive folder
Create the directory if it does not exist:
```
data/raw/
```
Place the downloaded CSV files inside data/raw/

🔹 Option 2 — Official Brazilian Health Open Data Portal (DATASUS)

You can also download the data directly from the official source:

👉 https://dadosabertos.saude.gov.br/dataset/arboviroses-dengue

Steps:

Access the dataset page
Download the desired CSV files (by year or period)
Place the CSV files inside:
```
data/raw/
```

📌 Notes:

The ingestion pipeline supports multiple CSV files inside data/raw/
Files are processed sequentially
Only the required columns are loaded into the database
The data is normalized before insertion
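
The notes above can be sketched as a small reader that walks every CSV in a directory and keeps only the needed columns. This is a sketch under assumptions, not the project's pipeline: it assumes comma-separated UTF-8 files whose headers match the database column names case-insensitively, and the function name and demo file are invented.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, Iterator

REQUIRED_COLUMNS = ("nu_ano", "sg_uf_not", "id_municip", "idade", "cs_sexo")


def iter_rows(raw_dir: Path) -> Iterator[Dict[str, str]]:
    """Yield trimmed rows from every CSV in raw_dir, one file at a time."""
    for csv_path in sorted(raw_dir.glob("*.csv")):  # sequential, stable order
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Match headers case-insensitively; ignore overflow fields.
                lowered = {
                    key.strip().lower(): value
                    for key, value in row.items()
                    if key
                }
                # Keep only the required columns, with whitespace stripped.
                yield {
                    col: (lowered.get(col) or "").strip()
                    for col in REQUIRED_COLUMNS
                }


# Demo with an invented one-row file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp)
    (raw / "2024.csv").write_text(
        "NU_ANO,SG_UF_NOT,ID_MUNICIP,IDADE,CS_SEXO,EXTRA\n"
        "2024,SP,355030, 34 ,F,ignored\n",
        encoding="utf-8",
    )
    loaded = list(iter_rows(raw))
```

Because iter_rows is a generator, rows stream through one at a time and large files do not need to fit in memory.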

6️⃣ Process and load the data

From the project root (so that the data package is importable):

python -m data.process_data

This script:

Reads the raw CSV files from data/raw/
Applies sampling logic (100 records / month / UF)
Normalizes fields
Inserts data into PostgreSQL
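
The sampling step could look roughly like the sketch below. The README does not say whether the 100 records are chosen randomly or taken in file order, so this example simply keeps the first 100 rows seen per group. The "month" key and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Mapping, Tuple

MAX_PER_GROUP = 100  # cap stated above: 100 records per month per UF


def sample_rows(
    rows: Iterable[Mapping[str, str]],
    limit: int = MAX_PER_GROUP,
) -> Iterator[Mapping[str, str]]:
    """Keep at most `limit` rows for each (month, UF) pair, in input order."""
    seen: Dict[Tuple[str, str], int] = defaultdict(int)
    for row in rows:
        group = (row["month"], row["sg_uf_not"])  # "month" is a hypothetical key
        if seen[group] < limit:
            seen[group] += 1
            yield row


# Invented input: 150 rows for SP and 20 for RJ in the same month.
rows = (
    [{"month": "2024-01", "sg_uf_not": "SP"}] * 150
    + [{"month": "2024-01", "sg_uf_not": "RJ"}] * 20
)
sampled = list(sample_rows(rows))
```

With this input, SP is capped at 100 rows while all 20 RJ rows pass through. A random sample per group would need a different approach, such as reservoir sampling.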

6️⃣.1️⃣ Refresh Materialized Views (required)

After loading the data into PostgreSQL, you must refresh the materialized views used by the analytical queries and dashboards.

You can do this in one of the following ways:

🔹 Option 1 — Run refresh commands manually

Connect to your PostgreSQL database and execute:

REFRESH MATERIALIZED VIEW mv_top_municipios;
REFRESH MATERIALIZED VIEW mv_cases_heatmap_month_age;
REFRESH MATERIALIZED VIEW mv_cases_by_age_group;
REFRESH MATERIALIZED VIEW mv_cases_by_gender_age_group;

🔹 Option 2 — Run the provided SQL script (recommended)

The project includes a helper script that refreshes all materialized views at once.

From the project root, run:

psql -d <your_database_name> -f scripts/database/refresh_materialized_views.sql

📌 Notes:

This step is required every time new data is loaded
Materialized views significantly improve dashboard performance
The script is safe to run multiple times
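
If you prefer to trigger the refresh from Python, for example at the end of the load step, a minimal sketch could look like this. It assumes a DB-API style connection object (such as one created by psycopg2) and reuses the four view names listed above; the function names are invented for this example.

```python
from typing import Any, List, Sequence

MATERIALIZED_VIEWS = (
    "mv_top_municipios",
    "mv_cases_heatmap_month_age",
    "mv_cases_by_age_group",
    "mv_cases_by_gender_age_group",
)


def refresh_statements(views: Sequence[str] = MATERIALIZED_VIEWS) -> List[str]:
    """Build one REFRESH statement per view name."""
    # View names are fixed constants here; never interpolate user input.
    return [f"REFRESH MATERIALIZED VIEW {view}" for view in views]


def refresh_all(connection: Any) -> None:
    """Run every refresh on a DB-API connection, then commit."""
    cursor = connection.cursor()
    try:
        for statement in refresh_statements():
            cursor.execute(statement)
        connection.commit()
    finally:
        cursor.close()
```

Calling refresh_all(connection) right after the load finishes keeps the dashboards from reading stale aggregates.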

7️⃣ Start the FastAPI server

uvicorn main:app --reload

The API exposes analytical endpoints consumed by the dashboard.

8️⃣ Start the dashboard

streamlit run dashboard/app.py

🤝 Contribution

Contributions are welcome!

Fork the project
Create a branch (feat/my-feature)
Commit following Conventional Commits
Open a Pull Request

📄 License

This project is distributed under the MIT License.

👨‍💻 Author

Project developed by Jefferson as an advanced study and technical portfolio in:

Data Engineering
Epidemiological Analysis
Analytical Visualization
