Dengue Monitor is a data engineering and analytics project focused on the epidemiological analysis of dengue cases in Brazil, using real-scale data (hundreds of thousands of records) and interactive dashboards.
The goal of the project is to demonstrate technical capability, professional best practices, and data-driven decision making, with a strong emphasis on performance, data engineering, and analytical visualization.
The Dengue Monitor aims to consolidate large volumes of dengue notification data, apply efficient database-level aggregations, and expose dashboards and analytical endpoints for:
- Temporal analysis (epidemiological year)
- Geographic analysis (state and municipality)
- Demographic analysis (age range and gender)
The project was designed to:
- Scale to hundreds of thousands of records
- Minimize repeated database queries
- Demonstrate proficiency in analytical SQL, data modeling, caching, performance optimization, and analytical APIs
┌──────────────┐
│  CSV / API   │
└──────┬───────┘
       │
       ▼
┌──────────────────────────┐
│  Database (PostgreSQL)   │
│  - dengue_cases          │
│  - materialized views    │
└──────┬───────────────────┘
       │
       ▼
┌──────────────────────────┐
│  Data Layer              │
│  Repositories + Cache    │
└──────┬───────────────────┘
       │
       ├──────────────┐
       ▼              ▼
┌──────────────┐ ┌──────────────────────┐
│ FastAPI      │ │ Dashboard (UI)       │
│ Analytics API│ │ Streamlit + Plotly   │
└──────────────┘ └──────────────────────┘
- Python 3.12+
- SQLAlchemy
- Alembic (migrations)
- PostgreSQL (analytical queries and aggregations)
- FastAPI
- Pydantic (schemas)
- Uvicorn
The API exposes analytical endpoints consumed by the dashboard and allows future integration with other services.
- Streamlit
- Plotly
- Materialized Views
- Optimized indexes
- In-memory caching
- Clear separation between Data Layer, API, and UI
dengue-monitor/
│
├── dashboard/
│   ├── app.py # Streamlit app
│   └── utils.py # UI helpers
│
├── data/
│   ├── lookups/
│   │   ├── loader.py
│   │   ├── municipios.json
│   │   └── ufs.json
│   │
│   ├── transformers/
│   │   └── age.py
│   │
│   ├── analysis.py
│   ├── enums.py
│   ├── process_data.py
│   │
│   └── visualization/
│       ├── matplotlib.py
│       ├── plotly.py
│       └── seaborn.py
│
├── core/
│   ├── repositories/
│   │   └── dengue_repository.py
│   │
│   ├── database.py
│   └── models.py
│
├── api/
│   ├── services/
│   │   └── location_service.py
│   ├── routes.py
│   └── schemas.py
│
├── alembic/
│   └── versions/ # Migrations
│
├── main.py # FastAPI entrypoint
├── .env.example
├── requirements.txt
└── README.md
dengue_cases
Stores raw dengue notification records.
Key columns:
- nu_ano: epidemiological year
- sg_uf_not: reporting state (UF)
- id_municip: municipality
- idade: patient age
- cs_sexo: gender (M, F, I)
The project relies on materialized views to avoid expensive real-time aggregations.
Example:
CREATE MATERIALIZED VIEW dengue_by_age_gender AS
SELECT
sg_uf_not,
nu_ano,
FLOOR(idade / 10) * 10 AS faixa_inicio,
cs_sexo,
COUNT(*) AS casos
FROM dengue_cases
WHERE idade IS NOT NULL
GROUP BY sg_uf_not, nu_ano, faixa_inicio, cs_sexo;
Benefits:
- Fast queries
- Reduced database load
- Ideal for dashboards and analytical APIs
- Streamlit data caching
- Database-level aggregations
- Strategic indexing
- Avoids excessive database calls
- Clear separation between UI, API, and Data Layer
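One way to avoid repeated database calls in the data layer is a small time-based in-memory cache. The sketch below is illustrative only: `TTLCache`, `get_or_load`, and the cache key are hypothetical names, and the project may rely on Streamlit's own caching or another mechanism instead.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: dict[Hashable, tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        """Return the cached value for `key`, calling `loader` on a miss or expiry."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = loader()  # e.g. a repository method that runs the SQL query
        self._store[key] = (now, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=300)

    def fake_query() -> list[tuple[str, int]]:
        # Stand-in for a real database query; the values are made up.
        print("querying database...")
        return [("SP", 10), ("RJ", 7)]

    cache.get_or_load(("cases_by_uf", 2024), fake_query)  # runs the loader
    cache.get_or_load(("cases_by_uf", 2024), fake_query)  # served from memory
```

Because materialized views only change when they are refreshed, a TTL of a few minutes is usually a safe trade-off between freshness and database load.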
Adopted standard:
- revision: random ID generated by Alembic
- down_revision: explicit dependency reference
- Descriptive migration file names
Example:
alembic revision -m "create materialized view dengue_by_age_gender"
This approach ensures:
- Linear migration history
- Schema reproducibility
- Easy rollback
python -m venv venv
source venv/bin/activate # Linux/macOS
venv\Scripts\Activate.ps1 # Windows
pip install -r requirements.txt
cp .env.example .env
Set the database environment variables in .env, then apply the migrations:
alembic upgrade head
The project expects raw dengue CSV files to be placed in the following directory:
data/raw/
This directory is intentionally excluded from version control.
You can obtain the CSV files using one of the following options:
A curated set of CSV files is available on Google Drive:
https://drive.google.com/drive/folders/1GY_LRvW4pQ0isSVTN_LyWAji-ixbTdw6?usp=sharing
Steps:
- Download one or more CSV files from the Drive folder
- Create the directory if it does not exist: data/raw/
- Place the downloaded CSV files inside data/raw/
You can also download the data directly from the official source:
https://dadosabertos.saude.gov.br/dataset/arboviroses-dengue
Steps:
- Access the dataset page
- Download the desired CSV files (by year or period)
- Place the CSV files inside data/raw/
Notes:
- The ingestion pipeline supports multiple CSV files inside data/raw/
- Files are processed sequentially
- Only the required columns are loaded into the database
- The data is normalized before insertion
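The notes above (several files, processed one after another, only the required columns kept) can be sketched with the standard library. This is an assumption-laden illustration, not the project's process_data.py: the function names are hypothetical, and it assumes comma-separated UTF-8 files whose headers match the column names listed for dengue_cases.

```python
import csv
from pathlib import Path
from typing import Iterable, Iterator

# Columns listed for dengue_cases; the raw header names are assumed to match.
REQUIRED_COLUMNS = ("nu_ano", "sg_uf_not", "id_municip", "idade", "cs_sexo")


def select_required(
    lines: Iterable[str],
    columns: tuple[str, ...] = REQUIRED_COLUMNS,
) -> Iterator[dict[str, str]]:
    """Parse CSV lines and keep only the required columns of each row."""
    reader = csv.DictReader(lines)
    for row in reader:
        yield {col: row[col] for col in columns}


def iter_raw_rows(raw_dir: Path = Path("data/raw")) -> Iterator[dict[str, str]]:
    """Stream rows from every CSV in `raw_dir`, one file at a time."""
    for csv_path in sorted(raw_dir.glob("*.csv")):  # sorted for a stable order
        with csv_path.open(newline="", encoding="utf-8") as handle:
            yield from select_required(handle)


if __name__ == "__main__":
    for i, row in enumerate(iter_raw_rows()):
        print(row)
        if i >= 4:  # show only the first few rows
            break
```

Streaming rows with generators keeps memory usage flat regardless of how many files are present in data/raw/.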
From the project root (the directory that contains the data/ package):
python -m data.process_data
This script:
- Reads the raw CSV
- Applies sampling logic (100 records / month / UF)
- Normalizes fields
- Inserts data into PostgreSQL
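The sampling step (at most 100 records per month and UF) can be sketched as a per-group cap. The details here are assumptions: the README does not say whether sampling is random or takes the first records, so this version keeps the first `limit` rows seen per group, and the "month" field name is hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Hashable, Iterable, Iterator

Row = dict[str, Any]


def month_uf_key(row: Row) -> tuple[Any, Any]:
    """Grouping key: (month, reporting state). 'month' is a hypothetical field."""
    return (row["month"], row["sg_uf_not"])


def cap_per_group(
    rows: Iterable[Row],
    key: Callable[[Row], Hashable] = month_uf_key,
    limit: int = 100,
) -> Iterator[Row]:
    """Yield rows in input order, keeping at most `limit` rows per group."""
    kept: defaultdict[Hashable, int] = defaultdict(int)
    for row in rows:
        group = key(row)
        if kept[group] < limit:
            kept[group] += 1
            yield row


if __name__ == "__main__":
    # Synthetic rows for illustration only, not real notification data.
    demo = [{"month": 1, "sg_uf_not": "SP"} for _ in range(250)]
    demo += [{"month": 1, "sg_uf_not": "RJ"} for _ in range(30)]
    print(len(list(cap_per_group(demo))))
```

If a random sample is preferred over the first records, the same grouping key can feed a reservoir-sampling routine instead.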
After loading the data into PostgreSQL, you must refresh the materialized views used by the analytical queries and dashboards.
You can do this in one of the following ways:
Connect to your PostgreSQL database and execute:
REFRESH MATERIALIZED VIEW mv_top_municipios;
REFRESH MATERIALIZED VIEW mv_cases_heatmap_month_age;
REFRESH MATERIALIZED VIEW mv_cases_by_age_group;
REFRESH MATERIALIZED VIEW mv_cases_by_gender_age_group;
The project includes a helper script that refreshes all materialized views at once.
From the project root, run:
psql -d <your_database_name> -f scripts/database/refresh_materialized_views.sql
Notes:
- This step is required every time new data is loaded
- Materialized views significantly improve dashboard performance
- The script is safe to run multiple times
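The refresh can also be triggered from Python, for example at the end of the ingestion script. The sketch below only builds and dispatches the same REFRESH statements listed above; `refresh_all` and its `run_sql` parameter are hypothetical, and wiring it to a real connection (SQLAlchemy, psycopg, or similar) is left to the caller.

```python
from typing import Callable, Iterable

# View names taken from the manual refresh commands in this README.
MATERIALIZED_VIEWS = (
    "mv_top_municipios",
    "mv_cases_heatmap_month_age",
    "mv_cases_by_age_group",
    "mv_cases_by_gender_age_group",
)


def refresh_statements(views: Iterable[str] = MATERIALIZED_VIEWS) -> list[str]:
    """Build one REFRESH statement per view, rejecting unsafe names."""
    statements = []
    for name in views:
        if not name.isidentifier():
            raise ValueError(f"unexpected view name: {name!r}")
        statements.append(f"REFRESH MATERIALIZED VIEW {name};")
    return statements


def refresh_all(
    run_sql: Callable[[str], object],
    views: Iterable[str] = MATERIALIZED_VIEWS,
) -> int:
    """Run every REFRESH statement through `run_sql`; return how many ran."""
    statements = refresh_statements(views)
    for statement in statements:
        run_sql(statement)
    return len(statements)


if __name__ == "__main__":
    # Dry run: print the statements instead of executing them.
    refresh_all(print)
```

Passing the executor in as a callable keeps the helper independent of any particular database driver and easy to dry-run.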
uvicorn main:app --reload
The API exposes analytical endpoints consumed by the dashboard.
streamlit run dashboard/app.py
Contributions are welcome!
- Fork the project
- Create a branch (feat/my-feature)
- Commit following Conventional Commits
- Open a Pull Request
This project is distributed under the MIT License.
Project developed by Jefferson as an advanced study and technical portfolio in:
- Data Engineering
- Epidemiological Analysis
- Analytical Visualization