A production-grade ETL pipeline and interactive Dash application for multi-child nutritional tracking.
A demo recording of the dashboard is available in `dashboard_demo.mp4`.
An end-to-end Python application designed to track and visualize infant feeding schedules using an object-oriented architecture.
Baby Feed Tracker: End-to-End Data Pipeline

Contents:

- Project Objective
- The Data Pipeline
- Visualization & UI
- Features
- Installation
- Configuration
- Project Structure
- Sample Data Format
- Usage
- Data Validation & Error Logging
- Dashboard Overview
- Data Export
- Requirements
- Deployment
- Running Tests
- License
- Contributing
The primary goal of this project is to demonstrate a production-grade Python workflow. It serves as a blueprint for an end-to-end process taking raw data through a structured pipeline, leveraging Object-Oriented Programming (OOP), and delivering actionable insights via an interactive web dashboard.
The core logic is divided into four distinct stages to ensure data integrity and modularity:
- Data Loading: Ingesting raw feeding logs from source files.
- Data Cleaning: Handling missing values and normalizing timestamps for consistency.
- Pydantic Validation (v2.12): Enforcing strict data schemas to ensure the pipeline remains robust and type-safe.
- Data Transformation: Processing raw logs into analytical datasets (e.g., calculating daily volumes or feeding intervals).
The DataPipeline class orchestrates this flow on a per-child basis, leveraging Pandas for high-performance transformations and Pydantic for rigorous schema enforcement.
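The four stages above can be sketched as a minimal class. This is an illustrative outline, not the project's actual implementation: method names and internals are assumptions, and the Pydantic validation stage is omitted here for brevity (it is covered in the error-logging section).

```python
import pandas as pd


class DataPipeline:
    """Minimal sketch of the four-stage flow; names and internals are illustrative."""

    def __init__(self, child_name: str, raw: pd.DataFrame):
        self.child_name = child_name
        self.raw = raw

    def load(self) -> pd.DataFrame:
        # Stage 1: ingest raw feeding logs (here, from an in-memory frame)
        return self.raw.copy()

    def clean(self, df: pd.DataFrame) -> pd.DataFrame:
        # Stage 2: drop rows with missing volumes, normalize timestamps
        df = df.dropna(subset=["feed_volume_ml"]).copy()
        df["feed_start_time"] = pd.to_datetime(df["feed_start_time"])
        return df

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        # Stage 4: aggregate raw logs into daily totals
        return (
            df.assign(date=df["feed_start_time"].dt.date)
              .groupby("date", as_index=False)["feed_volume_ml"]
              .sum()
        )

    def run(self) -> pd.DataFrame:
        return self.transform(self.clean(self.load()))
```

Running one pipeline instance per child keeps each child's data isolated end to end.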
The processed data is served through a Plotly Dash interface. By utilizing Plotly Express, the project generates interactive visualizations that allow users to:
- Monitor feeding trends over time.
- Analyze volume distributions.
- Gain quick, data-driven insights into a baby's schedule.
Application styling and interface are developed with Dash Bootstrap Components.
- Create and manage feeding schedules
- Track feeding times and amounts
- Monitor baby nutrition patterns
Clone the repository and install dependencies with uv:

```shell
git clone <repository-url>
uv sync
```

Alternatively, use pip with a virtual environment:

```shell
git clone <repository-url>
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install --upgrade pip
pip install -e .
```

This project uses Pydantic Settings to manage multi-child configurations. To set up which children you want to track, create a .env file in the project root:
```
CHILDREN='
[{"name": "Child 1", "file_name": "file 1.xlsx", "dob": "dob 1"},
 {"name": "Child 2", "file_name": "file 2.xlsx", "dob": "dob 2"}]'
```

Each child requires the following fields in the CHILDREN JSON array:
| Field | Type | Description |
|---|---|---|
| `name` | string | Display name for the child (used in dashboards and exports) |
| `file_name` | string | Name of the feeding log file (without path; placed in `data/` folder) |
| `dob` | date (`YYYY-MM-DD`) | Date of birth for age-based analytics |
Place your feeding log files in the `data/` folder with the exact filename specified in the configuration. Supported formats:

- Excel: `.xlsx` files
- CSV: `.csv` files
Ensure your feeding logs contain the required columns matching the FeedingData schema (see Sample Data Format).
The repository is organized as follows:

```
src/
├── app/       # Dash app factory with Plotly charts and interactive dashboard
├── models/    # Pydantic data schemas
├── pipeline/  # ETL orchestration (load, clean, validate, transform)
└── main.py    # Execution entry point
reporting/     # Excel report outputs (not tracked in Git)
tests/         # Unit tests
```
The pipeline expects feeding logs with the following columns matching the FeedingData schema:
| Column | Type | Example |
|---|---|---|
| feed_start_time | datetime | 2025-12-24 10:30:00 |
| activity | string | Feeding |
| type | string | Bottle |
| feed_volume_ml | float | 120.5 |
| units | string | ml |
Supported values:

- `activity`: `Feeding`
- `type`: `Left`, `Right`, `Bottle`
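A schema matching the table above might look like the following sketch. The real FeedingData model may differ; in particular, the `Literal` constraints here are inferred from the "Supported values" list rather than taken from the source.

```python
from datetime import datetime
from typing import Literal

from pydantic import BaseModel


class FeedingData(BaseModel):
    """One feeding-log row; field names mirror the expected columns."""
    feed_start_time: datetime
    activity: Literal["Feeding"]
    type: Literal["Left", "Right", "Bottle"]
    feed_volume_ml: float
    units: str = "ml"


# A valid record parses cleanly...
record = FeedingData(
    feed_start_time=datetime(2025, 12, 24, 10, 30),
    activity="Feeding",
    type="Bottle",
    feed_volume_ml=120.5,
)
```

An out-of-range `type` (say, `"Cup"`) would raise a `ValidationError`, which is exactly what the error-logging stage captures.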
Place your feeding log file in the `data/live data` folder (`.xlsx` or `.csv` format):

```
data/live data/file_name.xlsx
```

For each child you will need to provide a name and a date of birth.

If no live data is provided, the pipeline falls back to its default behaviour: when `use_dummy_data` is set to `True`, it loads randomly generated test data.
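The dummy-data fallback could be generated with something like the sketch below; the function name, schedule, and volume distribution are all illustrative, not the project's actual generator.

```python
import numpy as np
import pandas as pd


def make_dummy_feeds(days: int = 7, seed: int = 0) -> pd.DataFrame:
    """Generate random feeding logs matching the FeedingData columns."""
    rng = np.random.default_rng(seed)
    # Roughly six feeds per day, evenly spaced for simplicity
    times = pd.date_range("2025-01-01", periods=days * 6, freq="4h")
    return pd.DataFrame({
        "feed_start_time": times,
        "activity": "Feeding",
        "type": rng.choice(["Left", "Right", "Bottle"], size=len(times)),
        "feed_volume_ml": rng.normal(110, 25, size=len(times)).round(1),
        "units": "ml",
    })
```

Seeding the generator keeps the demo data reproducible between runs.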
Run the data pipeline and launch the interactive dashboard:

```shell
python3 -m src.main
```

Or with uv:

```shell
uv run python3 -m src.main
```

The dashboard will be available at http://127.0.0.1:8050 by default.
The pipeline employs a "Graceful Failure" strategy for data validation using Pydantic. When records fail validation (e.g., missing fields, incorrect data types, or invalid values), they are not lost. Instead, they are captured, formatted, and stored in a dedicated error reporting DataFrame.
For every record that triggers a ValidationError, the pipeline:
- Extracts the raw record data: Keeps the original values for context.
- Counts the issues: Records the total number of validation failures for that specific row.
- Formats error details: Concatenates multiple errors into a numbered, readable list identifying exactly which field failed and why.
This allows for easy auditing and data cleaning without interrupting the processing of valid records.
If the pipeline encounters invalid data, the resulting error_df will be structured as follows:
| Name | Event | Value | ... | total_errors | error_details |
|---|---|---|---|---|---|
| Baby A | Feeding | "None" | ... | 2 | 1) Value: input is not a valid float 2) Time: field required |
| Baby B | Sleep | 12.5 | ... | 1 | 1) Date: invalid date format |
The following logic in data_pipeline.py ensures that every validation failure is documented:

```python
except ValidationError as e:
    # Format multiple errors into a single string for the record
    details = "\n".join(
        f"{i}) {err['loc'][0]}: {err['msg']}"
        for i, err in enumerate(e.errors(), 1)
    )
    # Append the original record + error metadata to the error list
    error_records.append({
        **record_dict,
        'total_errors': e.error_count(),
        'error_details': details
    })

# Convert to DataFrame for export/analysis
error_df = pd.DataFrame(error_records)
```

The error_df can then be exported via the export_data method detailed below.
The dashboard provides multiple views for analyzing feeding patterns across one or multiple children:
A high-level aggregated summary designed for parents tracking multiple children simultaneously.
- Unified Feed Stats: A single view of total volumes by day and last-feed timestamps across all profiles.
- Rolling Trends: Overlay of rolling averages to see how different infants are progressing relative to one another.
- Global Navigation: One-tap access to switch between deep-dive views for each child.
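The rolling-trend overlay boils down to a per-child rolling mean. Here is a minimal pandas sketch; the 3-row window and column names are illustrative assumptions:

```python
import pandas as pd

# Hypothetical daily totals for two children
df = pd.DataFrame({
    "child": ["A"] * 4 + ["B"] * 4,
    "daily_volume_ml": [600, 640, 620, 660, 500, 540, 520, 560],
})

# Rolling mean computed within each child, not across children
df["rolling_avg"] = (
    df.groupby("child")["daily_volume_ml"]
      .transform(lambda s: s.rolling(3, min_periods=1).mean())
)
```

Grouping before rolling is what keeps one infant's trend from bleeding into another's.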
A granular look at a specific child's nutritional journey and daily rhythms.
- Chronological Feed Stream: Comprehensive logs featuring precise timestamps, milk type (breast/bottle), and volume.
- Distribution of feed volume over time: Violin plots comparing variation in individual feed volumes.
- Anomaly Detection: Highlights significant deviations from the child's "normal" feeding amounts.
Advanced "Sleep-Aware" metrics that distinguish between active daytime feeding and overnight maintenance.
- Night vs day feed: Visual breakdown of calories consumed during night vs day by week per child
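The night/day split can be derived from each feed's timestamp hour. This sketch assumes night runs from 22:00 to 06:00; the actual cut-off used by the project may differ:

```python
import pandas as pd


def night_day_totals(df: pd.DataFrame,
                     night_start: int = 22,
                     night_end: int = 6) -> pd.Series:
    """Sum feed volume by 'night' vs 'day', based on the hour of each feed."""
    hours = pd.to_datetime(df["feed_start_time"]).dt.hour
    period = hours.map(
        lambda h: "night" if (h >= night_start or h < night_end) else "day"
    )
    return df.groupby(period)["feed_volume_ml"].sum()
```

The same grouping key can be combined with a weekly resample to produce the per-week breakdown shown on the dashboard.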
The pipeline can export processed feeding data to Excel files for record-keeping or external analysis.
Exporting is handled by the DataPipeline.export_data() method:
```python
pipeline.export_data(
    output_file_name="child_feeding_schedule.xlsx",
    export_errors=True,      # Include rows that failed validation
    export_validated=True,   # Include successfully validated records
)
```

The exported workbook contains:

- Validated Data: Cleaned and transformed feeding records that passed all validations
- Error Records: Any rows from the raw data that failed validation, along with simple-to-read error messages
- Summary Sheets: Daily and weekly aggregated statistics
Exported files are saved to the reporting/ folder.
To run this project, you will need the following environment and dependencies:
- Python 3.13+: This project utilizes recent Python features and optimizations.
- uv: It is recommended to use uv for dependency synchronization and virtual environment management.
| Dependency | Version | Purpose |
|---|---|---|
| Pydantic | >=2.12.5 | Data validation and settings management using Python type hints. |
| Dash | >=3.3.0 | Framework for building the analytical web dashboard. |
| Plotly | >=6.5.0 | Interactive data visualizations. |
| Pandas | >=2.3.3 | High-performance data manipulation and transformation. |
| Statsmodels | >=0.14.6 | Statistical analysis tools for feeding patterns. |
| Pytest | >=9.0.2 | Testing framework for validating the ETL pipeline logic. |
| Dash Bootstrap Components | >=2.0.4 | Bootstrap components for Plotly Dash to improve styling. |
This application is designed to be deployed as a Web Service on Render. To ensure the nested project structure and internal modules (like pipeline and models) load correctly, follow the configuration steps below.
When setting up your service, use these core configurations:
| Setting | Value |
|---|---|
| Runtime | Python |
| Build Command | pip install -r requirements.txt |
| Start Command | gunicorn src.main:server |
For the Start Command to work, Gunicorn needs to find the "server" variable. Ensure your src/main.py exposes the server at the top level (not indented):
```python
# In src/main.py
app = Dash(__name__, ...)
app.layout = ...

# This must be OUTSIDE the 'if __name__ == "__main__":' block
server = app.server
```

Run all unit tests using:
```shell
# Using uv
uv run pytest

# Or using Python directly (if venv is activated)
python -m pytest
```

Distributed under the MIT License. See LICENSE.txt for details.
Contributions are welcome! Please feel free to submit a Pull Request or open an issue to discuss proposed changes.