📊 Data Validation & Reconciliation Tool

A Python-based application designed to automate data validation between source and target datasets during data migration, ETL validation, or system reconciliation processes.

The tool lets analysts and data engineers quickly detect data inconsistencies, missing records, and column mismatches, and it summarizes the overall health of the data migration.

Overview

Data migrations and ETL pipelines require validating that the target system accurately reflects the source dataset.

Manual spreadsheet comparisons become inefficient and unreliable when working with large datasets or complex schemas.

This application automates the validation workflow and produces clear validation summaries that highlight discrepancies between datasets.

Key Features

Flexible Primary Key Selection

Users can manually select the primary key used to align records between the source and target datasets.

This flexibility allows validation across different table structures and migration scenarios.

Duplicate Key Detection

The tool automatically checks the selected comparison key for duplicate values in both datasets.

If duplicates are detected, validation stops and an error message is displayed to prevent incorrect comparisons.

Users can then select another column as the comparison key.
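The key check can be sketched in pandas as follows. This is a minimal illustration, not the code in src/validator.py; the names `find_duplicate_keys` and `check_key_is_unique` are hypothetical. `Series.duplicated(keep=False)` marks every occurrence of a repeated value, so each offending key is reported once. In a Streamlit page, the caught error could be shown with `st.error` before the user picks another column.

```python
import pandas as pd


def find_duplicate_keys(df: pd.DataFrame, key: str) -> pd.Series:
    """Return each value of df[key] that occurs more than once (hypothetical helper)."""
    # keep=False flags every occurrence of a repeated value, not just the later ones.
    repeated = df.loc[df[key].duplicated(keep=False), key]
    return pd.Series(repeated.unique(), name=key)


def check_key_is_unique(source: pd.DataFrame, target: pd.DataFrame, key: str) -> None:
    """Raise ValueError if `key` is not unique in the source or the target."""
    problems = []
    for label, df in (("source", source), ("target", target)):
        dupes = find_duplicate_keys(df, key)
        if not dupes.empty:
            examples = ", ".join(map(str, dupes.head(5)))
            problems.append(f"{label}: {len(dupes)} duplicated key value(s), e.g. {examples}")
    if problems:
        # Stop before any comparison so rows are never paired ambiguously.
        raise ValueError(f"'{key}' cannot be used as the comparison key.\n" + "\n".join(problems))


source = pd.DataFrame({"id": [1, 2, 2, 3], "code": ["A", "B", "C", "D"]})
target = pd.DataFrame({"id": [1, 2, 3], "code": ["A", "B", "C"]})

check_key_is_unique(source, target, "code")  # passes: "code" is unique in both
try:
    check_key_is_unique(source, target, "id")
except ValueError as err:
    print(err)
```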

Dataset Alignment

The validator aligns rows between datasets using the selected key before performing comparisons.

It also detects records that exist in only one of the datasets, helping identify potential data loss or unexpected records.
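A minimal sketch of key-based alignment, assuming both datasets are pandas DataFrames and the key has already passed the duplicate check. `align_datasets` is a hypothetical helper; it relies on `DataFrame.merge` with `how="outer"` and `indicator=True`, whose `_merge` column records whether each key was found in the source only, the target only, or both.

```python
import pandas as pd


def align_datasets(source: pd.DataFrame, target: pd.DataFrame, key: str):
    """Outer-join source and target on `key` (hypothetical helper).

    Returns (matched, only_in_source, only_in_target):
    matched        -> rows whose key exists in both datasets
    only_in_source -> keys missing from the target (possible data loss)
    only_in_target -> keys missing from the source (unexpected records)
    """
    merged = source.merge(
        target,
        on=key,
        how="outer",
        suffixes=("_source", "_target"),  # applied to shared non-key columns
        indicator=True,  # adds "_merge": "left_only", "right_only" or "both"
    )
    side = merged["_merge"]
    matched = merged[side == "both"].drop(columns="_merge").reset_index(drop=True)
    only_in_source = merged.loc[side == "left_only", key].tolist()
    only_in_target = merged.loc[side == "right_only", key].tolist()
    return matched, only_in_source, only_in_target


source = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
target = pd.DataFrame({"id": [2, 3, 4], "name": ["Bob", "Cyrus", "Dee"]})
matched, only_src, only_tgt = align_datasets(source, target, "id")
print(only_src, only_tgt)  # → [1] [4]
print(list(matched.columns))  # → ['id', 'name_source', 'name_target']
```

Columns present in both inputs receive the `_source` / `_target` suffixes, so each shared column can then be compared pair by pair.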

Column-Level Validation

Each column present in both datasets is compared to detect differences, including:

value mismatches
null inconsistencies
missing values
unexpected value changes
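One way to label each aligned pair of values is sketched below. The labels are an assumption that folds the categories above into four outcomes (match, missing on source, missing on target, value mismatch), and treating two nulls as a match is also an assumed rule; `compare_column` is a hypothetical name.

```python
import pandas as pd


def compare_column(source_values: pd.Series, target_values: pd.Series) -> pd.Series:
    """Label each aligned pair of values (hypothetical helper).

    Labels: "match", "missing_on_source", "missing_on_target", "value_mismatch".
    Assumed rule: two nulls count as a match.
    """
    # Compare positionally; the inputs are assumed to be row-aligned already.
    s = source_values.reset_index(drop=True)
    t = target_values.reset_index(drop=True)
    s_null, t_null = s.isna(), t.isna()

    # Start from "value_mismatch" and overwrite the cases that can be named.
    labels = pd.Series("value_mismatch", index=s.index)
    labels.loc[s_null & t_null] = "match"
    labels.loc[~s_null & ~t_null & (s == t)] = "match"
    labels.loc[s_null & ~t_null] = "missing_on_source"
    labels.loc[~s_null & t_null] = "missing_on_target"
    return labels


# Example: five aligned rows of one column.
source = pd.Series(["a", None, "c", "d", None])
target = pd.Series(["a", "b", None, "x", None])
print(compare_column(source, target).tolist())
# → ['match', 'missing_on_source', 'missing_on_target', 'value_mismatch', 'match']
```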

Comparison Modes

The tool provides two comparison modes:

Normalized Mode – ignores formatting differences such as letter case, leading or trailing spaces, and numeric formatting
Strict Mode – compares values exactly as stored in the datasets
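The two modes could be implemented as a step applied before comparison. In this sketch, which assumes a simple rule set rather than the tool's actual normalization, Normalized Mode strips surrounding whitespace, case-folds text, and parses numeric-looking strings as floats so that `1`, `"1.0"` and `" 1.00 "` compare equal; Strict Mode compares the stored values directly. `values_equal` and the `mode` strings are hypothetical.

```python
import pandas as pd


def _normalize(value):
    """Canonical form used by the assumed Normalized Mode rules."""
    if pd.isna(value):
        return None  # all null markers (None, NaN) become None
    text = str(value).strip().casefold()  # drop surrounding spaces and letter case
    try:
        return float(text)  # 1, "1.0" and " 1.00 " all become 1.0
    except ValueError:
        return text


def values_equal(source_values: pd.Series, target_values: pd.Series,
                 mode: str = "normalized") -> pd.Series:
    """Element-wise equality for row-aligned Series; two nulls count as equal."""
    s = source_values.reset_index(drop=True)
    t = target_values.reset_index(drop=True)
    if mode == "normalized":
        s, t = s.map(_normalize), t.map(_normalize)
    elif mode != "strict":
        raise ValueError(f"unknown mode: {mode!r}")
    both_null = s.isna() & t.isna()
    return (s == t) | both_null


source = pd.Series(["Alice ", "1.0", 42, None], dtype=object)
target = pd.Series(["alice", "1", "42.00", None], dtype=object)
print(values_equal(source, target, "normalized").tolist())  # → [True, True, True, True]
print(values_equal(source, target, "strict").tolist())      # → [False, False, False, True]
```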

Migration Health Insights

After validation, the application produces a summary showing the overall data migration health.

The dashboard includes:

rows compared
columns compared
mismatched values
rows containing discrepancies
attribute accuracy scores
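Given a boolean mismatch matrix (one row per aligned record, one column per compared attribute, `True` where values differ), the dashboard figures reduce to a few pandas aggregations. The sketch assumes that an attribute's accuracy score is the percentage of compared rows whose values match; the tool may define it differently, and `health_summary` is a hypothetical name.

```python
import pandas as pd


def health_summary(mismatch: pd.DataFrame) -> dict:
    """Summarize a boolean mismatch matrix (rows = aligned records,
    columns = compared attributes, True = values differ)."""
    rows, cols = mismatch.shape
    # Assumed definition: accuracy = % of compared rows where the attribute matches.
    accuracy = ((1 - mismatch.mean()) * 100).round(1)
    return {
        "rows_compared": rows,
        "columns_compared": cols,
        "mismatched_values": int(mismatch.to_numpy().sum()),
        "rows_with_discrepancies": int(mismatch.any(axis=1).sum()),
        "attribute_accuracy_pct": accuracy.to_dict(),
    }


mismatch = pd.DataFrame({
    "name":  [False, True, False, False],
    "email": [False, True, True, False],
})
print(health_summary(mismatch))
```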

Mismatch Classification

Detected issues are automatically categorized to help understand the root cause of discrepancies.

Examples include:

perfect matches
missing values on source
missing values on target
mostly incorrect values
mixed mismatch patterns
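A rule-based classifier over per-row outcomes could produce categories like these. The outcome labels, the `classify_column` name, and the 80% dominance threshold are all hypothetical; the sketch only shows the general idea of assigning a column to whichever issue type dominates its mismatches.

```python
import pandas as pd

DOMINANCE_THRESHOLD = 0.8  # hypothetical: share of issues one type needs to dominate

CATEGORY_FOR_ISSUE = {
    "missing_on_source": "missing values on source",
    "missing_on_target": "missing values on target",
    "value_mismatch": "mostly incorrect values",
}


def classify_column(outcomes: pd.Series) -> str:
    """Classify one column from its per-row outcome labels
    ("match", "missing_on_source", "missing_on_target", "value_mismatch")."""
    issues = outcomes[outcomes != "match"]
    if issues.empty:
        return "perfect match"
    # Share of each issue type among the mismatched rows, largest first.
    shares = issues.value_counts(normalize=True)
    if shares.iloc[0] < DOMINANCE_THRESHOLD:
        return "mixed mismatch pattern"
    return CATEGORY_FOR_ISSUE[shares.index[0]]


print(classify_column(pd.Series(["match", "match"])))  # → perfect match
print(classify_column(pd.Series(["match", "missing_on_target", "missing_on_target"])))  # → missing values on target
print(classify_column(pd.Series(["value_mismatch", "missing_on_source", "match"])))  # → mixed mismatch pattern
```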

Issue Sampling for Tracking

For each column containing mismatches, the tool generates up to five example records showing the detected issue.

These samples are formatted as text so they can easily be copied into issue trackers, validation logs, or data quality reports.
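Formatting the samples as plain text might look like this. The sketch assumes an aligned DataFrame with `<column>_source` / `<column>_target` pairs and a precomputed boolean mismatch mask; `format_issue_samples` and the output layout are hypothetical, and only the five-record limit comes from the description above.

```python
import pandas as pd

MAX_SAMPLES = 5  # up to five example records per column, as described above


def format_issue_samples(df: pd.DataFrame, key: str, column: str,
                         mismatch: pd.Series) -> str:
    """Render up to MAX_SAMPLES mismatched rows of `column` as copy-pasteable text.

    Assumes `df` has columns `key`, `<column>_source`, `<column>_target`, and that
    `mismatch` is a boolean Series on the same index.
    """
    src_col, tgt_col = f"{column}_source", f"{column}_target"
    bad = df.loc[mismatch, [key, src_col, tgt_col]]
    shown = min(len(bad), MAX_SAMPLES)
    lines = [f"Column '{column}': {len(bad)} mismatched record(s), showing {shown}"]
    for k, src, tgt in bad.head(MAX_SAMPLES).itertuples(index=False, name=None):
        # repr() keeps quotes, so stray spaces or type differences stay visible.
        lines.append(f"  {key}={k} | source={src!r} | target={tgt!r}")
    return "\n".join(lines)


df = pd.DataFrame({
    "id": range(1, 8),
    "city_source": ["Oslo"] * 7,
    "city_target": ["Oslo"] + ["Bergen"] * 6,
})
mask = df["city_source"] != df["city_target"]
print(format_issue_samples(df, "id", "city", mask))
```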

Quick Launch (Recommended)

The repository includes a Windows launcher allowing the application to run without installing Python manually.

The launcher uses a bundled portable Python environment and installs required dependencies automatically when needed.

To run the application:

Download or clone the repository
Open the project folder
Double-click 🚀 Start Data Validation Tool.bat

The Streamlit dashboard will automatically open in your browser.

Developer Setup

If you prefer running the application manually:

Install dependencies:


pip install -r requirements.txt

Run the application:


streamlit run app.py

Tech Stack

Python
Pandas
Streamlit

Project Structure


data-validator-tool
│
├── launch_validator.bat
├── python_portable/
│
├── app.py
│
├── src/
│   ├── validator.py
│   ├── comparison.py
│   └── profiling.py
│
├── sample_data/
│
├── screenshots/
│
├── requirements.txt
├── README.md
└── 📊 Data Validation Tool – User Guide.pdf

Example Workflow

Upload the source dataset
Upload the target dataset
Select the comparison key
Choose the comparison mode
Run the validation
Review mismatch insights and migration metrics

Example Output

dataset comparison metrics
column accuracy scores
migration health summary
mismatch samples for issue tracking
exportable mismatch reports
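An exportable report could be built by reshaping mismatches into one row per differing value and writing it out with `DataFrame.to_csv`. The report layout and the `build_mismatch_report` helper are assumptions for illustration, and the comparison here is strict (no normalization).

```python
import pandas as pd


def build_mismatch_report(aligned: pd.DataFrame, key: str, columns: list[str]) -> pd.DataFrame:
    """One row per differing value (strict comparison; two nulls count as equal).

    Assumes `aligned` holds row-aligned data with `<col>_source` / `<col>_target` pairs.
    """
    parts = []
    for col in columns:
        src, tgt = aligned[f"{col}_source"], aligned[f"{col}_target"]
        differs = (src != tgt) & ~(src.isna() & tgt.isna())
        if differs.any():
            parts.append(pd.DataFrame({
                key: aligned.loc[differs, key],
                "column": col,
                "source_value": src[differs],
                "target_value": tgt[differs],
            }))
    if not parts:
        return pd.DataFrame(columns=[key, "column", "source_value", "target_value"])
    return pd.concat(parts, ignore_index=True)


aligned = pd.DataFrame({
    "id": [1, 2, 3],
    "name_source": ["Ann", "Bob", None],
    "name_target": ["Ann", "Rob", None],
    "age_source": [30, 40, 50],
    "age_target": [30, 41, None],
})
report = build_mismatch_report(aligned, "id", ["name", "age"])
print(report.to_csv(index=False))
```

Passing a path instead, for example `report.to_csv("mismatch_report.csv", index=False)`, writes the report to disk; the filename is only an example.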

Screenshots of the interface can be added in this section.

Application Preview

Future Improvements

database connection support
automated validation report exports
large dataset optimization
scheduled validation workflows
support for additional file formats

Download Portable Version

A portable version of the application is available in the repository releases.

Download the ZIP package and run the launcher to start the tool without installing Python.
