This repository contains a SQL-based analytical case study on a fictional Rapido ride-hailing dataset, designed to demonstrate real-world data analysis skills using SQL.
The analysis focuses on:
- User behavior and engagement
- Ride distance patterns
- Vehicle-type performance
- Signup cohort analysis
- Data segmentation and aggregation
All queries are written in BigQuery SQL and emphasize clarity, correctness, and analytical intent.
Refer to Problem Statements - Queries file to access all SQL queries.
DATASETS→ Downloadable CSV, Excel files for RAPIDO DatasetSCHEMA/→ Table definitions and ER diagramProblem Statements - Queries/→ 15 business-driven SQL problem statementsRESULTS/→ Sample outputs and validation notes
The dataset follows a simple 1-to-many, star-style schema with:
- USERS as the dimension table
- RIDES as the fact table
See
SCHEMA/Rapido ER.pngfor the visual ER diagram.
**Refer DATASETS file to access and download:
Rapido DATASET - rides, usersEXCEL file for easy reference,usersandridestables, CSV files. Use these tables and create RAPIDO dataset in any SQL dialects.**
| Column Name | Description |
|---|---|
| user_id | Unique user identifier |
| first_name | User's first name |
| last_name | User's last name |
| signup_date | Date the user registered on Rapido |
| Column Name | Description |
|---|---|
| ride_id | Unique ride identifier |
| user_id | User who took the ride |
| vehicle_type | Type of vehicle used |
| start_location | Ride start location |
| end_location | Ride end location |
| distance_km | Distance travelled (in km) |
| captain_rating | Rating given to the captain (0–5) |
- Which users travel the most?
- How does ride distance vary by vehicle type?
- Which users are highly engaged vs inactive?
- How do signup cohorts behave over time?
- Which vehicle types attract diverse usage?
- Aggregations (
SUM,AVG,COUNT) - Use of
GROUP BY,CASE WHENfor conditonal buckets and grouping. - Filtering with
HAVING - CTE's
- Subqueries (correlated & non-correlated)
CASEexpressions for segmentationUNION DISTINCT- Date-based cohort filtering
- Conditional aggregation
- Analytical thinking with averages & thresholds
- Review the problem statement at the top of each SQL file.
- Examine the query logic and SQL patterns used.
- Run queries in BigQuery or MySQL adapt them for other SQL dialects.
- Refer to screenshots in
RESULTS/folder for expected outputs and row counts.
~~ Shivaling Battarki Email: shivalingb09@gmail.com