A comprehensive data analysis project exploring order data from FoodHub, a food aggregator company in New York, to understand restaurant demand patterns and enhance customer experience through data-driven insights.
- Background
- Objective
- Dataset Description
- Technologies Used
- Project Structure
- Key Analysis Steps
- Key Questions to Answer
- Key Insights & Findings
- Visualizations
- How to Run
The number of restaurants in New York keeps growing. With hectic lifestyles becoming the norm, many people rely on convenient food options:
- 👨‍🎓 Students with busy academic schedules
- 💼 Professionals with demanding work hours
- 👨‍👩‍👧‍👦 Families seeking convenient meal solutions
Online food delivery services have become a great option, providing access to good food from favorite restaurants without the hassle of cooking or dining out.
FoodHub is a food aggregator company that offers access to multiple restaurants through a single smartphone app. The platform streamlines the entire food ordering and delivery process:
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  Customer   │───▶│ Restaurant  │───▶│  Delivery   │───▶│  Customer   │
│ Places Order│    │  Confirms   │    │   Pick-up   │    │  Receives   │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
- Order Placement: Customer orders food through the FoodHub app
- Restaurant Confirmation: Restaurant receives and confirms the order
- Delivery Assignment: App assigns a delivery person to pick up the order
- Pick-up: Delivery person uses the map to reach the restaurant and waits for the food
- Transit: After pick-up confirmation, delivery person travels to customer location
- Delivery: Food is delivered and drop-off is confirmed in the app
- Rating: Customer rates the order through the app
FoodHub earns revenue by collecting a fixed margin on each delivery order from its partner restaurants. Understanding order patterns and customer preferences is crucial for:
- 📈 Increasing order volume
- ⭐ Improving customer satisfaction
- 🤝 Strengthening restaurant partnerships
- 💰 Maximizing revenue
As a Data Scientist at FoodHub, the primary goals are to:
- Analyze order data to understand demand patterns across different restaurants
- Identify factors affecting customer experience and satisfaction
- Provide actionable insights to enhance business operations
- Answer key business questions posed by the Data Science team
The analysis will help FoodHub:
| Goal | Impact |
|---|---|
| Understand restaurant demand | Better resource allocation and partnerships |
| Analyze delivery efficiency | Improved delivery times and customer satisfaction |
| Study customer preferences | Enhanced personalization and recommendations |
| Identify peak periods | Optimized staffing and restaurant coordination |
The dataset contains order information from FoodHub's online portal with the following 9 features:
| Feature | Description | Data Type |
|---|---|---|
| order_id | Unique ID of the order | Integer |
| customer_id | ID of the customer who ordered the food | Integer |
| restaurant_name | Name of the restaurant | Categorical |
| cuisine_type | Cuisine ordered by the customer | Categorical |
| cost_of_the_order | Cost of the order (in dollars) | Float |
| day_of_the_week | Whether the order is placed on a Weekday or Weekend | Categorical |
| rating | Rating given by the customer out of 5 | Float |
| food_preparation_time | Time (in minutes) taken by the restaurant to prepare the food | Integer |
| delivery_time | Time (in minutes) taken to deliver the food package | Integer |
- Food Preparation Time: Calculated as the difference between:
  - Restaurant's order confirmation timestamp
  - Delivery person's pick-up confirmation timestamp
- Delivery Time: Calculated as the difference between:
  - Delivery person's pick-up confirmation timestamp
  - Delivery person's drop-off confirmation timestamp
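The two timing definitions above can be sketched in pandas. The dataset itself only stores the resulting minutes, so the raw timestamp columns used here (`confirmed_at`, `picked_up_at`, `dropped_off_at`) and the sample rows are hypothetical, made up purely to illustrate the arithmetic.

```python
import pandas as pd

# Hypothetical raw event log; these timestamp columns are NOT part of
# foodhub_order.csv and the rows are invented for illustration.
events = pd.DataFrame({
    "order_id": [1, 2],
    "confirmed_at": pd.to_datetime(["2024-01-06 12:00", "2024-01-06 12:10"]),
    "picked_up_at": pd.to_datetime(["2024-01-06 12:25", "2024-01-06 12:40"]),
    "dropped_off_at": pd.to_datetime(["2024-01-06 12:45", "2024-01-06 13:05"]),
})


def minutes_between(start: pd.Series, end: pd.Series) -> pd.Series:
    """Elapsed whole minutes between two timestamp columns."""
    return ((end - start).dt.total_seconds() // 60).astype(int)


# Preparation time: order confirmation -> pick-up confirmation
events["food_preparation_time"] = minutes_between(
    events["confirmed_at"], events["picked_up_at"]
)
# Delivery time: pick-up confirmation -> drop-off confirmation
events["delivery_time"] = minutes_between(
    events["picked_up_at"], events["dropped_off_at"]
)

print(events[["order_id", "food_preparation_time", "delivery_time"]])
```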
| Category | Days |
|---|---|
| Weekday | Monday to Friday |
| Weekend | Saturday and Sunday |
| Category | Features | Business Relevance |
|---|---|---|
| Order Identification | order_id, customer_id | Tracking and customer analysis |
| Restaurant Info | restaurant_name, cuisine_type | Demand and preference analysis |
| Financial | cost_of_the_order | Revenue and pricing analysis |
| Temporal | day_of_the_week | Peak period identification |
| Quality | rating | Customer satisfaction measurement |
| Efficiency | food_preparation_time, delivery_time | Operational performance |
- Python 3.x
- Pandas - Data manipulation and analysis
- NumPy - Numerical computations
- Matplotlib - Basic plotting and visualization
- Seaborn - Statistical data visualization
food-hub/
├── README.md # Project documentation (this file)
├── foodhub.ipynb # Jupyter notebook with complete analysis
└── foodhub_order.csv # Dataset file
- Import necessary libraries (pandas, numpy, matplotlib, seaborn)
- Load the dataset from CSV
- Understand data structure and dimensions
- Check data types and missing values
- Generate statistical summary
- Handle missing values (especially in ratings)
- Check for duplicate orders
- Validate data consistency
- Convert data types if necessary
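A minimal sketch of the cleaning steps above, assuming that missing ratings may arrive as blanks or as a text placeholder such as "Not given" (an assumption about the encoding, not something stated in the dataset description). The helper name `clean_orders` and the sample rows are illustrative only.

```python
import pandas as pd


def clean_orders(orders: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy with unique orders and numeric ratings."""
    # Keep the first occurrence of each order_id.
    cleaned = orders.drop_duplicates(subset="order_id").copy()
    # Non-numeric rating entries (blanks, text placeholders) become NaN.
    cleaned["rating"] = pd.to_numeric(cleaned["rating"], errors="coerce")
    return cleaned


# Made-up rows for illustration only; not taken from the real dataset.
sample = pd.DataFrame({
    "order_id": [101, 102, 102, 103],
    "rating": ["5", "Not given", "Not given", "4"],
    "cost_of_the_order": [12.5, 30.0, 30.0, 8.25],
})

cleaned = clean_orders(sample)
print(cleaned.dtypes)        # rating is now a float column
print(cleaned.isna().sum())  # missing values per column
```

Whether unrated orders should be dropped, kept as NaN, or imputed depends on the question being answered; keeping them as NaN preserves order counts for the volume analyses.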
Analysis of individual variables:
- Cost of Order - Price distribution and spending patterns
- Food Preparation Time - How long restaurants take to prepare food
- Delivery Time - Time taken for delivery
- Rating - Customer satisfaction distribution
- Restaurant Name - Popular restaurants
- Cuisine Type - Preferred cuisines
- Day of the Week - Weekday vs Weekend order distribution
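For the univariate step, numeric summaries can complement the plots. The sketch below uses a tiny made-up frame with the dataset's column names; in the notebook the same calls would presumably run on the full DataFrame loaded from the CSV.

```python
import pandas as pd

# Made-up rows for illustration only.
orders = pd.DataFrame({
    "cost_of_the_order": [10.0, 20.0, 30.0, 40.0],
    "day_of_the_week": ["Weekend", "Weekend", "Weekend", "Weekday"],
    "cuisine_type": ["American", "Japanese", "American", "Italian"],
})

# Count, mean, spread, and quartiles of order cost
cost_summary = orders["cost_of_the_order"].describe()
# Share of orders placed on weekdays vs weekends
day_share = orders["day_of_the_week"].value_counts(normalize=True)
# Number of orders per cuisine, most frequent first
cuisine_counts = orders["cuisine_type"].value_counts()

print(cost_summary)
print(day_share)
print(cuisine_counts)
```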
Exploring relationships between variables:
- Cuisine Type vs Cost: Which cuisines are most expensive?
- Day of Week vs Orders: Weekend vs Weekday patterns
- Restaurant vs Rating: Which restaurants have best ratings?
- Preparation Time vs Rating: Does prep time affect satisfaction?
- Delivery Time vs Rating: Impact of delivery speed on ratings
- Cost vs Rating: Relationship between price and satisfaction
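The bivariate questions above mostly reduce to a grouped aggregate or a correlation. A minimal sketch on made-up rows, assuming `rating` has already been converted to a numeric column:

```python
import pandas as pd

# Made-up rows for illustration only.
orders = pd.DataFrame({
    "cuisine_type": ["American", "American", "Japanese", "Japanese"],
    "cost_of_the_order": [10.0, 14.0, 20.0, 30.0],
    "rating": [4.0, 5.0, 3.0, 5.0],
    "food_preparation_time": [20, 25, 30, 35],
    "delivery_time": [15, 20, 25, 30],
})

# Cuisine Type vs Cost: average order cost per cuisine, highest first
mean_cost_by_cuisine = (
    orders.groupby("cuisine_type")["cost_of_the_order"]
    .mean()
    .sort_values(ascending=False)
)

# Pairwise Pearson correlations between the numeric columns
numeric_cols = ["cost_of_the_order", "rating", "food_preparation_time", "delivery_time"]
correlations = orders[numeric_cols].corr()

print(mean_cost_by_cuisine)
print(correlations.round(2))
```

The `correlations` frame is the kind of matrix a heatmap would display. Correlation on its own does not show that, say, longer delivery times cause lower ratings.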
- Identify top-performing restaurants
- Analyze cuisine preferences
- Understand timing patterns
- Evaluate operational efficiency
- Which restaurants receive the most orders?
- What are the most popular cuisine types?
- How do order costs vary across different restaurants?
- Which cuisines have the highest average order cost?
- What is the distribution of customer ratings?
- How many orders are placed on weekdays vs weekends?
- What is the average order cost per customer?
- Are there repeat customers, and how often do they order?
- What is the average food preparation time?
- What is the average delivery time?
- Which restaurants have the fastest/slowest preparation times?
- Is there a relationship between preparation time and ratings?
- What is the total revenue generated?
- Which restaurants contribute most to revenue?
- How does revenue differ between weekdays and weekends?
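Several of the questions above can be answered with `value_counts` and `groupby`. The sketch below is one possible approach on made-up rows; `MARGIN_RATE` is a hypothetical placeholder, since the actual margin FoodHub charges is not given here.

```python
import pandas as pd

# Hypothetical margin; the real rate is not stated in this README.
MARGIN_RATE = 0.20

# Made-up rows for illustration only.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "restaurant_name": ["Restaurant A", "Restaurant A", "Restaurant B", "Restaurant A"],
    "cost_of_the_order": [10.0, 20.0, 30.0, 40.0],
    "day_of_the_week": ["Weekday", "Weekend", "Weekend", "Weekend"],
})

# Which restaurants receive the most orders?
orders_per_restaurant = orders["restaurant_name"].value_counts()

# Are there repeat customers, and how often do they order?
orders_per_customer = orders["customer_id"].value_counts()
repeat_customers = orders_per_customer[orders_per_customer > 1]

# How does order value differ between weekdays and weekends?
order_value_by_day = orders.groupby("day_of_the_week")["cost_of_the_order"].sum()

# Estimated revenue under the assumed margin
estimated_revenue = MARGIN_RATE * orders["cost_of_the_order"].sum()

print(orders_per_restaurant.head(5))
print(repeat_customers)
print(order_value_by_day)
print(f"Estimated revenue: ${estimated_revenue:.2f}")
```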
- 🏆 Top Restaurants: Identify highest-volume restaurants
- ⭐ Quality Leaders: Restaurants with best average ratings
- 💰 Revenue Drivers: Major contributors to FoodHub revenue
- 🍕 Most popular cuisine types in New York
- 💵 Price variations across different cuisines
- ⭐ Customer satisfaction by cuisine category
- 📅 Weekday vs Weekend: Order volume differences
- ⏰ Peak Periods: High-demand time identification
- 📈 Trends: Patterns in ordering behavior
- ⏱️ Preparation Efficiency: Average prep time analysis
- 🚗 Delivery Performance: Delivery time patterns
- 📊 Total Order Time: Combined prep + delivery analysis
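Total order time can be derived by adding the two timing columns. A minimal sketch on made-up rows; the 55-minute threshold is a hypothetical cut-off chosen only to show how a share of slow orders might be computed.

```python
import pandas as pd

# Hypothetical cut-off (in minutes) for flagging slow orders.
SLOW_ORDER_MINUTES = 55

# Made-up rows for illustration only.
orders = pd.DataFrame({
    "day_of_the_week": ["Weekday", "Weekday", "Weekend", "Weekend"],
    "food_preparation_time": [20, 30, 25, 35],
    "delivery_time": [30, 28, 20, 24],
})

# Combined time from order confirmation to drop-off
orders["total_time"] = orders["food_preparation_time"] + orders["delivery_time"]

# Average total time on weekdays vs weekends
mean_total_by_day = orders.groupby("day_of_the_week")["total_time"].mean()

# Fraction of orders that take longer than the cut-off
share_slow = (orders["total_time"] > SLOW_ORDER_MINUTES).mean()

print(mean_total_by_day)
print(f"Share of orders over {SLOW_ORDER_MINUTES} min: {share_slow:.0%}")
```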
The analysis includes various visualization types:
| Visualization Type | Purpose |
|---|---|
| Histograms | Distribution of numerical variables (cost, time, ratings) |
| Bar Charts | Restaurant and cuisine comparisons |
| Box Plots | Cost and time distribution across categories |
| Count Plots | Order frequency by day, cuisine, restaurant |
| Pie Charts | Proportion of weekday vs weekend orders |
| Scatter Plots | Relationships between numerical variables |
| Heatmaps | Correlation between variables |
| Violin Plots | Distribution comparison across groups |
pip install pandas numpy matplotlib seaborn jupyter
- Clone or download the repository
- Navigate to the food-hub/ directory
- Launch Jupyter Notebook:
  jupyter notebook foodhub.ipynb
- Run all cells sequentially to reproduce the analysis
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load the data
foodhub = pd.read_csv('foodhub_order.csv')
# Quick overview
print(foodhub.head())
foodhub.info()  # info() prints its own summary and returns None
print(foodhub.describe())
# View cuisine distribution
plt.figure(figsize=(12, 6))
sns.countplot(y='cuisine_type', data=foodhub, order=foodhub['cuisine_type'].value_counts().index)
plt.title('Order Distribution by Cuisine Type')
plt.xlabel('Number of Orders')
plt.ylabel('Cuisine Type')
plt.tight_layout()
plt.show()
# Analyze ratings distribution
plt.figure(figsize=(10, 6))
sns.histplot(foodhub['rating'].dropna(), bins=10, kde=True)
plt.title('Customer Rating Distribution')
plt.xlabel('Rating')
plt.ylabel('Frequency')
plt.show()
By completing this case study, you will learn:
- Exploratory Data Analysis (EDA): Comprehensive exploration of business data
- Data Visualization: Creating meaningful charts for business insights
- Business Analytics: Translating data findings into actionable recommendations
- Statistical Analysis: Understanding distributions and relationships
- Python Programming: Practical application of pandas, matplotlib, and seaborn
This case study is particularly valuable because:
- Food Tech Industry: Understanding dynamics of food delivery platforms
- Operations Management: Analyzing preparation and delivery efficiency
- Customer Experience: Identifying factors affecting satisfaction
- Business Strategy: Data-driven decision making for growth
- Market Analysis: Understanding cuisine and restaurant preferences
This analysis provides FoodHub with:
- Demand Understanding: Clear picture of restaurant and cuisine preferences
- Operational Metrics: Insights into preparation and delivery efficiency
- Customer Insights: Understanding of rating patterns and satisfaction drivers
- Revenue Analysis: Identification of key revenue contributors
- Strategic Recommendations: Data-backed suggestions for business improvement
These findings will enable FoodHub to make informed decisions about restaurant partnerships, customer experience enhancements, and operational optimizations.
This project is for educational purposes.