Perfecting my data manipulation and data wrangling skills
Pandas is a powerful Python library used for data manipulation and analysis. It provides fast, flexible data structures that make working with structured data intuitive and efficient.
Pandas is built around two core data structures:
A Series is a one‑dimensional labeled array that can hold any data type (integers, floats, strings, etc.).
Think of it as:
- A single column in a table
- A labeled NumPy array
- A Python dictionary with ordered keys
Key characteristics:
- Has an index and values
- A single (homogeneous) dtype shared by all values
- Vectorized operations
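The characteristics above can be seen in a minimal sketch. The labels and values are made up for illustration.

```python
import pandas as pd

# A Series pairs an index (labels) with values that share one dtype
prices = pd.Series([1.5, 2.0, 0.75], index=["apple", "banana", "cherry"], name="price")

print(prices.index.tolist())  # ['apple', 'banana', 'cherry']
print(prices.dtype)           # float64

# Label-based lookup works much like a dictionary
print(prices["banana"])       # 2.0

# Vectorized operation: applied to every element at once, no explicit loop
doubled = prices * 2
print(doubled["apple"])       # 3.0
```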
A DataFrame is a two‑dimensional labeled data structure with columns that can have different data types.
Think of it as:
- A spreadsheet table
- A SQL table
- A collection of Series sharing the same index
Key characteristics:
- Rows and columns
- Heterogeneous data types
- Powerful indexing and alignment
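A short sketch of a DataFrame built from a dictionary, assuming hypothetical column names. Each key becomes a column, and selecting one column gives back a Series that shares the DataFrame's index.

```python
import pandas as pd

# Each dict key becomes a column; columns may hold different dtypes
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cara"],  # text
        "age": [28, 35, 42],             # integers
        "score": [88.5, 92.0, 79.5],     # floats
    }
)

print(df.shape)   # (3, 3)
print(df.dtypes)  # one dtype per column

# A single column is a Series aligned on the same index as the DataFrame
ages = df["age"]
print(type(ages).__name__)          # Series
print(ages.index.equals(df.index))  # True
```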
Before working through these lessons, you should be comfortable with the following Python fundamentals:
- Variables and data types (`int`, `float`, `str`, `bool`)
- Lists, tuples, dictionaries, and sets
- Basic operators and expressions
- Writing and calling functions
- Lambda functions
- `if` / `elif` / `else`
- `for` and `while` loops
- List comprehensions
- Indexing and slicing
- Basic error handling
- Understanding mutability vs immutability
- Basic NumPy familiarity
- File paths and working directories
- Virtual environments
📌 Recommended level: Early intermediate Python.
This repository documents my journey to mastering data manipulation using Pandas. It contains structured lessons, practice notebooks, and examples covering the most essential concepts used in real-world data analysis and data science workflows.
The goal of this repository is to build strong, practical skills in reading, transforming, analyzing, and combining datasets efficiently using Python and Pandas.
You can’t work with data if you can’t read it.
- Creating `Series` and `DataFrame` objects
- Reading data from:
  - CSV files
  - Excel files
  - JSON files
- Writing data back to files
- Understanding dataset structure
📂 Focus: Importing and exporting data properly.
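A minimal sketch of reading and writing data. To stay self-contained it uses `io.StringIO` in place of real files, and the city data is hypothetical; with files on disk you would pass a path instead. Excel is left out because `pd.read_excel` usually needs an extra engine package installed.

```python
import io

import pandas as pd

# Hypothetical CSV content; StringIO stands in for a file on disk
csv_text = """city,population
Lisbon,545000
Porto,232000
"""

df = pd.read_csv(io.StringIO(csv_text))

# Inspect structure: shape, column names, dtypes, non-null counts
print(df.shape)  # (2, 2)
df.info()

# Write back to CSV text; index=False avoids an extra unnamed index column
csv_out = df.to_csv(index=False)
round_trip = pd.read_csv(io.StringIO(csv_out))
print(round_trip.equals(df))  # True

# JSON round trip: orient="records" produces a list of row objects
json_text = df.to_json(orient="records")
df_from_json = pd.read_json(io.StringIO(json_text), orient="records")
print(df_from_json["population"].tolist())  # [545000, 232000]
```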
Core skills used daily by data professionals.
- Selecting columns and rows
- `.loc[]` and `.iloc[]`
- Boolean filtering
- Conditional selection
- Assigning new columns
- Modifying existing data
📂 Focus: Accessing exactly the data you need.
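The selection skills above, sketched on a small made-up inventory table (the column names and index labels are hypothetical):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "product": ["pen", "book", "lamp", "mug"],
        "price": [1.2, 12.5, 30.0, 8.0],
        "stock": [100, 20, 5, 40],
    },
    index=["a", "b", "c", "d"],
)

# Column selection: a single label returns a Series, a list returns a DataFrame
prices = df["price"]
subset = df[["product", "stock"]]

# .loc selects by label (end label inclusive); .iloc by position (end exclusive)
first_two_by_label = df.loc["a":"b", "product"]
first_two_by_position = df.iloc[0:2, 0]

# Boolean filtering with combined conditions; each condition needs parentheses
cheap_in_stock = df[(df["price"] < 10) & (df["stock"] > 30)]

# Assign a new column computed from existing ones (vectorized)
df["value"] = df["price"] * df["stock"]

# Modify existing data conditionally with .loc to avoid chained assignment
df.loc[df["stock"] < 10, "stock"] = 10
```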
Extract insights from raw data.
- Summary statistics (`mean`, `median`, `describe`, etc.)
- Value counts
- Unique values
- `map()` and `apply()`
- Custom functions on columns
📂 Focus: Turning data into information.
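A minimal sketch of these summary tools on hypothetical sales data. The `size_label` function and its threshold of 8 units are made up for illustration.

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "east", "south", "north"],
        "units": [10, 4, 6, 8, 2, 12],
    }
)

print(sales["units"].mean())      # 7.0
print(sales["units"].median())    # 7.0
print(sales["units"].describe())  # count, mean, std, min, quartiles, max

counts = sales["region"].value_counts()  # north 3, south 2, east 1
regions = sales["region"].unique()       # in order of first appearance

# map(): element-wise lookup with a dict (values missing from the dict become NaN)
codes = sales["region"].map({"north": "N", "south": "S", "east": "E"})

# apply(): run a custom function on each value of a column
def size_label(units):
    return "large" if units >= 8 else "small"

sales["size"] = sales["units"].apply(size_label)
```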
Scale up your level of insight.
- `groupby()` operations
- Aggregations
- Multiple aggregations
- Sorting values
- Sorting by index
- Ranking within groups
📂 Focus: Analyzing complex datasets efficiently.
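A sketch of grouping, aggregating, sorting, and ranking on made-up sales data. It uses named aggregation in `.agg()`, which is one of several ways to compute multiple aggregations at once.

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "east", "south", "north"],
        "units": [10, 4, 6, 8, 2, 12],
        "price": [2.0, 3.0, 2.5, 4.0, 3.5, 2.0],
    }
)

# Single aggregation per group
total_units = sales.groupby("region")["units"].sum()

# Multiple aggregations at once, each output column named explicitly
summary = sales.groupby("region").agg(
    total_units=("units", "sum"),
    avg_price=("price", "mean"),
    orders=("units", "count"),
)

# Sort by values (descending) and by index labels
by_units = summary.sort_values("total_units", ascending=False)
by_region = summary.sort_index()

# Rank rows within each region by units (1 = largest, ties share a rank)
sales["rank_in_region"] = (
    sales.groupby("region")["units"].rank(ascending=False, method="dense")
)
```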
Handle common real-world data problems.
- Data types (`int`, `float`, `object`, `category`)
- Type conversion
- Detecting missing values
- Handling `NaN`
- Filling missing data
- Dropping missing data
📂 Focus: Cleaning and preparing data for analysis.
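A minimal cleaning sketch on a hypothetical table with string IDs and gaps. Filling with the column mean is just one common choice; the right fill strategy depends on the data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "id": ["1", "2", "3", "4"],          # numbers stored as text
        "grade": ["A", "B", "A", None],      # repeated labels, one missing
        "score": [90.0, np.nan, 75.0, 60.0], # one missing number
    }
)

# Type conversion: text to integers, repeated labels to category
df["id"] = df["id"].astype(int)
df["grade"] = df["grade"].astype("category")

# Detect missing values: isna() gives a boolean mask, sum() counts per column
missing_per_column = df.isna().sum()

# Fill missing scores with the column mean (mean() skips NaN by default)
filled = df.assign(score=df["score"].fillna(df["score"].mean()))

# Or drop every row that contains any missing value
dropped = df.dropna()
```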
Make sense of data from multiple sources.
- Renaming columns and indices
- Concatenation
- Merging datasets
- Joining datasets
- Handling multi-index structures
📂 Focus: Building complete datasets from multiple pieces.
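A sketch of combining pieces into one dataset, assuming three small hypothetical tables (two quarters of orders plus a customer list). It shows `rename`, `concat`, `merge` on a column, `join` on the index, and a two-level MultiIndex.

```python
import pandas as pd

q1 = pd.DataFrame({"cust": [1, 2], "amt": [100, 250]})
q2 = pd.DataFrame({"cust": [3, 1], "amt": [80, 40]})
customers = pd.DataFrame({"cust": [1, 2, 3], "name": ["Ana", "Ben", "Cara"]})

# Rename columns to clearer names
new_names = {"cust": "customer_id", "amt": "amount"}
q1 = q1.rename(columns=new_names)
q2 = q2.rename(columns=new_names)
customers = customers.rename(columns={"cust": "customer_id"})

# Concatenate rows; ignore_index builds a fresh 0..n-1 index
orders = pd.concat([q1, q2], ignore_index=True)

# Merge on a shared key column (SQL-style left join)
enriched = orders.merge(customers, on="customer_id", how="left")

# join() aligns on the index rather than on a column
totals = orders.groupby("customer_id")["amount"].sum().to_frame("total")
joined = customers.set_index("customer_id").join(totals)

# Build a MultiIndex from two keys; sorting it keeps lookups efficient
orders["quarter"] = ["Q1", "Q1", "Q2", "Q2"]
multi = orders.set_index(["quarter", "customer_id"]).sort_index()

# .loc takes a tuple with one label per index level
print(multi.loc[("Q2", 1), "amount"])  # 40
```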
- Python 3.x
- Pandas
- Jupyter Notebook / VS Code
- Strengthen practical Pandas skills
- Improve data cleaning techniques
- Master data transformation workflows
- Build a strong foundation for:
  - Data Analysis
  - Machine Learning
  - Data Science projects
- Aspiring data analysts
- Future machine learning engineers
- Python developers working with data
- Anyone wanting production‑ready Pandas skills
Progress: Ongoing and continuously improving.