jci02/Pandas

Pandas

A repository for perfecting my data manipulation and data wrangling skills.


📖 What is Pandas?

Pandas is a powerful Python library used for data manipulation and analysis. It provides fast, flexible data structures that make working with structured data intuitive and efficient.

Pandas is built around two core data structures:

🔹 Series

A Series is a one‑dimensional labeled array that can hold any data type (integers, floats, strings, etc.).

Think of it as:

  • A single column in a table
  • A labeled NumPy array
  • A Python dictionary with ordered keys

Key characteristics:

  • Has an index and values
  • A single dtype for all values (mixed values are stored as object)
  • Vectorized operations
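
A minimal sketch of these characteristics, using made-up temperature readings (the values, labels, and variable names are illustrative assumptions, not data from the lessons):

```python
import pandas as pd

# Hypothetical data: three temperature readings labeled by weekday.
temps = pd.Series([21.5, 23.0, 19.8], index=["Mon", "Tue", "Wed"], name="temp_c")

print(temps.index.tolist())  # the labels (index)
print(temps.to_numpy())      # the underlying values
print(temps.dtype)           # one dtype shared by every value

# Vectorized operation: applied to all values at once, no explicit loop.
temps_f = temps * 9 / 5 + 32

# Label-based lookup, similar to reading a key from a dictionary.
tue = temps["Tue"]
```

The result of the arithmetic, `temps_f`, is itself a Series that keeps the same index as `temps`.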

🔹 DataFrame

A DataFrame is a two‑dimensional labeled data structure with columns that can have different data types.

Think of it as:

  • A spreadsheet table
  • A SQL table
  • A collection of Series sharing the same index

Key characteristics:

  • Rows and columns
  • Heterogeneous data types
  • Powerful indexing and alignment
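
As a rough illustration of these points, the sketch below builds a small DataFrame from invented records (all names and values are hypothetical) and shows per-column dtypes and index alignment:

```python
import pandas as pd

# Hypothetical records; each column has its own dtype.
df = pd.DataFrame(
    {
        "name": ["Ada", "Linus", "Grace"],
        "age": [36, 54, 45],
        "active": [True, False, True],
    },
    index=["a", "b", "c"],
)

print(df.dtypes)  # one dtype per column

# Each column is a Series that shares the DataFrame's index.
age = df["age"]

# Alignment: values are matched by index label, not by position.
# Label "b" has no bonus, so the result for "b" is NaN.
bonus = pd.Series({"c": 10, "a": 5})
df["age_plus_bonus"] = df["age"] + bonus
```

Because `bonus` is matched by label, its order does not matter, and the missing label produces a missing value rather than an error.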

🧠 Python Prerequisites

Before working through these lessons, readers should be comfortable with the following Python fundamentals:

✅ Core Python Basics

  • Variables and data types (int, float, str, bool)
  • Lists, tuples, dictionaries, and sets
  • Basic operators and expressions
  • Writing and calling functions
  • Lambda functions

✅ Control Flow

  • if / elif / else
  • for and while loops
  • List comprehensions

✅ Working with Python Objects

  • Indexing and slicing
  • Basic error handling
  • Understanding mutability vs immutability

✅ Helpful (but not strictly required)

  • Basic NumPy familiarity
  • File paths and working directories
  • Virtual environments

📌 Recommended level: Early intermediate Python.


📌 Overview

This repository documents my journey to mastering data manipulation using Pandas. It contains structured lessons, practice notebooks, and examples covering the most essential concepts used in real-world data analysis and data science workflows.

The goal of this repository is to build strong, practical skills in reading, transforming, analyzing, and combining datasets efficiently using Python and Pandas.


📚 Lessons Covered

1️⃣ Creating, Reading and Writing

You can’t work with data if you can’t read it.

  • Creating Series and DataFrame objects

  • Reading data from:

    • CSV files
    • Excel files
    • JSON files
  • Writing data back to files

  • Understanding dataset structure

📂 Focus: Importing and exporting data properly.
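
A small sketch of the read/write round trip, assuming an invented two-row dataset. To stay self-contained it writes to in-memory text with `io.StringIO` instead of real files; in the notebooks you would normally pass a file path.

```python
import io

import pandas as pd

# Hypothetical dataset created directly from a dictionary.
df = pd.DataFrame({"city": ["Lagos", "Lima"], "population_m": [15.4, 10.7]})

# Writing: with no path given, to_csv returns the CSV text as a string.
csv_text = df.to_csv(index=False)

# Reading: read_csv accepts a path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))

# The same round trip through JSON, one object per row.
json_text = df.to_json(orient="records")
df_json = pd.read_json(io.StringIO(json_text))

# Quick checks on dataset structure.
print(df_csv.shape)   # (rows, columns)
print(df_csv.head())  # first rows
df_csv.info()         # column names, non-null counts, dtypes
```

Excel files follow the same pattern through `pd.read_excel` and `DataFrame.to_excel`, which need an Excel engine such as openpyxl to be installed.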


2️⃣ Selecting, Filtering & Assigning

Core skills used daily by data professionals.

  • Selecting columns and rows
  • .loc[] and .iloc[]
  • Boolean filtering
  • Conditional selection
  • Assigning new columns
  • Modifying existing data

📂 Focus: Accessing exactly the data you need.
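
The operations in this lesson can be sketched on a tiny, invented product table (column names, labels, and values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical product table with string row labels.
df = pd.DataFrame(
    {
        "product": ["pen", "book", "lamp", "desk"],
        "price": [1.5, 12.0, 30.0, 150.0],
        "stock": [100, 40, 0, 5],
    },
    index=["p1", "p2", "p3", "p4"],
)

prices = df["price"]          # select one column (a Series)
row = df.loc["p2"]            # select a row by label
first_two = df.iloc[:2]       # select rows by integer position
cell = df.loc["p3", "price"]  # a single value by row and column label

# Boolean filtering: combine conditions with & and wrap each in parentheses.
in_stock_cheap = df[(df["stock"] > 0) & (df["price"] < 20)]

# Assigning a new column from existing ones.
df["value"] = df["price"] * df["stock"]

# Modifying existing data: conditional assignment through .loc.
df.loc[df["stock"] == 0, "price"] = 0.0
```

Using `.loc` with a boolean mask for the assignment updates `df` itself instead of a temporary copy.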


3️⃣ Summary Functions and Maps

Extract insights from raw data.

  • Summary statistics (mean, median, describe, etc.)
  • Value counts
  • Unique values
  • map() and apply()
  • Custom functions on columns

📂 Focus: Turning data into information.
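
A brief sketch of these functions on made-up review scores (the data and the helper `label_row` are hypothetical):

```python
import pandas as pd

# Hypothetical review data.
df = pd.DataFrame(
    {
        "country": ["Italy", "France", "Italy", "Spain"],
        "points": [87, 90, 85, 90],
    }
)

mean_points = df["points"].mean()
median_points = df["points"].median()
summary = df["points"].describe()  # count, mean, std, min, quartiles, max

counts = df["country"].value_counts()  # rows per country
countries = df["country"].unique()     # distinct values, in order of appearance

# map(): transform a Series one value at a time.
centered = df["points"].map(lambda p: p - mean_points)


def label_row(row):
    """Hypothetical helper: build a text label from one row."""
    return f"{row['country']}: {row['points']}"


# apply() with axis=1: call a custom function once per row.
labels = df.apply(label_row, axis=1)
```

For simple arithmetic such as centering, the vectorized form `df["points"] - mean_points` gives the same result and is usually faster than `map()`.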


4️⃣ Grouping and Sorting

Scale up your level of insight.

  • groupby() operations
  • Aggregations
  • Multiple aggregations
  • Sorting values
  • Sorting by index
  • Ranking within groups

📂 Focus: Analyzing complex datasets efficiently.
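
The sketch below walks through grouping, sorting, and ranking on an invented score table (team and player names are placeholders):

```python
import pandas as pd

# Hypothetical scores for players on two teams.
df = pd.DataFrame(
    {
        "team": ["A", "A", "B", "B", "B"],
        "player": ["x", "y", "z", "w", "v"],
        "score": [10, 30, 20, 50, 40],
    }
)

# One aggregation per group.
mean_by_team = df.groupby("team")["score"].mean()

# Several aggregations at once; the result has one column per function.
stats = df.groupby("team")["score"].agg(["min", "max", "count"])

# Sort by values, then restore the original order by sorting on the index.
by_score = df.sort_values("score", ascending=False)
by_index = by_score.sort_index()

# Rank within each group: 1.0 is the highest score on that team.
df["rank_in_team"] = df.groupby("team")["score"].rank(ascending=False)
```

Note that `rank()` returns floats because tied values receive the average of their ranks by default.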


5️⃣ Data Types and Missing Values

Handle common real-world data problems.

  • Data types (int, float, object, category)
  • Type conversion
  • Detecting missing values
  • Handling NaN
  • Filling missing data
  • Dropping missing data

📂 Focus: Cleaning and preparing data for analysis.
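
A compact sketch of type conversion and missing-value handling, assuming a small invented table with one missing score:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: ids stored as text, one missing score.
df = pd.DataFrame(
    {
        "id": ["1", "2", "3"],
        "grade": ["a", "b", "a"],
        "score": [9.5, np.nan, 7.0],
    }
)

# Type conversion with astype().
df["id"] = df["id"].astype(int)
df["grade"] = df["grade"].astype("category")
print(df.dtypes)

# Detecting missing values.
missing_mask = df["score"].isna()
n_missing = int(missing_mask.sum())

# Filling missing data, here with the column mean (one of several options).
filled = df["score"].fillna(df["score"].mean())

# Dropping rows where the score is missing.
dropped = df.dropna(subset=["score"])
```

Whether to fill or drop depends on the dataset; filling with the mean is used here only as a simple example.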


6️⃣ Renaming and Combining

Make sense of data from multiple sources.

  • Renaming columns and indices
  • Concatenation
  • Merging datasets
  • Joining datasets
  • Handling MultiIndex structures

📂 Focus: Building complete datasets from multiple pieces.
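
A sketch of renaming and combining, using two invented tables (`customers` and `orders` and all their values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical tables from two sources.
customers = pd.DataFrame({"cust_id": [1, 2, 3], "nm": ["Ann", "Bob", "Cy"]})
orders = pd.DataFrame({"cust_id": [1, 1, 3], "total": [20.0, 35.0, 15.0]})

# Renaming a column.
customers = customers.rename(columns={"nm": "name"})

# Concatenation: stack rows from another table with the same columns.
more = pd.DataFrame({"cust_id": [4], "name": ["Di"]})
all_customers = pd.concat([customers, more], ignore_index=True)

# Merging on a shared column; a left merge keeps customers without orders.
merged = all_customers.merge(orders, on="cust_id", how="left")

# Joining on the index; an inner join keeps only matching labels.
joined = all_customers.set_index("cust_id").join(
    orders.set_index("cust_id"), how="inner"
)

# Grouping by two columns produces a MultiIndex on the result.
per_customer = merged.groupby(["cust_id", "name"])["total"].sum()
ann_total = per_customer.loc[(1, "Ann")]  # select with a tuple of labels
```

`merge` matches on columns by default while `join` matches on the index, which is why the tables are re-indexed with `set_index` before joining.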


🛠 Technologies Used

  • Python 3.x
  • Pandas
  • Jupyter Notebook / VS Code

🎯 Goals of This Repository

  • Strengthen practical Pandas skills

  • Improve data cleaning techniques

  • Master data transformation workflows

  • Build a strong foundation for:

    • Data Analysis
    • Machine Learning
    • Data Science projects

🚀 Who This Repository Is For

  • Aspiring data analysts
  • Future machine learning engineers
  • Python developers working with data
  • Anyone wanting production‑ready Pandas skills

Progress: Ongoing and continuously improving.
