Why Pandas?
When working with data, you’ll almost always deal with tables — datasets with rows and columns, like spreadsheets. Enter Pandas: Python’s go-to library for handling structured (tabular) data. It lets you clean, filter, reshape, group, and analyze data with minimal code.
Whether you’re loading a CSV, removing null values, or summarizing trends, Pandas is your best friend.
Getting Started with Pandas
To start using Pandas:
import pandas as pd
You’ll mostly work with DataFrames (tables) and Series (single columns).
data = {
    'Name': ['Anay', 'Sara', 'John'],
    'Age': [12, 25, 22],
}
df = pd.DataFrame(data)
print(df)
Output:
   Name  Age
0  Anay   12
1  Sara   25
2  John   22
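To make the DataFrame/Series distinction concrete, here is a minimal sketch that rebuilds the small table above and checks what a single-column selection returns. The variable name `ages` is only illustrative.

```python
import pandas as pd

# Same small table as above
df = pd.DataFrame({
    'Name': ['Anay', 'Sara', 'John'],
    'Age': [12, 25, 22],
})

ages = df['Age']               # selecting one column gives a Series

print(type(df).__name__)       # DataFrame
print(type(ages).__name__)     # Series
print(ages.tolist())           # [12, 25, 22]
```

A DataFrame is essentially a collection of Series that share the same row labels (the index).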
Reading and Writing Data
Pandas can handle many file formats, but CSV is the most common.
Load a CSV:
df = pd.read_csv('students.csv')
Save to a CSV:
df.to_csv('output.csv', index=False)
You can also load Excel, JSON, and SQL data with pd.read_excel, pd.read_json, and pd.read_sql.
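A quick way to see both calls working together is a round trip: write a DataFrame to a CSV file, then read it back. This is a small sketch that uses a temporary directory so nothing is left on disk; the file name `students.csv` simply mirrors the example above.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Anay', 'Sara', 'John'], 'Age': [12, 25, 22]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'students.csv')
    df.to_csv(path, index=False)   # index=False leaves the row labels out of the file
    loaded = pd.read_csv(path)     # read the same file back into a new DataFrame

print(loaded)
```

Without `index=False`, the row labels would be written as an extra unnamed column and show up again when the file is reloaded.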
Exploring the Data
Once you load a dataset, your first job is to understand it.
df.head()      # First 5 rows
df.tail()      # Last 5 rows
df.shape       # (rows, columns); an attribute, so no parentheses
df.info()      # Column names, data types, and non-null counts
df.describe()  # Summary statistics for numerical columns
These quick commands give you a strong overview.
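As a rough illustration of what these commands report, the sketch below runs them on a tiny table with one missing age. The fourth row ('Mina') is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anay', 'Sara', 'John', 'Mina'],  # 'Mina' is a hypothetical extra row
    'Age': [12, 25, 22, None],                 # None is stored as NaN (missing)
})

print(df.shape)     # (4, 2)
print(df.head(2))   # head() accepts a row count; the default is 5
df.info()           # prints its report directly and shows 3 non-null ages

stats = df.describe()            # numerical columns only, so just 'Age'
print(stats.loc['count', 'Age']) # 3.0, because missing values are skipped
```

Comparing the non-null counts from `info()` with the row count from `shape` is a fast way to spot columns that need cleaning.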
Selecting Columns and Rows
Access a single column (Series):
df['Age']  # Bracket notation (always works)
df.Age     # Dot notation (only when the name is a valid Python identifier and doesn't clash with a DataFrame attribute)
Access multiple columns:
df[['Name', 'Age']]
Filter rows:
df[df['Age'] > 18]  # Students older than 18
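Row filtering works because the comparison produces a Series of True/False values, which Pandas then uses as a mask. The sketch below shows that mask, combines two conditions, and uses `.loc` to filter rows and pick a column in one step. The upper age bound of 25 is an arbitrary value chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Anay', 'Sara', 'John'], 'Age': [12, 25, 22]})

mask = df['Age'] > 18     # boolean Series: one True/False per row
adults = df[mask]         # keeps only the rows where the mask is True

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
young_adults = df[(df['Age'] > 18) & (df['Age'] < 25)]

# .loc takes a row filter and a column label together
adult_names = df.loc[df['Age'] > 18, 'Name']

print(mask.tolist())          # [False, True, True]
print(adult_names.tolist())   # ['Sara', 'John']
```

Python's `and` and `or` keywords do not work element by element on a Series, which is why `&` and `|` are used here.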
Cleaning Data (Common Tasks)
Real-world data is messy. Here’s how Pandas helps:
- Missing values:
df.isnull().sum()  # Count nulls in each column
df.dropna()        # Return a copy without rows that contain nulls
df.fillna(0)       # Return a copy with nulls replaced by 0
- Renaming columns:
renamed = df.rename(columns={'Age': 'StudentAge'})  # Returns a renamed copy; df keeps 'Age'
- Changing types:
df['Age'] = df['Age'].astype(int)  # Fails if nulls remain, so handle them first
- Dropping duplicates:
df.drop_duplicates(inplace=True)
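The cleaning steps above can be chained into one pass. Here is a minimal sketch on a deliberately messy table; the duplicated 'Sara' row and John's missing age are invented for the example, and the order of the steps is one reasonable choice rather than a fixed rule.

```python
import pandas as pd

raw = pd.DataFrame({
    'Name': ['Anay', 'Sara', 'Sara', 'John'],
    'Age': [12.0, 25.0, 25.0, None],   # duplicate row for Sara, missing age for John
})

print(raw.isnull().sum())   # one null in 'Age', none in 'Name'

clean = (
    raw.drop_duplicates()                        # drop the repeated Sara row
       .dropna(subset=['Age'])                   # drop rows with no age
       .rename(columns={'Age': 'StudentAge'})    # clearer column name
)

# Safe to convert now that no nulls remain in the column
clean['StudentAge'] = clean['StudentAge'].astype(int)

print(clean)
```

Each method returns a new DataFrame, so `raw` stays untouched and can be compared against `clean` afterwards.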
Sorting and Grouping
Sort rows:
df.sort_values(by='Age', ascending=False)
Group and summarize:
df.groupby('Age').count()                  # Count non-null values per column for each age
df.groupby('Age').mean(numeric_only=True)  # Average of the numeric columns for each age
This is how you can quickly spot trends and patterns.
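Grouping is most useful when the grouping column is a category rather than a number. The sketch below adds a hypothetical 'Grade' column (not part of the original table) and computes per-group statistics for 'Age'.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anay', 'Sara', 'John', 'Mina'],
    'Grade': ['A', 'B', 'A', 'B'],   # hypothetical category column for illustration
    'Age': [12, 25, 22, 19],
})

# Sort rows from oldest to youngest
by_age = df.sort_values(by='Age', ascending=False)

# Select the column first, then aggregate: one value per group
avg_age = df.groupby('Grade')['Age'].mean()

# agg() computes several statistics at once
summary = df.groupby('Grade')['Age'].agg(['count', 'mean'])

print(by_age['Name'].tolist())   # ['Sara', 'John', 'Mina', 'Anay']
print(avg_age)                   # A: 17.0, B: 22.0
print(summary)
```

Selecting the column before aggregating avoids errors from trying to average text columns such as 'Name'.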
Real-World Use Case Example
Imagine you’re analyzing online orders. With Pandas, you can:
- Load the CSV
- Clean missing values
- Filter for orders above ₹1,000
- Group by product category and find average spend
- Sort to find the top-performing categories
All with just a few lines of code.
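The five steps above can be sketched as follows. The column names (`order_id`, `category`, `amount`) and the sample orders are assumptions made up for this example, and the CSV is read from an in-memory string so the snippet runs without a file.

```python
import io

import pandas as pd

# Hypothetical order data; 'amount' is in rupees and order 5 has no amount
csv_text = """order_id,category,amount
1,Electronics,2500
2,Books,450
3,Electronics,1500
4,Clothing,1200
5,Books,
6,Clothing,1800
7,Books,1100
"""

orders = pd.read_csv(io.StringIO(csv_text))                # 1. load the CSV
orders = orders.dropna(subset=['amount'])                  # 2. drop orders with no amount
large = orders[orders['amount'] > 1000]                    # 3. keep orders above 1,000
avg_spend = large.groupby('category')['amount'].mean()     # 4. average spend per category
ranked = avg_spend.sort_values(ascending=False)            # 5. highest average first

print(ranked)
```

With this sample data, Electronics ranks first (average 2000), followed by Clothing (1500) and Books (1100). With a real file, `io.StringIO(csv_text)` would be replaced by the file path.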
Quick Summary
Task | Pandas Code Example |
---|---|
Load CSV | pd.read_csv('file.csv') |
View first rows | df.head() |
Filter rows | df[df['Age'] > 18] |
Group and aggregate | df.groupby('Age').mean(numeric_only=True) |
Handle null values | df.dropna() or df.fillna(value) |
Select column(s) | df['col'] or df[['col1', 'col2']] |
Final Thoughts
Pandas makes data exploration intuitive and fast. It’s designed to minimize friction between your brain and your data. Once you master it, tasks that once felt overwhelming — like wrangling giant datasets — will become second nature.
In the next article, we’ll focus on data visualization — turning your tables into charts that tell powerful stories.