Introduction to Pandas and Working with Tabular Data

Why Pandas?

When working with data, you’ll almost always deal with tables — datasets with rows and columns, like spreadsheets. Enter Pandas: Python’s go-to library for handling structured (tabular) data. It lets you clean, filter, reshape, group, and analyze data with minimal code.

Whether you’re loading a CSV, removing null values, or summarizing trends, Pandas is your best friend.


Getting Started with Pandas

To start using Pandas:

import pandas as pd

You’ll mostly work with DataFrames (tables) and Series (single columns).

data = {
    'Name': ['Anay', 'Sara', 'John'],
    'Age': [12, 25, 22]
}

df = pd.DataFrame(data)
print(df)

Output:

   Name  Age
0  Anay   12
1  Sara   25
2  John   22
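
To make the DataFrame versus Series distinction concrete, here is a minimal sketch that reuses the table above. The variable name `ages` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anay', 'Sara', 'John'],
    'Age': [12, 25, 22],
})

ages = df['Age']              # selecting one column gives a Series

print(type(df).__name__)      # DataFrame
print(type(ages).__name__)    # Series
print(ages.max())             # a Series has its own summary methods
```

A DataFrame is essentially a collection of Series that share the same row index.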

Reading and Writing Data

Pandas can handle many file formats, but CSV is the most common.

Load a CSV:

df = pd.read_csv('students.csv')

Save to a CSV:

df.to_csv('output.csv', index=False)

Pandas also reads Excel, JSON, and SQL sources through pd.read_excel, pd.read_json, and pd.read_sql.
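
If you want to try reading and writing without creating files, the sketch below does a round trip in memory. It assumes the small Name/Age table from earlier; `io.StringIO` stands in for a file on disk.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Anay', 'Sara', 'John'], 'Age': [12, 25, 22]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)

# read_csv accepts any file-like object, so StringIO works in place of a filename.
restored = pd.read_csv(io.StringIO(csv_text))

# The same pattern works for JSON.
json_text = df.to_json(orient='records')
from_json = pd.read_json(io.StringIO(json_text))

print(restored)
print(from_json)
```

Passing index=False matters here: without it, the row labels are written as an extra unnamed column and come back as data when the file is reloaded.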


Exploring the Data

Once you load a dataset, your first job is to understand it.

df.head()        # Shows first 5 rows
df.tail()        # Last 5 rows
df.shape         # (rows, columns)
df.info()        # Summary of data types and nulls
df.describe()    # Stats for numerical columns

These quick commands give you a strong overview.
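
Here is a rough sketch of what those commands report on a tiny table. The extra row ('Mira') and the missing age are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anay', 'Sara', 'John', 'Mira'],
    'Age': [12, 25, 22, None],    # one missing value on purpose
})

print(df.shape)             # (4, 2)
print(df.dtypes)            # Age is stored as float64 because of the missing value
print(df.isnull().sum())    # number of missing values in each column
print(df.describe())        # count, mean, std, min, quartiles, max for Age
```

Notice that describe() reports a count of 3 for Age, not 4: missing values are skipped in the statistics.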


Selecting Columns and Rows

Access a single column (Series):

df['Age']            # Bracket notation (works for any column name)
df.Age               # Dot notation (only when the name is a valid Python identifier and does not clash with a DataFrame attribute)

Access multiple columns:

df[['Name', 'Age']]

Filter rows:

df[df['Age'] > 18]   # Students older than 18
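
Filtering works because the comparison produces a Series of True/False values that Pandas uses as a row mask. A small sketch, again assuming the Name/Age table from earlier; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Anay', 'Sara', 'John'], 'Age': [12, 25, 22]})

is_adult = df['Age'] > 18        # boolean Series: False, True, True
adults = df[is_adult]            # keeps only the rows marked True

# Combine conditions with & (and) or | (or); wrap each condition in parentheses.
young_adults = df[(df['Age'] > 18) & (df['Age'] < 25)]

# .loc selects rows and a column in one step.
adult_names = df.loc[df['Age'] > 18, 'Name']
print(adult_names.tolist())      # ['Sara', 'John']
```

Python's plain `and` / `or` do not work on Series, which is why the element-wise operators & and | are used instead.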

Cleaning Data (Common Tasks)

Real-world data is messy. Here’s how Pandas helps:

  • Missing values:

df.isnull().sum()            # Count nulls in each column
df.dropna()                  # Return a copy without rows that contain nulls
df.fillna(0)                 # Return a copy with nulls replaced by 0

  • Renaming columns:

df.rename(columns={'Age': 'StudentAge'}, inplace=True)

  • Changing types:

df['StudentAge'] = df['StudentAge'].astype(int)   # Raises an error if nulls remain

  • Dropping duplicates:

df.drop_duplicates(inplace=True)
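
The cleaning steps above can be chained into one pass. This is a minimal sketch on made-up data (a repeated 'Sara' row and a missing age for 'John'), not the only reasonable order of operations.

```python
import pandas as pd

raw = pd.DataFrame({
    'Name': ['Anay', 'Sara', 'Sara', 'John'],
    'Age': [12, 25, 25, None],
})

clean = (
    raw.drop_duplicates()                        # removes the repeated Sara row
       .dropna(subset=['Age'])                   # removes John, whose age is missing
       .rename(columns={'Age': 'StudentAge'})
       .astype({'StudentAge': int})              # safe only once the nulls are gone
       .reset_index(drop=True)                   # renumber the remaining rows from 0
)
print(clean)
```

Each method returns a new DataFrame, so `raw` stays untouched and the result can be compared against it.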

Sorting and Grouping

Sort rows:

df.sort_values(by='Age', ascending=False)   # Returns a sorted copy, highest age first

Group and summarize:

df.groupby('Age').count()                   # Count non-null values per column for each age
df.groupby('Age').mean(numeric_only=True)   # Average of the numeric columns for each age

This is how you can quickly spot trends and patterns.
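
Grouping is easier to see with a column that repeats. The sketch below adds two hypothetical columns, Grade and Score, which are not part of the earlier table.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anay', 'Sara', 'John', 'Mira'],
    'Grade': ['A', 'B', 'A', 'B'],    # hypothetical column
    'Score': [90, 70, 80, 60],        # hypothetical column
})

# Highest score first.
by_score = df.sort_values(by='Score', ascending=False)

# One row per grade, with the number of students and their average score.
summary = df.groupby('Grade')['Score'].agg(['count', 'mean'])
print(summary)
```

Selecting the column before aggregating (['Score']) keeps text columns such as Name out of the calculation.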


Real-World Use Case Example

Imagine you’re analyzing online orders. With Pandas, you can:

  • Load the CSV
  • Clean missing values
  • Filter for orders above ₹1,000
  • Group by product category and find average spend
  • Sort to find the top-performing categories

All with just a few lines of code.
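
The five steps above can be sketched as follows. The order data, the column names (order_id, category, amount), and the in-memory CSV are all made up for illustration; a real project would pass a file path to pd.read_csv.

```python
import io
import pandas as pd

# Made-up sample data standing in for an orders CSV file.
csv_text = """order_id,category,amount
1,Electronics,2500
2,Books,400
3,Electronics,1500
4,Clothing,1200
5,Books,
6,Clothing,1800
"""

orders = pd.read_csv(io.StringIO(csv_text))       # 1. load the CSV
orders = orders.dropna(subset=['amount'])         # 2. drop orders with no amount
large = orders[orders['amount'] > 1000]           # 3. keep orders above 1,000
avg_spend = (
    large.groupby('category')['amount'].mean()    # 4. average spend per category
         .sort_values(ascending=False)            # 5. highest average first
)
print(avg_spend)
```

In this sample, Books drops out entirely: one order is below the threshold and the other has no amount.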


Quick Summary

Task                   Pandas Code Example
Load CSV               pd.read_csv('file.csv')
View first rows        df.head()
Filter rows            df[df['Age'] > 18]
Group and aggregate    df.groupby('Age').mean(numeric_only=True)
Handle null values     df.dropna() or df.fillna(value)
Select column(s)       df['col'] or df[['col1', 'col2']]

Final Thoughts

Pandas makes data exploration intuitive and fast. It’s designed to minimize friction between your brain and your data. Once you master it, tasks that once felt overwhelming — like wrangling giant datasets — will become second nature.

In the next article, we’ll focus on data visualization — turning your tables into charts that tell powerful stories.


Next Up: Data Visualization with Matplotlib and Seaborn