Python Basics for Data Science

Why Python?

Python has become the de facto language of data science, and for good reason. It’s beginner-friendly, highly readable, and backed by a vast ecosystem of libraries designed specifically for working with data — like pandas, numpy, matplotlib, and scikit-learn.

Whether you’re manipulating data, building models, or visualizing trends, Python makes it easier to move from idea to execution with minimal friction.


Setting Up Your Environment

Before diving into code, you’ll need a basic setup:

  • Python 3, a recent release (current versions of pandas and scikit-learn no longer support 3.7) – Install via python.org
  • Jupyter Notebook – Interactive notebooks are perfect for experimentation.
  • IDE (Optional) – VS Code or PyCharm can make coding easier.
  • Package Manager – Use pip or conda to install libraries.

Pro tip: Distributions like Anaconda bundle Python, Jupyter, and the core data science libraries into a single install, which makes them a good fit for beginners.
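Once everything is installed, a quick sanity check is to confirm the interpreter version and that the core libraries can be imported. Here is a minimal sketch using only the standard library. The helper name `check_environment` and the 3.9 minimum are assumptions for illustration, so adjust them to what your libraries require. Note that scikit-learn is imported as `sklearn`.

```python
import importlib.util
import sys

# Import names (not pip names) of the libraries covered below; scikit-learn imports as "sklearn"
CORE_LIBRARIES = ("numpy", "pandas", "matplotlib", "seaborn", "sklearn", "statsmodels")


def check_environment(min_version=(3, 9), libraries=CORE_LIBRARIES):
    """Return (version_ok, {library: installed?}).

    min_version is an assumed threshold, not an official requirement.
    """
    version_ok = sys.version_info[:2] >= min_version
    # find_spec returns None when a top-level package cannot be found
    available = {name: importlib.util.find_spec(name) is not None for name in libraries}
    return version_ok, available


if __name__ == "__main__":
    ok, found = check_environment()
    print(f"Python {sys.version.split()[0]} meets minimum: {ok}")
    for name, installed in found.items():
        print(f"  {name:<12} {'installed' if installed else 'MISSING'}")
```

Anything reported as MISSING can be installed with pip or conda.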

Essential Python Libraries for Data Science

You’ll work with these constantly:

  • numpy: For numerical operations and arrays
  • pandas: For data manipulation and analysis
  • matplotlib / seaborn: For data visualization
  • scikit-learn: For building ML models
  • statsmodels: For statistical modeling

We’ll explore each in later articles — for now, let’s focus on core Python skills.


Core Python Concepts Every Data Scientist Should Know

Let’s walk through the basics — the building blocks you’ll use across all data tasks.

1. Variables & Data Types

age = 25              # Integer
height = 5.9          # Float
name = "Anay"         # String
is_student = True     # Boolean
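Python infers each variable's type from the value you assign, and you can inspect or convert it with built-ins like `type()`, `int()`, and `float()`. The sketch below does this for the variables above. The string `"25"` is a made-up stand-in for a number read from a file, where values usually arrive as text.

```python
age = 25
height = 5.9
name = "Anay"
is_student = True

# type() reports the type Python inferred from each assigned value
for value in (age, height, name, is_student):
    print(repr(value), type(value).__name__)

# Values read from files usually arrive as strings; convert before doing arithmetic
raw_age = "25"
parsed_age = int(raw_age)       # "25" -> 25
parsed_height = float("5.9")    # "5.9" -> 5.9
print(parsed_age + 1)           # 26

# bool is a subclass of int, so True counts as 1 in a sum (handy for counting flags)
flags = [True, False, True]
print(sum(flags))               # 2
```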

2. Data Structures

Python has powerful built-in structures to store and organize data:

  • Lists – Ordered, mutable collections
    marks = [88, 92, 79]
  • Tuples – Ordered, immutable collections
    point = (4, 5)
  • Dictionaries – Key-value pairs
    student = {"name": "Anay", "age": 12}
  • Sets – Unordered, unique elements
    unique_scores = {88, 92, 79}
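The differences between these structures show up as soon as you use them. This sketch reuses the example values above to show mutation, immutability, key lookup, and de-duplication. The extra score `95`, the `"grade"` key, and the repeated-scores list are made-up values for illustration.

```python
marks = [88, 92, 79]
point = (4, 5)
student = {"name": "Anay", "age": 12}
unique_scores = {88, 92, 79}

# Lists are mutable: items can be added or changed in place
marks.append(95)
marks[0] = 90
print(marks)                         # [90, 92, 79, 95]

# Tuples are immutable: item assignment raises TypeError
try:
    point[0] = 10
except TypeError as err:
    print("Tuples can't be changed:", err)

# Tuples unpack neatly into separate variables
x, y = point

# Dictionaries look values up by key; .get() returns a default for missing keys
print(student["name"])               # Anay
print(student.get("grade", "N/A"))   # N/A

# Sets drop duplicates automatically and support fast membership tests
scores_with_repeats = [88, 92, 88, 79, 92]
print(set(scores_with_repeats) == unique_scores)  # True
print(92 in unique_scores)           # True
```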

3. Control Flow

Conditional statements (if / else) decide which code runs next based on a condition.

if age >= 18:        # 18 counts as an adult
    print("Adult")
else:
    print("Minor")
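When there are more than two outcomes, `elif` adds extra branches. Python checks them from top to bottom and runs only the first one that matches. The sketch below wraps the check in a function. The age bands and the name `age_group` are illustrative choices, not part of the example above.

```python
def age_group(age):
    """Classify an age into a band; the cut-offs here are illustrative."""
    if age < 13:
        return "Child"
    elif age < 18:
        return "Teen"
    else:
        return "Adult"


for a in (12, 15, 18, 25):
    print(a, age_group(a))
```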

Loops help automate repetition:

for score in marks:
    print(score)
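In practice, loops are often paired with an accumulator, with `enumerate()` when you need positions, and with list comprehensions for filtering. This sketch computes summaries of the `marks` list by hand, which is what pandas later does for you in one call. The 80-point cut-off is arbitrary.

```python
marks = [88, 92, 79]

# Accumulate a running total, then divide by the count
total = 0
for score in marks:
    total += score
average = total / len(marks)
print(f"Average: {average:.2f}")     # Average: 86.33

# enumerate() yields (index, value) pairs; start=1 numbers from 1
for i, score in enumerate(marks, start=1):
    print(f"Student {i}: {score}")

# A list comprehension filters in one line (80 is an arbitrary cut-off)
high_scores = [score for score in marks if score >= 80]
print(high_scores)                   # [88, 92]

# A while loop repeats until its condition becomes false
countdown = 3
while countdown > 0:
    print(countdown)
    countdown -= 1
```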

4. Functions

Functions help organize code into reusable blocks.

def square(x):
    return x * x

print(square(5))  # Output: 25

You’ll often write custom functions to clean data, perform calculations, or build features.
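As a taste of the data-cleaning use case, here are two small helpers. One normalizes messy name strings. The other rescales numbers to the 0 to 1 range (min-max scaling), a common feature-building step. The function names and the `default` parameter are hypothetical, chosen for illustration.

```python
def clean_name(raw, default="Unknown"):
    """Strip surrounding whitespace and fix capitalization; use a default for blanks."""
    cleaned = raw.strip().title()
    return cleaned if cleaned else default


def min_max_scale(values):
    """Rescale numbers linearly so the smallest becomes 0.0 and the largest 1.0."""
    low, high = min(values), max(values)
    if high == low:
        # All values are equal: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


names = ["  anay ", "SARA", "", "john"]
print([clean_name(n) for n in names])   # ['Anay', 'Sara', 'Unknown', 'John']
print(min_max_scale([12, 25, 22]))
```

Giving `default` a default value keeps the common call short while still letting callers override it.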


Working with Files

Data scientists work with CSVs, JSON, and other files daily.

with open('data.txt', 'r') as file:
    content = file.read()
    print(content)

You’ll later use libraries like pandas to load and manipulate CSVs efficiently.
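Even before pandas, the standard library's `csv` and `json` modules handle both formats. The sketch below is self-contained. It writes a small sample file to a temporary directory and reads it back. The file name `students.csv` and its contents are made up for the example.

```python
import csv
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "students.csv"   # hypothetical sample file

    # Write a tiny CSV; the csv module expects files opened with newline=""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Age"])
        writer.writerows([["Anay", 12], ["Sara", 25], ["John", 22]])

    # DictReader turns each row into a dict keyed by the header; values are strings
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

ages = [int(row["Age"]) for row in rows]
print(rows[0])          # {'Name': 'Anay', 'Age': '12'}
print(ages)             # [12, 25, 22]

# JSON text maps directly onto Python dicts and lists
record = json.loads('{"name": "Anay", "scores": [88, 92, 79]}')
print(record["scores"])                 # [88, 92, 79]
print(json.dumps(record, indent=2))     # back to formatted JSON text
```

Note that every CSV value comes back as a string, which is why the ages are converted with `int()`.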


Example: Your First Data Operation with Pandas

import pandas as pd

data = {
    "Name": ["Anay", "Sara", "John"],
    "Age": [12, 25, 22]
}

df = pd.DataFrame(data)
print(df)

Output:

   Name  Age
0  Anay   12
1  Sara   25
2  John   22

This is just a glimpse: a few lines turn a plain Python dictionary into a table with named columns and a numbered row index (0, 1, 2), ready for analysis.
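Once the data is in a DataFrame, column-wise calculations and filtering no longer need explicit loops. The short sketch below rebuilds the same `df` so it runs on its own. It then computes an average, filters rows with a boolean condition, and adds a derived column. The `IsAdult` column name and the 18-year cut-off are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anay", "Sara", "John"],
    "Age": [12, 25, 22],
})

# Summary statistics operate on a whole column at once
print(df["Age"].mean())

# Boolean filtering keeps only the rows where the condition is True
adults = df[df["Age"] >= 18]
print(adults)

# A new column computed element-wise from an existing one
df["IsAdult"] = df["Age"] >= 18
print(df.sort_values("Age", ascending=False))
```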


Final Thoughts

Python is not just a programming language — it’s a tool for thinking about problems and building solutions quickly. The basics you’ve seen here will become second nature with practice.

In the next few articles, we’ll dive deeper into working with data using pandas, data cleaning, and visualization — all using Python.


Next Up: Introduction to Pandas and Working with Tabular Data