Table of Contents
- Introduction
- Why MongoDB for Python Applications?
- What is PyMongo?
- Installing PyMongo
- Connecting to a MongoDB Database
- Creating Databases and Collections
- CRUD Operations in MongoDB using PyMongo
- Inserting Documents
- Querying Documents
- Updating Documents
- Deleting Documents
- Advanced Queries and Filtering
- Indexing in MongoDB with PyMongo
- Aggregation Pipelines
- Best Practices for Working with PyMongo
- Conclusion
Introduction
As modern applications evolve, handling unstructured or semi-structured data efficiently becomes a critical need.
MongoDB, a popular NoSQL database, offers developers the flexibility to work with dynamic schemas and massive datasets with ease.
In this module, you will learn how to integrate MongoDB with Python applications using PyMongo, the official Python driver for MongoDB.
We will cover everything from installation to complex querying and aggregation.
Why MongoDB for Python Applications?
- Schema-less Structure: Store data in JSON-like documents without a strict schema.
- High Performance: Designed for scalability and high throughput.
- Ease of Use: Insert, update, and retrieve data with simple commands.
- Scalability: Supports horizontal scaling with sharding.
- Flexible Data Models: Perfect for rapidly changing application requirements.
MongoDB is particularly useful when dealing with big data, real-time analytics, IoT applications, and flexible content management systems.
What is PyMongo?
PyMongo is the official Python driver for MongoDB.
It provides an intuitive and powerful way to interact with MongoDB databases, collections, and documents directly from Python scripts and applications.
With PyMongo, you can:
- Perform CRUD operations
- Execute complex queries and aggregations
- Create indexes
- Manage connections
- Handle multi-document transactions (MongoDB 4.0+ on replica sets, 4.2+ on sharded clusters)
Installing PyMongo
You can install PyMongo using pip:
pip install pymongo
To verify installation:
python -c "import pymongo; print(pymongo.version)"
Connecting to a MongoDB Database
You can connect to a local or remote MongoDB server.
Here is how you create a connection:
from pymongo import MongoClient
# Connect to MongoDB server
client = MongoClient('mongodb://localhost:27017/')
# Access a specific database
db = client['mydatabase']
# Access a collection
collection = db['users']
Notes:
- A default local MongoDB server listens on localhost:27017.
- If you are connecting to a cloud database like MongoDB Atlas, replace the connection string with your cluster's URI.
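Connection strings for authenticated deployments such as MongoDB Atlas embed the username and password, and the PyMongo documentation asks for those credentials to be percent-escaped, for example with urllib.parse.quote_plus. A minimal stdlib-only sketch; build_mongo_uri and the example host name are hypothetical placeholders:

```python
from urllib.parse import quote_plus


def build_mongo_uri(user: str, password: str, host: str, srv: bool = True) -> str:
    """Return a connection string with percent-escaped credentials.

    `host` is a placeholder such as "cluster0.example.mongodb.net" (hypothetical)
    or "localhost:27017". Atlas clusters typically use the "mongodb+srv" scheme;
    pass srv=False for a plain "mongodb://" URI.
    """
    scheme = "mongodb+srv" if srv else "mongodb"
    # quote_plus escapes characters such as '@', ':' and '/' that would
    # otherwise be misread as URI delimiters.
    return f"{scheme}://{quote_plus(user)}:{quote_plus(password)}@{host}/"


if __name__ == "__main__":
    uri = build_mongo_uri("app_user", "p@ss:word", "cluster0.example.mongodb.net")
    print(uri)  # mongodb+srv://app_user:p%40ss%3Aword@cluster0.example.mongodb.net/
    # The URI would then be passed to MongoClient(uri).
```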
Creating Databases and Collections
MongoDB is flexible. You do not have to explicitly create a database or a collection.
They are created automatically when you insert the first document.
# Accessing a database and collection
db = client['company']
employees = db['employees']
CRUD Operations in MongoDB using PyMongo
Let us dive into the core operations of any database system.
Inserting Documents
Insert a single document:
employee = {
    "name": "Alice",
    "department": "HR",
    "salary": 50000
}
employees.insert_one(employee)
Insert multiple documents:
employee_list = [
    {"name": "Bob", "department": "IT", "salary": 70000},
    {"name": "Charlie", "department": "Finance", "salary": 60000}
]
employees.insert_many(employee_list)
Querying Documents
Find a single document:
result = employees.find_one({"name": "Alice"})
print(result)
Find multiple documents:
for emp in employees.find({"department": "IT"}):
    print(emp)
Advanced query with conditions:
for emp in employees.find({"salary": {"$gt": 60000}}):
    print(emp)
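Several comparison operators can be combined on one field. The sketch below (salary_between is a hypothetical helper) builds a half-open range filter, low <= salary < high, as a plain dictionary that could be passed straight to employees.find():

```python
from typing import Any, Dict, Optional


def salary_between(low: Optional[int] = None, high: Optional[int] = None) -> Dict[str, Any]:
    """Return a find() filter matching low <= salary < high.

    Either bound may be omitted; with no bounds the filter matches every document.
    """
    condition: Dict[str, Any] = {}
    if low is not None:
        condition["$gte"] = low  # inclusive lower bound
    if high is not None:
        condition["$lt"] = high  # exclusive upper bound
    return {"salary": condition} if condition else {}


if __name__ == "__main__":
    print(salary_between(55000, 70000))
    # Usage with PyMongo: employees.find(salary_between(55000, 70000))
```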
Updating Documents
Update a single document:
employees.update_one(
    {"name": "Alice"},
    {"$set": {"salary": 55000}}
)
Update multiple documents:
employees.update_many(
    {"department": "Finance"},
    {"$inc": {"salary": 5000}}
)
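$set replaces the value of the specified fields (creating them if they do not exist), while $inc increments numeric fields by the given amount (use a negative value to decrement).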
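To make the two operators concrete, here is a local, pure-Python approximation of what the server does to a single matched document. apply_update is a hypothetical helper for illustration, not part of PyMongo; the real update always runs on the server:

```python
import copy
from typing import Any, Dict


def apply_update(doc: Dict[str, Any], update: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of `doc` with $set and $inc applied to top-level fields.

    Local illustration only: it ignores dot notation, other update operators,
    and the type and conflict checks the server performs.
    """
    unsupported = set(update) - {"$set", "$inc"}
    if unsupported:
        raise ValueError(f"unsupported operators: {sorted(unsupported)}")
    result = copy.deepcopy(doc)  # leave the caller's document untouched
    for field, value in update.get("$set", {}).items():
        result[field] = value  # $set overwrites the field, or creates it
    for field, amount in update.get("$inc", {}).items():
        result[field] = result.get(field, 0) + amount  # a missing field starts at 0
    return result


if __name__ == "__main__":
    alice = {"name": "Alice", "department": "HR", "salary": 50000}
    print(apply_update(alice, {"$set": {"salary": 55000}}))
    print(apply_update(alice, {"$inc": {"salary": 5000, "bonus": 1000}}))
```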
Deleting Documents
Delete a single document:
employees.delete_one({"name": "Charlie"})
Delete multiple documents:
employees.delete_many({"department": "HR"})
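An empty filter ({}) matches every document, so delete_many({}) empties the entire collection. One defensive pattern is to route delete filters through a small guard first; require_nonempty_filter below is a hypothetical helper, not a PyMongo feature:

```python
from typing import Any, Dict


def require_nonempty_filter(filter_doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return `filter_doc` unchanged, or raise if it would match every document."""
    if not filter_doc:
        raise ValueError("refusing to delete with an empty filter: it matches all documents")
    return filter_doc


if __name__ == "__main__":
    print(require_nonempty_filter({"department": "HR"}))
    # Usage with PyMongo: employees.delete_many(require_nonempty_filter({"department": "HR"}))
```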
Advanced Queries and Filtering
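You can use the logical operators $and, $or, and $nor to combine whole query clauses, and $not to negate an operator expression on a single field. For example, this query matches employees who work in IT or earn more than 60000:
for emp in employees.find({
    "$or": [
        {"department": "IT"},
        {"salary": {"$gt": 60000}}
    ]
}):
    print(emp)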
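When every $or branch tests the same field for equality, the MongoDB documentation recommends $in instead. $not, by contrast, negates an operator expression on one field and also matches documents where that field is missing. The helpers below are hypothetical and only build filter dictionaries:

```python
from typing import Any, Dict, List


def departments_or(departments: List[str]) -> Dict[str, Any]:
    """$or form: one equality clause per department."""
    return {"$or": [{"department": d} for d in departments]}


def departments_in(departments: List[str]) -> Dict[str, Any]:
    """Equivalent $in form, preferred for alternatives on a single field."""
    return {"department": {"$in": list(departments)}}


def salary_not_above(limit: int) -> Dict[str, Any]:
    """Matches documents whose salary is not greater than limit,
    including documents that have no salary field at all."""
    return {"salary": {"$not": {"$gt": limit}}}


if __name__ == "__main__":
    print(departments_in(["IT", "Finance"]))
    # Usage with PyMongo: employees.find(departments_in(["IT", "Finance"]))
```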
You can also project specific fields:
for emp in employees.find({}, {"name": 1, "salary": 1, "_id": 0}):
    print(emp)
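In a projection, 1 includes a field and 0 excludes it; apart from _id, which is returned unless explicitly set to 0, inclusions and exclusions cannot be mixed in one projection. A hypothetical helper that builds an inclusion projection from a list of field names:

```python
from typing import Dict, Iterable


def include_fields(fields: Iterable[str], include_id: bool = False) -> Dict[str, int]:
    """Build an inclusion projection; _id is dropped unless include_id is True."""
    projection = {field: 1 for field in fields}
    if not include_id:
        projection["_id"] = 0  # _id is the only field that may be excluded here
    return projection


if __name__ == "__main__":
    print(include_fields(["name", "salary"]))
    # Usage with PyMongo: employees.find({}, include_fields(["name", "salary"]))
```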
Indexing in MongoDB with PyMongo
Indexes improve query performance.
Create an index:
employees.create_index([("name", 1)]) # 1 for ascending order
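Create a unique index. Documents that lack the indexed field are stored in the index with a null value, so building this index fails with a duplicate key error if more than one existing document has no email field; add the field to every document first or use a partial index: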
employees.create_index([("email", 1)], unique=True)
List indexes:
print(employees.index_information())
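index_information() returns a dictionary keyed by index name. When no name= argument is passed, the default name joins each field and direction with underscores, and the automatic index on _id is named _id_ (create_index also returns the name it used). A hypothetical helper reproducing that convention, handy for predicting the keys you will see:

```python
from typing import List, Tuple, Union

# A key is (field, direction), e.g. ("salary", -1); direction may also be a string like "text".
IndexKey = Tuple[str, Union[int, str]]


def default_index_name(keys: List[IndexKey]) -> str:
    """Join each (field, direction) pair as 'field_direction', separated by '_'."""
    return "_".join(f"{field}_{direction}" for field, direction in keys)


if __name__ == "__main__":
    print(default_index_name([("name", 1)]))
    print(default_index_name([("department", 1), ("salary", -1)]))
    # A compound index like the second one could be created with
    # employees.create_index([("department", 1), ("salary", -1)]).
```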
Aggregation Pipelines
Aggregation provides powerful ways to transform and analyze data.
Simple aggregation example:
pipeline = [
    {"$match": {"department": "IT"}},
    {"$group": {"_id": "$department", "averageSalary": {"$avg": "$salary"}}}
]
for result in employees.aggregate(pipeline):
    print(result)
Stages such as $match, $group, $sort, $project, $limit, and $skip can be combined for complex analytics.
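The stages run in order, each receiving the previous stage's output. A sketch under the tutorial's employees schema: avg_salary_pipeline (a hypothetical helper) filters by a minimum salary, groups by department, sorts by average salary, and keeps the top results, while avg_salary_local computes the same result in plain Python so the logic can be checked on sample data without a server:

```python
from typing import Any, Dict, Iterable, List


def avg_salary_pipeline(min_salary: int = 0, top_n: int = 5) -> List[Dict[str, Any]]:
    """Match, group by department, sort by average salary (descending), keep top_n."""
    return [
        {"$match": {"salary": {"$gte": min_salary}}},
        {"$group": {
            "_id": "$department",
            "averageSalary": {"$avg": "$salary"},
            "headcount": {"$sum": 1},
        }},
        {"$sort": {"averageSalary": -1}},
        {"$limit": top_n},
    ]


def avg_salary_local(docs: Iterable[Dict[str, Any]], min_salary: int = 0,
                     top_n: int = 5) -> List[Dict[str, Any]]:
    """Pure-Python reference for the same computation (tie order may differ from the server)."""
    groups: Dict[Any, List[float]] = {}
    for doc in docs:
        salary = doc.get("salary")
        # $match with $gte only passes numeric salaries at or above the threshold
        if isinstance(salary, (int, float)) and not isinstance(salary, bool) and salary >= min_salary:
            groups.setdefault(doc.get("department"), []).append(salary)
    rows = [
        {"_id": dept, "averageSalary": sum(vals) / len(vals), "headcount": len(vals)}
        for dept, vals in groups.items()
    ]
    rows.sort(key=lambda row: row["averageSalary"], reverse=True)
    return rows[:top_n]


if __name__ == "__main__":
    print(avg_salary_pipeline(60000, 3))
    # Usage with PyMongo: for row in employees.aggregate(avg_salary_pipeline(60000, 3)): print(row)
```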
Best Practices for Working with PyMongo
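- Connection Pooling: Each MongoClient manages its own connection pool automatically, so create one client and reuse it across your application instead of opening a new client per operation.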
- Error Handling: Wrap database operations in try/except blocks and catch specific exceptions from pymongo.errors, such as DuplicateKeyError or ConnectionFailure.
- Use ObjectIds Properly: MongoDB uses the _id field, which is an ObjectId by default; when querying by an ID received as a string, convert it first with bson.ObjectId.
- Secure Your Database: Never expose MongoDB without authentication on public networks.
- Paginate Queries: Use .skip() and .limit() to paginate large datasets.
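An ObjectId is 12 bytes, and its first 4 bytes store the creation time in seconds since the Unix epoch; PyMongo exposes this as ObjectId.generation_time. The stdlib-only sketch below decodes that field from the 24-character hex form to show the layout; objectid_timestamp is a hypothetical helper, and the sample ID is made up:

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex: str) -> datetime:
    """Decode the creation time stored in the first 4 bytes of an ObjectId."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId hex string must have 24 characters")
    seconds = int(oid_hex[:8], 16)  # big-endian seconds since the Unix epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    print(objectid_timestamp("6553f1000000000000000000"))  # 2023-11-14 22:13:20+00:00
```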
Example of basic pagination:
page_size = 10
page_number = 2
skip = page_size * (page_number - 1)
# Sort on a stable key so consecutive pages neither overlap nor miss documents
for emp in employees.find().sort("_id", 1).skip(skip).limit(page_size):
    print(emp)
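skip() still makes the server step over every skipped document, so deep pages get slower as the offset grows. A common alternative is range-based (keyset) pagination: sort on _id and request documents whose _id is greater than the last one already seen. next_page_filter below is a hypothetical helper that only builds the filter:

```python
from typing import Any, Dict, Optional


def next_page_filter(last_id: Optional[Any],
                     base_filter: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Filter for the page after `last_id`, assuming results are sorted by _id ascending.

    Any _id condition already present in base_filter is replaced.
    """
    query = dict(base_filter or {})  # copy so the caller's filter is not mutated
    if last_id is not None:
        query["_id"] = {"$gt": last_id}
    return query


# Usage with PyMongo (sketch):
#   last_id = None
#   while True:
#       page = list(employees.find(next_page_filter(last_id)).sort("_id", 1).limit(page_size))
#       if not page:
#           break
#       ...process page...
#       last_id = page[-1]["_id"]
```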
Conclusion
By integrating MongoDB with Python using PyMongo, developers can build fast, scalable, and flexible applications that can handle complex and dynamic data models.
In this module, you have learned how to:
- Connect to MongoDB
- Perform CRUD operations
- Use advanced queries and indexing
- Implement aggregation pipelines
- Follow best practices for database interaction