Table of Contents
- Introduction
- Why SQL Performance Tuning Matters
- What Is EXPLAIN PLAN?
- Basic Usage of EXPLAIN PLAN (PostgreSQL, MySQL, Oracle)
- Understanding Query Execution Stages
- Interpreting Rows, Cost, Width, Time
- Common Operators in Query Plans
- Seq Scan vs Index Scan
- Join Types and Their Cost Implications
- Sorting, Filtering, Aggregation Overheads
- Using ANALYZE for Actual Execution Times
- Detecting Missing Indexes
- Query Rewriting for Optimization
- Index Selection and Composite Indexes
- Table Statistics and Autovacuum (PostgreSQL)
- Partitioning and Clustering for Performance
- Caching and Materialized Views
- Performance Pitfalls and Anti-Patterns
- Real-World Tuning Example
- Summary and What’s Next
1. Introduction
No matter how well-designed your database is, inefficient SQL queries can lead to performance bottlenecks. EXPLAIN PLAN is a diagnostic tool that helps you understand how a query is executed, and how to improve it.
2. Why SQL Performance Tuning Matters
- Faster query execution
- Reduced cloud/data warehouse costs
- Better user experience for dashboards
- Scalability for growing data volumes
- Helps meet SLAs in production systems
3. What Is EXPLAIN PLAN?
EXPLAIN (or EXPLAIN PLAN in Oracle) is a SQL command that shows the query execution strategy chosen by the database optimizer.
It tells you:
- Which indexes (if any) are used
- How rows are scanned or filtered
- The expected cost and row count
- The execution order of joins and filters
4. Basic Usage of EXPLAIN PLAN
PostgreSQL:
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
MySQL:
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
Oracle:
EXPLAIN PLAN FOR SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Use EXPLAIN ANALYZE (PostgreSQL) to run the query and show actual performance.
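In PostgreSQL, EXPLAIN also accepts options in parentheses; for example, adding BUFFERS to a timed run reports buffer (cache) usage alongside the timings:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 123;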
5. Understanding Query Execution Stages
Typical stages include:
- Table Scan (Seq Scan or Index Scan)
- Filter
- Sort
- Join (Nested Loop, Hash Join, Merge Join)
- Aggregate (GROUP BY, COUNT, SUM)
- Output rows
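A single query can pass through most of these stages at once. A minimal sketch (the customers table and its columns are illustrative):
EXPLAIN
SELECT c.region, COUNT(*) AS order_count   -- Aggregate stage
FROM orders o
JOIN customers c ON c.id = o.customer_id   -- Join stage (Hash Join / Nested Loop)
WHERE o.total > 100                        -- Filter stage
GROUP BY c.region
ORDER BY order_count DESC;                 -- Sort stage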
6. Interpreting Rows, Cost, Width, Time
Example (PostgreSQL):
Seq Scan on orders (cost=0.00..431.00 rows=21000 width=64)
| Metric | Meaning |
| --- | --- |
| Cost | Estimated cost (startup..total) in arbitrary planner units |
| Rows | Estimated number of output rows |
| Width | Estimated row size in bytes |
| Actual Time | Real per-node timings, shown only with EXPLAIN ANALYZE |
Use these estimates to evaluate efficiency and selectivity.
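For example, a filter changes the row estimate. The plan below (numbers illustrative) shows the planner expecting only ~700 of 21,000 rows to pass the filter, a selectivity of about 3%:
EXPLAIN SELECT * FROM orders WHERE total > 1000;
Seq Scan on orders (cost=0.00..483.50 rows=700 width=64)
  Filter: (total > 1000)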
7. Common Operators in Query Plans
| Operator | Description |
| --- | --- |
| Seq Scan | Sequential scan (entire table) |
| Index Scan | Scan using an index (ordered access) |
| Index Only Scan | Index contains all required columns; no table access needed |
| Bitmap Index Scan | Efficient for multi-key filtering |
| Hash Join | Fast join for large unordered sets |
| Nested Loop | Best for small joined tables |
8. Seq Scan vs Index Scan
Seq Scan is used when:
- The table is small
- Index is not selective
- The cost of random access is too high
Index Scan is preferred when:
- Filtering on indexed columns
- Returning a few rows
EXPLAIN SELECT * FROM users WHERE id = 5;
This should show an Index Scan using pk_users.
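To check whether the planner would ever pick the index, PostgreSQL lets you discourage sequential scans for the current session; this is a diagnostic knob, not a production setting:
SET enable_seqscan = off;  -- planner now strongly prefers index access
EXPLAIN SELECT * FROM users WHERE id = 5;
RESET enable_seqscan;      -- restore the default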
9. Join Types and Their Cost Implications
| Join Type | Use Case |
| --- | --- |
| Nested Loop | Small inner table, indexed join key |
| Hash Join | Large unsorted tables |
| Merge Join | Both tables sorted on the join key |
Joins can dominate query cost — reordering tables or adding indexes may help.
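To see which strategy the planner chose, run EXPLAIN on the join itself (the customers table and region column are illustrative):
EXPLAIN
SELECT o.*
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.region = 'EU';  -- the top join node shows Nested Loop, Hash Join, or Merge Join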
10. Sorting, Filtering, Aggregation Overheads
Sorting is expensive on large datasets:
ORDER BY created_at DESC
Adding an index on created_at DESC can eliminate the sort phase.
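A sketch of that index (the name is illustrative):
CREATE INDEX idx_orders_created_at_desc ON orders (created_at DESC);
-- EXPLAIN should now show an Index Scan feeding the ORDER BY instead of a separate Sort node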
Aggregates like GROUP BY and HAVING may trigger hash or sort operations. Use indexes or pre-aggregated tables for heavy workloads.
11. Using ANALYZE for Actual Execution Times
EXPLAIN ANALYZE SELECT * FROM orders WHERE total > 1000;
Returns actual:
- Row count
- Execution time per step
- Number of loops
Helps detect misestimates (e.g., 10k expected rows but 1 million actual).
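With EXPLAIN ANALYZE, estimate and reality sit side by side on each node line (numbers illustrative):
Seq Scan on orders (cost=0.00..431.00 rows=10000 width=64) (actual time=0.015..320.480 rows=1000000 loops=1)
Here the planner expected 10,000 rows but 1,000,000 came back; a gap that large usually points to stale statistics (see section 15).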
12. Detecting Missing Indexes
Use EXPLAIN to find sequential scans on large tables:
EXPLAIN SELECT * FROM sales WHERE product_id = 100;
If Seq Scan appears, consider:
CREATE INDEX idx_sales_product_id ON sales(product_id);
13. Query Rewriting for Optimization
Instead of:
SELECT * FROM orders WHERE status != 'shipped';
Rewrite to:
SELECT * FROM orders WHERE status IN ('pending', 'cancelled');
Negations are hard to optimize because they rarely use an index; prefer positive filtering when the remaining value set is known (the rewrite above assumes 'pending' and 'cancelled' are the only other statuses).
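Another common rewrite: NOT IN over a subquery is NULL-sensitive and often slow, while NOT EXISTS is usually planned as an efficient anti-join. A sketch, assuming a hypothetical refunds table keyed by order_id:
-- Instead of:
SELECT * FROM orders WHERE id NOT IN (SELECT order_id FROM refunds);
-- Prefer:
SELECT * FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM refunds r WHERE r.order_id = o.id);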
14. Index Selection and Composite Indexes
Composite indexes help multi-column filters:
CREATE INDEX idx_customer_order_date ON orders(customer_id, order_date);
Useful for:
SELECT * FROM orders WHERE customer_id = 123 AND order_date > '2023-01-01';
The index can be used only when the filter constrains a leftmost prefix of its columns (here, customer_id, optionally together with order_date).
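To see the prefix rule in action, compare these two plans:
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;          -- can use the composite index
EXPLAIN SELECT * FROM orders WHERE order_date > '2023-01-01';  -- typically cannot; expect a Seq Scan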
15. Table Statistics and Autovacuum (PostgreSQL)
PostgreSQL’s query planner depends on table stats:
ANALYZE orders;
Autovacuum keeps stats up-to-date, but manual updates can help during large ETL jobs.
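For skewed columns the default sample can be too small; PostgreSQL lets you raise the per-column statistics target and re-analyze (the status column is illustrative):
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;  -- default target is 100
ANALYZE orders;  -- rebuild statistics with the larger sample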
16. Partitioning and Clustering for Performance
Partitioning:
CREATE TABLE sales_y2024 PARTITION OF sales FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');  -- upper bound is exclusive, so this covers all of 2024
Improves filter performance on date ranges.
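The partition above assumes a range-partitioned parent table, roughly:
-- Minimal parent definition (columns are illustrative)
CREATE TABLE sales (
    id         bigint,
    product_id int,
    amount     numeric,
    sale_date  date
) PARTITION BY RANGE (sale_date);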
Clustering:
CLUSTER orders USING idx_orders_customer_id;
Physically organizes data by index — faster reads for ordered scans.
17. Caching and Materialized Views
Use materialized views to cache expensive queries:
CREATE MATERIALIZED VIEW top_products AS
SELECT product_id, SUM(amount) AS total_sales  -- "amount" assumed; SUM(sales) referenced the table name
FROM sales GROUP BY product_id;
Refresh as needed:
REFRESH MATERIALIZED VIEW top_products;
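To refresh without blocking readers, PostgreSQL also supports REFRESH MATERIALIZED VIEW CONCURRENTLY, which requires a unique index on the view:
CREATE UNIQUE INDEX idx_top_products_pid ON top_products (product_id);
REFRESH MATERIALIZED VIEW CONCURRENTLY top_products;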
18. Performance Pitfalls and Anti-Patterns
- SELECT * in large joins
- Functions in the WHERE clause (e.g., DATE(created_at)) prevent use of an index on the raw column
- Using DISTINCT instead of GROUP BY
- Sorting large unsorted tables
- Implicit type casts (e.g., WHERE id = '123' with id as INT; see the sketch after this list)
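A sketch of the implicit-cast pitfall, as seen in MySQL with a hypothetical indexed VARCHAR column phone:
SELECT * FROM customers WHERE phone = 5551234;    -- number vs. string: cast on every row, index unusable
SELECT * FROM customers WHERE phone = '5551234';  -- types match: index is used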
19. Real-World Tuning Example
Before:
SELECT * FROM logs WHERE DATE(timestamp) = CURRENT_DATE;
Problem:
- Disables the index on timestamp
Optimized:
SELECT * FROM logs
WHERE timestamp >= CURRENT_DATE AND timestamp < CURRENT_DATE + INTERVAL '1 day';
Now uses index — dramatically faster.
20. Summary and What’s Next
EXPLAIN PLAN is your window into SQL's execution logic. By interpreting plans, rewriting queries, and applying indexing strategies, you can significantly improve query performance and reduce compute costs.