SQL Performance Tuning with EXPLAIN PLAN: Understanding Query Optimization


Table of Contents

  1. Introduction
  2. Why SQL Performance Tuning Matters
  3. What Is EXPLAIN PLAN?
  4. Basic Usage of EXPLAIN PLAN (PostgreSQL, MySQL, Oracle)
  5. Understanding Query Execution Stages
  6. Interpreting Rows, Cost, Width, Time
  7. Common Operators in Query Plans
  8. Seq Scan vs Index Scan
  9. Join Types and Their Cost Implications
  10. Sorting, Filtering, Aggregation Overheads
  11. Using ANALYZE for Actual Execution Times
  12. Detecting Missing Indexes
  13. Query Rewriting for Optimization
  14. Index Selection and Composite Indexes
  15. Table Statistics and Autovacuum (PostgreSQL)
  16. Partitioning and Clustering for Performance
  17. Caching and Materialized Views
  18. Performance Pitfalls and Anti-Patterns
  19. Real-World Tuning Example
  20. Summary and What’s Next

1. Introduction

No matter how well-designed your database is, inefficient SQL queries can lead to performance bottlenecks. EXPLAIN PLAN is a diagnostic tool that helps you understand how a query is executed, and how to improve it.


2. Why SQL Performance Tuning Matters

  • Faster query execution
  • Reduced cloud/data warehouse costs
  • Better user experience for dashboards
  • Scalability for growing data volumes
  • Helps meet SLAs in production systems

3. What Is EXPLAIN PLAN?

EXPLAIN (or EXPLAIN PLAN) is a SQL command that shows the query execution strategy chosen by the database optimizer.

It tells you:

  • Which indexes (if any) are used
  • How rows are scanned or filtered
  • The expected cost and row count
  • The execution order of joins and filters

4. Basic Usage of EXPLAIN PLAN

PostgreSQL:

EXPLAIN SELECT * FROM orders WHERE customer_id = 123;

MySQL:

EXPLAIN SELECT * FROM orders WHERE customer_id = 123;

Oracle:

EXPLAIN PLAN FOR SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Use EXPLAIN ANALYZE (PostgreSQL) to run the query and show actual performance.


5. Understanding Query Execution Stages

Typical stages include:

  • Table Scan (Seq Scan or Index Scan)
  • Filter
  • Sort
  • Join (Nested Loop, Hash Join, Merge Join)
  • Aggregate (GROUP BY, COUNT, SUM)
  • Output rows

6. Interpreting Rows, Cost, Width, Time

Example (PostgreSQL):

Seq Scan on orders  (cost=0.00..431.00 rows=21000 width=64)
MetricMeaning
CostEstimated time cost (startup..total)
RowsEstimated number of output rows
WidthEstimated row size in bytes

Use these estimates to evaluate efficiency and selectivity.


7. Common Operators in Query Plans

OperatorDescription
Seq ScanSequential scan (entire table)
Index ScanScan using index (ordered access)
Index OnlyIndex has all required columns
Bitmap IndexEfficient for multi-key filtering
Hash JoinFast join for large unordered sets
Nested LoopBest for small joined tables

8. Seq Scan vs Index Scan

Seq Scan is used when:

  • The table is small
  • Index is not selective
  • The cost of random access is too high

Index Scan is preferred when:

  • Filtering on indexed columns
  • Returning a few rows
EXPLAIN SELECT * FROM users WHERE id = 5;

Should show Index Scan using pk_users.


9. Join Types and Their Cost Implications

Join TypeUse Case
Nested LoopSmall inner table, indexed
Hash JoinLarge unsorted tables
Merge JoinBoth tables sorted on join key

Joins can dominate query cost — reordering tables or adding indexes may help.


10. Sorting, Filtering, Aggregation Overheads

Sorting is expensive on large datasets:

ORDER BY created_at DESC

Adding an index on created_at DESC can eliminate the sort phase.

Aggregates like GROUP BY, HAVING may trigger hash or sort operations. Use indexes or pre-aggregated tables for heavy workloads.


11. Using ANALYZE for Actual Execution Times

EXPLAIN ANALYZE SELECT * FROM orders WHERE total > 1000;

Returns actual:

  • Row count
  • Execution time per step
  • Number of loops

Helps detect misestimates (e.g., 10k expected rows but 1 million actual).


12. Detecting Missing Indexes

Use EXPLAIN to find sequential scans on large tables:

EXPLAIN SELECT * FROM sales WHERE product_id = 100;

If Seq Scan appears, consider:

CREATE INDEX idx_sales_product_id ON sales(product_id);

13. Query Rewriting for Optimization

Instead of:

SELECT * FROM orders WHERE status != 'shipped';

Rewrite to:

SELECT * FROM orders WHERE status IN ('pending', 'cancelled');

Negations are hard to optimize — always prefer positive filtering.


14. Index Selection and Composite Indexes

Composite indexes help multi-column filters:

CREATE INDEX idx_customer_order_date ON orders(customer_id, order_date);

Useful for:

SELECT * FROM orders WHERE customer_id = 123 AND order_date > '2023-01-01';

Only the leftmost prefix is used unless all columns are included in filter.


15. Table Statistics and Autovacuum (PostgreSQL)

PostgreSQL’s query planner depends on table stats:

ANALYZE orders;

Autovacuum keeps stats up-to-date, but manual updates can help during large ETL jobs.


16. Partitioning and Clustering for Performance

Partitioning:

CREATE TABLE sales_y2024 PARTITION OF sales FOR VALUES FROM ('2024-01-01') TO ('2024-12-31');

Improves filter performance on date ranges.

Clustering:

CLUSTER orders USING idx_orders_customer_id;

Physically organizes data by index — faster reads for ordered scans.


17. Caching and Materialized Views

Use materialized views to cache expensive queries:

CREATE MATERIALIZED VIEW top_products AS
SELECT product_id, SUM(sales) FROM sales GROUP BY product_id;

Refresh as needed:

REFRESH MATERIALIZED VIEW top_products;

18. Performance Pitfalls and Anti-Patterns

  • SELECT * in large joins
  • Functions in WHERE clause (DATE(created_at)) disable index
  • Using DISTINCT instead of GROUP BY
  • Sorting large unsorted tables
  • Implicit type casts (e.g., WHERE id = '123' with id as INT)

19. Real-World Tuning Example

Before:

SELECT * FROM logs WHERE DATE(timestamp) = CURRENT_DATE;

Problem:

  • Disables index on timestamp

Optimized:

SELECT * FROM logs
WHERE timestamp >= CURRENT_DATE AND timestamp < CURRENT_DATE + INTERVAL '1 day';

Now uses index — dramatically faster.


20. Summary and What’s Next

EXPLAIN PLAN is your window into SQL’s execution logic. By interpreting plans, rewriting queries, and applying indexing strategies, you can significantly improve query performance and reduce compute costs.