Data Warehousing Concepts in SQL: Understanding Star and Snowflake Schemas


Table of Contents

  1. Introduction
  2. What Is a Data Warehouse?
  3. OLTP vs OLAP: The Need for Warehousing
  4. Key Components of a Data Warehouse
  5. What Is Dimensional Modeling?
  6. Facts and Dimensions Explained
  7. The Star Schema Architecture
  8. Example of a Star Schema Design
  9. The Snowflake Schema Architecture
  10. Star vs Snowflake Schema Comparison
  11. Denormalization and Performance Trade-offs
  12. Creating Fact and Dimension Tables in SQL
  13. Surrogate Keys and Their Role
  14. Slowly Changing Dimensions (SCD Types)
  15. Handling Time Dimensions
  16. Using Views to Model Analytical Layers
  17. Querying Star Schemas for BI and Reporting
  18. Best Practices for Data Warehouse Schema Design
  19. Real-World Warehouse Scenario: E-Commerce Analytics
  20. Summary and What’s Next

1. Introduction

Data warehousing is the foundation of modern analytics, BI dashboards, and reporting systems. Understanding how to design and query data warehouse schemas—especially Star and Snowflake schemas—is key to building efficient, scalable analytics systems with SQL.


2. What Is a Data Warehouse?

A data warehouse is a centralized repository designed for analytical processing (OLAP) rather than transactional operations. It integrates data from multiple sources, cleans and structures it, and makes it query-ready for decision-making.


3. OLTP vs OLAP: The Need for Warehousing

FeatureOLTP (e.g., MySQL)OLAP (e.g., Snowflake, BigQuery)
PurposeTransaction processingAnalytical processing
NormalizationHighly normalizedDenormalized or star schemas
QueriesSimple, short-livedComplex, large, aggregate queries
Write-intensiveYesNo (mostly read-only)

4. Key Components of a Data Warehouse

  • Fact tables: Store metrics (e.g., sales, orders)
  • Dimension tables: Store attributes (e.g., customer info, product details)
  • ETL/ELT pipelines: Load and transform data
  • Schema design: Star or Snowflake
  • Views and aggregates: Support analysis and reporting

5. What Is Dimensional Modeling?

Dimensional modeling is a design technique for warehouses where data is organized around:

  • Facts (measurable events)
  • Dimensions (descriptive attributes)

The goal is to create schemas that are intuitive and fast to query.


6. Facts and Dimensions Explained

ComponentDescriptionExample
FactNumeric, quantitative valuessales_amount, units_sold
DimensionDescriptive attributes about factsproduct_name, customer_region

Facts link to dimensions using foreign keys.


7. The Star Schema Architecture

In a star schema, a central fact table connects directly to surrounding denormalized dimension tables.

                +-------------+
| dim_product |
+-------------+
|
+-------------+ +-------------+ +-------------+
| dim_customer|---| fact_sales |---| dim_date |
+-------------+ +-------------+ +-------------+
|
+-------------+
| dim_region |
+-------------+

8. Example of a Star Schema Design

Fact Table:

CREATE TABLE fact_sales (
sale_id SERIAL PRIMARY KEY,
product_id INT,
customer_id INT,
date_id INT,
region_id INT,
quantity INT,
total_amount NUMERIC
);

Dimension Table:

CREATE TABLE dim_product (
product_id INT PRIMARY KEY,
product_name TEXT,
category TEXT
);

9. The Snowflake Schema Architecture

In a snowflake schema, dimensions are further normalized into sub-dimensions:

                 +-------------+
| dim_category|
+-------------+
|
+-------------+
| dim_product |
+-------------+
|
+-------------+
| fact_sales |
+-------------+

Product links to category via a foreign key instead of storing the category name directly.


10. Star vs Snowflake Schema Comparison

FeatureStar SchemaSnowflake Schema
DesignDenormalizedNormalized
Query speedFaster (fewer joins)Slower (more joins)
StorageLargerSmaller
MaintenanceEasier to understandMore scalable, harder to read
Best forPerformance, BI toolsComplex hierarchies

11. Denormalization and Performance Trade-offs

Denormalizing dimensions (as in star schema) can:

  • Reduce join complexity
  • Speed up reads
  • Slightly increase storage

Snowflake reduces redundancy but adds complexity and join cost.


12. Creating Fact and Dimension Tables in SQL

Example: Creating both dimensions and fact

CREATE TABLE dim_customer (
customer_id INT PRIMARY KEY,
name TEXT,
email TEXT,
signup_date DATE
);

CREATE TABLE fact_orders (
order_id INT PRIMARY KEY,
customer_id INT,
product_id INT,
date_id INT,
total_amount NUMERIC,
FOREIGN KEY (customer_id) REFERENCES dim_customer(customer_id)
);

13. Surrogate Keys and Their Role

Use surrogate keys (auto-incremented IDs) for dimensions rather than natural keys like email or name:

  • Ensure uniqueness
  • Enable SCDs (Slowly Changing Dimensions)
  • Improve indexing and joins

14. Slowly Changing Dimensions (SCD Types)

TypeDescription
SCD1Overwrite the old value
SCD2Add a new row with new version/timestamp
SCD3Keep only a limited history (e.g., last two values)

Example for SCD Type 2:

INSERT INTO dim_customer_history (customer_id, name, valid_from, valid_to)
VALUES (1, 'Alice', CURRENT_DATE, NULL);

15. Handling Time Dimensions

Time is one of the most common dimensions in warehousing:

CREATE TABLE dim_date (
date_id INT PRIMARY KEY,
full_date DATE,
year INT,
quarter INT,
month INT,
week INT,
day_of_week TEXT
);

Pre-generate a date dimension to allow fast filtering and grouping.


16. Using Views to Model Analytical Layers

Use SQL views to build abstractions:

CREATE VIEW sales_summary AS
SELECT
d.year,
c.name AS customer_name,
p.category,
SUM(f.total_amount) AS total_spent
FROM fact_sales f
JOIN dim_date d ON f.date_id = d.date_id
JOIN dim_customer c ON f.customer_id = c.customer_id
JOIN dim_product p ON f.product_id = p.product_id
GROUP BY d.year, c.name, p.category;

17. Querying Star Schemas for BI and Reporting

SELECT
c.region,
p.category,
d.month,
SUM(f.total_amount) AS monthly_sales
FROM fact_sales f
JOIN dim_customer c ON f.customer_id = c.customer_id
JOIN dim_product p ON f.product_id = p.product_id
JOIN dim_date d ON f.date_id = d.date_id
GROUP BY c.region, p.category, d.month
ORDER BY d.month;

18. Best Practices for Data Warehouse Schema Design

  • Use surrogate keys for dimension relationships
  • Prefer star schema for simplicity and speed
  • Design a calendar dimension
  • Avoid circular joins or many-to-many between dimensions
  • Pre-aggregate for performance (e.g., monthly rollups)
  • Document all fact and dimension tables

19. Real-World Warehouse Scenario: E-Commerce Analytics

Use Case: Track sales, customers, product performance.

TableTypeDescription
fact_salesFactOrders placed with date, customer, product
dim_productDimensionProduct details (name, category)
dim_customerDimensionCustomer demographics
dim_dateDimensionDay, week, month breakdown

20. Summary and What’s Next

Designing proper Star and Snowflake schemas is critical for building high-performance, scalable data warehouses. Understanding how to model facts, dimensions, and time can greatly impact the effectiveness of your reporting and BI efforts.