Data Warehousing Concepts in SQL: Understanding Star and Snowflake Schemas

Introduction
What Is a Data Warehouse?
OLTP vs OLAP: The Need for Warehousing
Key Components of a Data Warehouse
What Is Dimensional Modeling?
Facts and Dimensions Explained
The Star Schema Architecture
Example of a Star Schema Design
The Snowflake Schema Architecture
Star vs Snowflake Schema Comparison
Denormalization and Performance Trade-offs
Creating Fact and Dimension Tables in SQL
Surrogate Keys and Their Role
Slowly Changing Dimensions (SCD Types)
Handling Time Dimensions
Using Views to Model Analytical Layers
Querying Star Schemas for BI and Reporting
Best Practices for Data Warehouse Schema Design
Real-World Warehouse Scenario: E-Commerce Analytics
Summary and What’s Next

1. Introduction

Data warehousing is the foundation of modern analytics, BI dashboards, and reporting systems. Understanding how to design and query data warehouse schemas—especially Star and Snowflake schemas—is key to building efficient, scalable analytics systems with SQL.

2. What Is a Data Warehouse?

A data warehouse is a centralized repository designed for analytical processing (OLAP) rather than transactional operations. It integrates data from multiple sources, cleans and structures it, and makes it query-ready for decision-making.

3. OLTP vs OLAP: The Need for Warehousing

Feature	OLTP (e.g., MySQL)	OLAP (e.g., Snowflake, BigQuery)
Purpose	Transaction processing	Analytical processing
Normalization	Highly normalized	Denormalized or star schemas
Queries	Simple, short-lived	Complex, large, aggregate queries
Write-intensive	Yes	No (mostly read-only)

4. Key Components of a Data Warehouse

Fact tables: Store metrics (e.g., sales, orders)
Dimension tables: Store attributes (e.g., customer info, product details)
ETL/ELT pipelines: Load and transform data
Schema design: Star or Snowflake
Views and aggregates: Support analysis and reporting

5. What Is Dimensional Modeling?

Dimensional modeling is a design technique for warehouses where data is organized around:

Facts (measurable events)
Dimensions (descriptive attributes)

The goal is to create schemas that are intuitive and fast to query.

6. Facts and Dimensions Explained

Component	Description	Example
Fact	Numeric, quantitative values	sales_amount, units_sold
Dimension	Descriptive attributes about facts	product_name, customer_region

Facts link to dimensions using foreign keys.

7. The Star Schema Architecture

In a star schema, a central fact table connects directly to surrounding denormalized dimension tables.

                +-------------+
                | dim_product |
                +-------------+
                       |
+-------------+   +-------------+   +-------------+
| dim_customer|---|  fact_sales |---| dim_date    |
+-------------+   +-------------+   +-------------+
                       |
                +-------------+
                | dim_region  |
                +-------------+

8. Example of a Star Schema Design

Fact Table:

CREATE TABLE fact_sales (
  sale_id SERIAL PRIMARY KEY,
  product_id INT,
  customer_id INT,
  date_id INT,
  region_id INT,
  quantity INT,
  total_amount NUMERIC
);

Dimension Table:

CREATE TABLE dim_product (
  product_id INT PRIMARY KEY,
  product_name TEXT,
  category TEXT
);

9. The Snowflake Schema Architecture

In a snowflake schema, dimensions are further normalized into sub-dimensions:

                 +-------------+
                 | dim_category|
                 +-------------+
                       |
                +-------------+
                | dim_product |
                +-------------+
                       |
                +-------------+
                | fact_sales  |
                +-------------+

Product links to category via a foreign key instead of storing the category name directly.

10. Star vs Snowflake Schema Comparison

Feature	Star Schema	Snowflake Schema
Design	Denormalized	Normalized
Query speed	Faster (fewer joins)	Slower (more joins)
Storage	Larger	Smaller
Maintenance	Easier to understand	More scalable, harder to read
Best for	Performance, BI tools	Complex hierarchies

11. Denormalization and Performance Trade-offs

Denormalizing dimensions (as in star schema) can:

Reduce join complexity
Speed up reads
Slightly increase storage

Snowflake reduces redundancy but adds complexity and join cost.

12. Creating Fact and Dimension Tables in SQL

Example: Creating both dimensions and fact

CREATE TABLE dim_customer (
  customer_id INT PRIMARY KEY,
  name TEXT,
  email TEXT,
  signup_date DATE
);

CREATE TABLE fact_orders (
  order_id INT PRIMARY KEY,
  customer_id INT,
  product_id INT,
  date_id INT,
  total_amount NUMERIC,
  FOREIGN KEY (customer_id) REFERENCES dim_customer(customer_id)
);

13. Surrogate Keys and Their Role

Use surrogate keys (auto-incremented IDs) for dimensions rather than natural keys like email or name:

Ensure uniqueness
Enable SCDs (Slowly Changing Dimensions)
Improve indexing and joins

14. Slowly Changing Dimensions (SCD Types)

Type	Description
SCD1	Overwrite the old value
SCD2	Add a new row with new version/timestamp
SCD3	Keep only a limited history (e.g., last two values)

Example for SCD Type 2:

INSERT INTO dim_customer_history (customer_id, name, valid_from, valid_to)
VALUES (1, 'Alice', CURRENT_DATE, NULL);

15. Handling Time Dimensions

Time is one of the most common dimensions in warehousing:

CREATE TABLE dim_date (
  date_id INT PRIMARY KEY,
  full_date DATE,
  year INT,
  quarter INT,
  month INT,
  week INT,
  day_of_week TEXT
);

Pre-generate a date dimension to allow fast filtering and grouping.

16. Using Views to Model Analytical Layers

Use SQL views to build abstractions:

CREATE VIEW sales_summary AS
SELECT
  d.year,
  c.name AS customer_name,
  p.category,
  SUM(f.total_amount) AS total_spent
FROM fact_sales f
JOIN dim_date d ON f.date_id = d.date_id
JOIN dim_customer c ON f.customer_id = c.customer_id
JOIN dim_product p ON f.product_id = p.product_id
GROUP BY d.year, c.name, p.category;

17. Querying Star Schemas for BI and Reporting

SELECT
  c.region,
  p.category,
  d.month,
  SUM(f.total_amount) AS monthly_sales
FROM fact_sales f
JOIN dim_customer c ON f.customer_id = c.customer_id
JOIN dim_product p ON f.product_id = p.product_id
JOIN dim_date d ON f.date_id = d.date_id
GROUP BY c.region, p.category, d.month
ORDER BY d.month;

18. Best Practices for Data Warehouse Schema Design

Use surrogate keys for dimension relationships
Prefer star schema for simplicity and speed
Design a calendar dimension
Avoid circular joins or many-to-many between dimensions
Pre-aggregate for performance (e.g., monthly rollups)
Document all fact and dimension tables

19. Real-World Warehouse Scenario: E-Commerce Analytics

Use Case: Track sales, customers, product performance.

Table	Type	Description
fact_sales	Fact	Orders placed with date, customer, product
dim_product	Dimension	Product details (name, category)
dim_customer	Dimension	Customer demographics
dim_date	Dimension	Day, week, month breakdown

20. Summary and What’s Next

Designing proper Star and Snowflake schemas is critical for building high-performance, scalable data warehouses. Understanding how to model facts, dimensions, and time can greatly impact the effectiveness of your reporting and BI efforts.

Tags
SQL

Welcome to Syskool

Welcome to Syskool

Welcome to Syskool

Welcome to Syskool

Data Warehousing Concepts in SQL: Understanding Star and Snowflake Schemas

Table of Contents

1. Introduction

2. What Is a Data Warehouse?

3. OLTP vs OLAP: The Need for Warehousing

4. Key Components of a Data Warehouse

5. What Is Dimensional Modeling?

6. Facts and Dimensions Explained

7. The Star Schema Architecture

8. Example of a Star Schema Design

9. The Snowflake Schema Architecture

10. Star vs Snowflake Schema Comparison

11. Denormalization and Performance Trade-offs

12. Creating Fact and Dimension Tables in SQL

13. Surrogate Keys and Their Role

14. Slowly Changing Dimensions (SCD Types)

15. Handling Time Dimensions

16. Using Views to Model Analytical Layers

17. Querying Star Schemas for BI and Reporting

18. Best Practices for Data Warehouse Schema Design

19. Real-World Warehouse Scenario: E-Commerce Analytics

20. Summary and What’s Next

LEAVE A REPLY Cancel reply

Subscribe for exclusive content

Welcome to Syskool

Welcome to Syskool

Welcome to Syskool

Subscribe to Syskool

Subscribe to Liberty Case

Welcome to Syskool

Data Warehousing Concepts in SQL: Understanding Star and Snowflake Schemas

Table of Contents

1. Introduction

2. What Is a Data Warehouse?

3. OLTP vs OLAP: The Need for Warehousing

4. Key Components of a Data Warehouse

5. What Is Dimensional Modeling?

6. Facts and Dimensions Explained

7. The Star Schema Architecture

8. Example of a Star Schema Design

9. The Snowflake Schema Architecture

10. Star vs Snowflake Schema Comparison

11. Denormalization and Performance Trade-offs

12. Creating Fact and Dimension Tables in SQL

13. Surrogate Keys and Their Role

14. Slowly Changing Dimensions (SCD Types)

15. Handling Time Dimensions

16. Using Views to Model Analytical Layers

17. Querying Star Schemas for BI and Reporting

18. Best Practices for Data Warehouse Schema Design

19. Real-World Warehouse Scenario: E-Commerce Analytics

20. Summary and What’s Next

RELATED ARTICLES

Working with NoSQL-like Features in SQL: Arrays, HSTORE, and Semi-Structured Models

Version Control for SQL Scripts: Git, Database Migrations, and Change Management

SQL Performance Tuning with EXPLAIN PLAN: Understanding Query Optimization

LEAVE A REPLY Cancel reply

Subscribe for exclusive content