Table of Contents
- Introduction
- SQL’s Role in Data-Driven Decision Making
- Understanding KPIs in Business Context
- Designing KPIs with SQL
- Measuring Conversion Rates
- Calculating Retention and Churn
- Defining and Analyzing User Cohorts
- Building Signup Cohorts
- Tracking Activity Across Time Buckets
- Cohort Retention Matrix (Week 0 to Week N)
- SQL Techniques for LTV (Lifetime Value)
- Funnel Analysis Using SQL
- Revenue Growth and Month-over-Month Trends
- CAC vs LTV: SQL Computation
- Feature Adoption & Usage Segments
- SQL for A/B Test Reporting
- Segmentation by Geography, Plan, or Behavior
- Rolling Averages and Smoothing Metrics
- Visualizing Metrics with SQL-Friendly Tables
- Summary and What’s Next
1. Introduction
Data analysis is at the heart of business intelligence. SQL allows analysts to compute critical KPIs, evaluate product impact, and understand user behavior at scale. In this module, we’ll focus on advanced metrics like cohort analysis, LTV, retention, and feature adoption.
2. SQL’s Role in Data-Driven Decision Making
SQL enables:
- Fast, reproducible metric calculations
- Filtering and slicing for dynamic segments
- Integration with BI tools for real-time insights
- Customized views for teams (e.g., marketing, product, sales)
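As a concrete example of the last point, a team-facing view can wrap a metric query once so everyone reuses the same definition. A minimal sketch (the `user_activity` table from the DAU example later in this module; the view name is illustrative):

```sql
-- Hypothetical view giving the marketing team a ready-made DAU series
CREATE VIEW marketing_dau AS
SELECT activity_date, COUNT(DISTINCT user_id) AS dau
FROM user_activity
GROUP BY activity_date;
```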
3. Understanding KPIs in Business Context
KPIs (Key Performance Indicators) must be:
| Attribute | Description |
|---|---|
| Specific | Clearly defined (e.g., daily active users) |
| Measurable | Quantifiable in SQL |
| Relevant | Tied to business goals |
| Time-bound | Evaluated over a defined window (e.g., last 30 days) |
Examples:
- DAU/MAU
- Churn Rate
- Average Revenue Per User (ARPU)
- Conversion Rate
- Lifetime Value (LTV)
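The DAU/MAU ratio (often called stickiness) combines two of these KPIs. A sketch for a single month, assuming the same `user_activity(user_id, activity_date)` table used in the next section:

```sql
-- Stickiness for January 2024: average daily actives divided by monthly actives
WITH daily AS (
SELECT activity_date, COUNT(DISTINCT user_id) AS dau
FROM user_activity
WHERE activity_date BETWEEN '2024-01-01' AND '2024-01-31'
GROUP BY activity_date
),
monthly AS (
SELECT COUNT(DISTINCT user_id) AS mau
FROM user_activity
WHERE activity_date BETWEEN '2024-01-01' AND '2024-01-31'
)
SELECT ROUND(AVG(d.dau) / m.mau, 3) AS stickiness
FROM daily d CROSS JOIN monthly m
GROUP BY m.mau;
```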
4. Designing KPIs with SQL
Daily Active Users (DAU):
SELECT activity_date, COUNT(DISTINCT user_id) AS dau
FROM user_activity
GROUP BY activity_date
ORDER BY activity_date;
Average Revenue Per User (ARPU). Note that the denominator below counts only users who placed an order, so strictly speaking this is average revenue per paying user (ARPPU):
SELECT SUM(revenue)/COUNT(DISTINCT user_id) AS arpu
FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31';
5. Measuring Conversion Rates
Example: Landing → Signup → Purchase
WITH visited AS (
SELECT DISTINCT user_id FROM events WHERE page = 'landing'
),
signed_up AS (
SELECT DISTINCT user_id FROM events WHERE page = 'signup'
),
purchased AS (
SELECT DISTINCT user_id FROM events WHERE page = 'checkout'
)
SELECT
(SELECT COUNT(*) FROM visited) AS landing_visits,
(SELECT COUNT(*) FROM signed_up) AS signups,
(SELECT COUNT(*) FROM purchased) AS purchases,
ROUND(100.0 * (SELECT COUNT(*) FROM signed_up) / (SELECT COUNT(*) FROM visited), 2) AS signup_rate,
ROUND(100.0 * (SELECT COUNT(*) FROM purchased) / (SELECT COUNT(*) FROM signed_up), 2) AS purchase_rate;
6. Calculating Retention and Churn
Retention:
WITH first_seen AS (
SELECT user_id, MIN(login_date) AS cohort_date
FROM logins
GROUP BY user_id
),
activity AS (
SELECT user_id, login_date::date AS activity_day
FROM logins
)
SELECT
f.cohort_date,
a.activity_day,
COUNT(DISTINCT a.user_id) AS retained_users
FROM first_seen f
JOIN activity a ON f.user_id = a.user_id
WHERE a.activity_day >= f.cohort_date
GROUP BY f.cohort_date, a.activity_day;
Churn (simplified: the share of users inactive for more than 30 days):
SELECT
COUNT(*) FILTER (WHERE last_login < CURRENT_DATE - INTERVAL '30 days') * 100.0 / COUNT(*) AS churn_rate
FROM users;
7. Defining and Analyzing User Cohorts
Cohorts group users based on a shared time-based event (e.g., signup week/month). This allows performance comparisons across generations of users.
SELECT user_id, DATE_TRUNC('month', signup_date) AS cohort_month
FROM users;
8. Building Signup Cohorts
WITH cohorts AS (
SELECT user_id, DATE_TRUNC('month', signup_date) AS cohort_month
FROM users
)
SELECT cohort_month, COUNT(*) AS new_users
FROM cohorts
GROUP BY cohort_month
ORDER BY cohort_month;
9. Tracking Activity Across Time Buckets
WITH user_logins AS (
SELECT user_id,
-- the user's first login, via a window function, defines the cohort
DATE_TRUNC('month', MIN(login_date) OVER (PARTITION BY user_id)) AS cohort,
DATE_TRUNC('month', login_date) AS active_month
FROM logins
)
SELECT cohort, active_month, COUNT(DISTINCT user_id) AS active_users
FROM user_logins
GROUP BY cohort, active_month
ORDER BY cohort, active_month;
10. Cohort Retention Matrix (Week 0 to Week N)
WITH base AS (
SELECT user_id, MIN(login_date) AS signup_date
FROM logins
GROUP BY user_id
),
activity AS (
SELECT l.user_id,
DATE_TRUNC('week', b.signup_date) AS cohort_week,
FLOOR((l.login_date::date - b.signup_date::date) / 7.0) AS week_number
FROM logins l
JOIN base b ON l.user_id = b.user_id
)
SELECT cohort_week, week_number, COUNT(DISTINCT user_id) AS retained
FROM activity
GROUP BY cohort_week, week_number
ORDER BY cohort_week, week_number;
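Raw counts are easier to compare as percentages of each cohort's week-0 size. A sketch that assumes the query above has been saved as a view or table named `cohort_retention(cohort_week, week_number, retained)` (the name is illustrative, and every cohort is assumed to have a week 0 row):

```sql
SELECT cohort_week,
week_number,
ROUND(100.0 * retained / FIRST_VALUE(retained) OVER (
PARTITION BY cohort_week ORDER BY week_number
), 2) AS retention_pct
FROM cohort_retention
ORDER BY cohort_week, week_number;
```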
11. SQL Techniques for LTV (Lifetime Value)
SELECT customer_id, SUM(order_total) AS ltv
FROM orders
GROUP BY customer_id;
Segment by cohort:
WITH customer_ltv AS (
SELECT o.customer_id,
DATE_TRUNC('month', u.signup_date) AS cohort,
-- LTV is per customer, so sum each customer's orders before averaging
SUM(o.order_total) AS ltv
FROM orders o
JOIN users u ON o.customer_id = u.user_id
GROUP BY o.customer_id, DATE_TRUNC('month', u.signup_date)
)
SELECT cohort, AVG(ltv) AS avg_ltv
FROM customer_ltv
GROUP BY cohort
ORDER BY cohort;
12. Funnel Analysis Using SQL
WITH funnel AS (
SELECT user_id,
MAX(CASE WHEN event = 'visit' THEN 1 END) AS visited,
MAX(CASE WHEN event = 'signup' THEN 1 END) AS signed_up,
MAX(CASE WHEN event = 'purchase' THEN 1 END) AS purchased
FROM events
GROUP BY user_id
)
SELECT
COUNT(*) FILTER (WHERE visited = 1) AS visited,
COUNT(*) FILTER (WHERE signed_up = 1) AS signed_up,
COUNT(*) FILTER (WHERE purchased = 1) AS purchased,
ROUND(100.0 * COUNT(*) FILTER (WHERE signed_up = 1) / NULLIF(COUNT(*) FILTER (WHERE visited = 1), 0), 2) AS signup_rate,
ROUND(100.0 * COUNT(*) FILTER (WHERE purchased = 1) / NULLIF(COUNT(*) FILTER (WHERE signed_up = 1), 0), 2) AS conversion_rate
FROM funnel;
13. Revenue Growth and Month-over-Month Trends
WITH monthly_revenue AS (
SELECT DATE_TRUNC('month', order_date) AS month, SUM(total) AS revenue
FROM orders
GROUP BY month
)
SELECT month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month,
revenue - LAG(revenue) OVER (ORDER BY month) AS mom_change,
ROUND((revenue - LAG(revenue) OVER (ORDER BY month)) * 100.0 / LAG(revenue) OVER (ORDER BY month), 2) AS mom_growth
FROM monthly_revenue;
14. CAC vs LTV: SQL Computation
WITH customer_acquisition AS (
SELECT customer_id, SUM(spend) AS cac
FROM ad_spend
GROUP BY customer_id
),
ltv AS (
SELECT customer_id, SUM(order_total) AS ltv
FROM orders
GROUP BY customer_id
)
SELECT a.customer_id, a.cac, l.ltv,
ROUND(l.ltv / NULLIF(a.cac, 0), 2) AS ltv_to_cac_ratio
FROM customer_acquisition a
JOIN ltv l ON a.customer_id = l.customer_id;
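Per-customer ad spend is rarely available in practice; a common alternative is blended CAC, total spend divided by customers acquired in the same month. A sketch assuming a `spend_date` column on `ad_spend` (hypothetical here) and the `signup_date` column on `users`:

```sql
-- Blended monthly CAC: total ad spend / new customers that month
WITH monthly_spend AS (
SELECT DATE_TRUNC('month', spend_date) AS month, SUM(spend) AS total_spend
FROM ad_spend
GROUP BY 1
),
monthly_new AS (
SELECT DATE_TRUNC('month', signup_date) AS month, COUNT(*) AS new_customers
FROM users
GROUP BY 1
)
SELECT s.month,
ROUND(s.total_spend / NULLIF(n.new_customers, 0), 2) AS blended_cac
FROM monthly_spend s
JOIN monthly_new n ON s.month = n.month
ORDER BY s.month;
```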
15. Feature Adoption & Usage Segments
SELECT feature_name, COUNT(DISTINCT user_id) AS users_used
FROM feature_usage
GROUP BY feature_name
ORDER BY users_used DESC;
Analyze adoption by segment:
SELECT u.segment, f.feature_name, COUNT(*) AS usage_count
FROM users u
JOIN feature_usage f ON u.user_id = f.user_id
GROUP BY u.segment, f.feature_name;
16. SQL for A/B Test Reporting
WITH groups AS (
SELECT user_id, experiment_group FROM ab_assignments
),
metrics AS (
SELECT user_id, SUM(purchase_amount) AS revenue
FROM orders
GROUP BY user_id
)
SELECT g.experiment_group, COUNT(*) AS users,
AVG(COALESCE(m.revenue, 0)) AS avg_revenue
FROM groups g
LEFT JOIN metrics m ON g.user_id = m.user_id
GROUP BY g.experiment_group;
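Averages alone do not show whether a difference between groups is meaningful. Per-group variance and standard error are a first step toward that; a sketch reusing the same tables, with users who never purchased counted as zero revenue:

```sql
WITH groups AS (
SELECT user_id, experiment_group FROM ab_assignments
),
metrics AS (
SELECT user_id, SUM(purchase_amount) AS revenue
FROM orders
GROUP BY user_id
)
SELECT g.experiment_group,
COUNT(*) AS users,
AVG(COALESCE(m.revenue, 0)) AS avg_revenue,
STDDEV_SAMP(COALESCE(m.revenue, 0)) AS stddev_revenue,
-- standard error of the mean: stddev / sqrt(n)
STDDEV_SAMP(COALESCE(m.revenue, 0)) / SQRT(COUNT(*)) AS std_error
FROM groups g
LEFT JOIN metrics m ON g.user_id = m.user_id
GROUP BY g.experiment_group;
```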
17. Segmentation by Geography, Plan, or Behavior
SELECT plan_type, region, AVG(lifetime_value) AS avg_ltv
FROM customers
GROUP BY plan_type, region;
Or by behavioral metric:
SELECT CASE
WHEN order_count >= 10 THEN 'high value'
WHEN order_count >= 3 THEN 'medium value'
ELSE 'low value'
END AS segment,
COUNT(*) AS users
FROM (
SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
) sub
GROUP BY segment;
18. Rolling Averages and Smoothing Metrics
SELECT date,
AVG(revenue) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS revenue_7day_avg
FROM daily_revenue;
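One caveat: ROWS BETWEEN 6 PRECEDING counts rows, not days, so if any dates are missing from `daily_revenue` the window silently spans more than a week. On PostgreSQL 11+ a RANGE frame over the date column keeps the window calendar-based (a sketch against the same assumed table):

```sql
SELECT date,
AVG(revenue) OVER (
ORDER BY date
RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
) AS revenue_7day_avg
FROM daily_revenue;
```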
19. Visualizing Metrics with SQL-Friendly Tables
Many BI tools support SQL as a data source. Use consistent column names, date truncation, and ordering:
SELECT DATE_TRUNC('week', login_date) AS week, COUNT(DISTINCT user_id) AS weekly_active_users
FROM logins
GROUP BY week
ORDER BY week;
20. Summary and What’s Next
SQL enables powerful, reproducible, and scalable data analysis — from core KPIs to cohort-based insights. With mastery over cohorting, window functions, and segmentation, SQL becomes your primary tool for driving data-informed decisions.