Table of Contents
- Introduction
- What is the
DISTINCT
Keyword? - Why and When to Use
DISTINCT
- Syntax of
DISTINCT
- How
DISTINCT
Works Internally DISTINCT
on a Single ColumnDISTINCT
on Multiple ColumnsDISTINCT
vsGROUP BY
- Using
DISTINCT
withCOUNT()
- Combining
DISTINCT
withORDER BY
- Filtering Results After
DISTINCT
- Case Sensitivity in
DISTINCT
- Null Values and
DISTINCT
- Performance Considerations
- Real-World Example: Removing Duplicate Emails
- Real-World Example: Unique Customer Locations
- Common Mistakes to Avoid
- Alternatives to
DISTINCT
- Best Practices for Removing Duplicates
- Summary and What’s Next
1. Introduction
When retrieving data from databases, you often encounter duplicate rows — especially when joining tables or aggregating data. SQL’s DISTINCT
keyword is a simple yet powerful tool to eliminate duplicates and return only unique values.
2. What is the DISTINCT
Keyword?
The DISTINCT
keyword in SQL ensures that the result set returns only unique rows by eliminating all duplicates based on the specified columns.
3. Why and When to Use DISTINCT
Use DISTINCT
when:
- You want to remove repeated rows
- You need a list of unique values from one or more columns
- You’re avoiding redundancy in dropdowns, reports, or exports
- You’ve performed a join or union that introduced duplicates
4. Syntax of DISTINCT
sqlCopyEditSELECT DISTINCT column1, column2, ...
FROM table_name;
You can apply it to one or multiple columns.
5. How DISTINCT
Works Internally
Under the hood, SQL compares rows after projection (SELECT clause) and removes duplicate rows. The remaining result set is sorted or hashed to identify and eliminate redundancy.
6. DISTINCT
on a Single Column
sqlCopyEditSELECT DISTINCT country
FROM customers;
Returns a list of unique countries from the customers
table.
7. DISTINCT
on Multiple Columns
sqlCopyEditSELECT DISTINCT city, state
FROM customers;
Returns unique city + state combinations.
Rows with the same city but different states are not treated as duplicates.
8. DISTINCT
vs GROUP BY
Feature | DISTINCT | GROUP BY |
---|---|---|
Purpose | Remove duplicate rows | Group rows for aggregation |
Aggregation | Not required | Typically used with aggregation functions |
Output | Only unique rows | One row per group |
Example:
sqlCopyEdit-- DISTINCT
SELECT DISTINCT department FROM employees;
-- GROUP BY
SELECT department FROM employees GROUP BY department;
Both yield similar output — but use GROUP BY
for summaries, not deduplication.
9. Using DISTINCT
with COUNT()
sqlCopyEditSELECT COUNT(DISTINCT department) FROM employees;
Returns number of unique departments.
10. Combining DISTINCT
with ORDER BY
sqlCopyEditSELECT DISTINCT name FROM customers
ORDER BY name;
You can sort results after deduplication.
Note: ORDER BY
is applied after DISTINCT
.
11. Filtering Results After DISTINCT
You can use WHERE
to filter rows before applying DISTINCT
:
sqlCopyEditSELECT DISTINCT country FROM customers
WHERE active = 1;
Filters active customers first, then selects distinct countries.
12. Case Sensitivity in DISTINCT
- In PostgreSQL,
DISTINCT
is case-sensitive:'India'
≠'india'
- In MySQL, depends on collation (
utf8_general_ci
is case-insensitive)
Always be aware of your DBMS collation and character set.
13. Null Values and DISTINCT
NULL
is treated as a distinct value- Multiple
NULL
s are considered duplicates for deduplication
sqlCopyEditSELECT DISTINCT department FROM employees;
If 3 rows have NULL
department, only one NULL
is returned.
14. Performance Considerations
DISTINCT
can be expensive on large datasets- Consider indexing columns involved
- Avoid
SELECT DISTINCT *
— it scans all columns and rows - Use
GROUP BY
with aggregates if more appropriate
15. Real-World Example: Removing Duplicate Emails
sqlCopyEditSELECT DISTINCT email FROM users;
Removes any repeated email addresses.
16. Real-World Example: Unique Customer Locations
sqlCopyEditSELECT DISTINCT city, state
FROM customers
WHERE country = 'India';
Returns all unique city+state combinations for Indian customers.
17. Common Mistakes to Avoid
Mistake | Why It’s a Problem |
---|---|
DISTINCT with * | Expensive, returns unique rows based on all columns |
Expecting column-wise deduplication | DISTINCT works row-wise, not per column |
Not using aliases in complex queries | Can lead to confusion and redundancy |
18. Alternatives to DISTINCT
- Use
GROUP BY
if summarizing - Use
ROW_NUMBER()
orRANK()
for first/last row per group - Use
EXISTS
orIN
in subqueries for presence checks - Create temporary tables or views for deduped sets
19. Best Practices for Removing Duplicates
- Be specific with column selection
- Apply filters (
WHERE
) before usingDISTINCT
- Add
ORDER BY
if result sequence matters - Use
COUNT(DISTINCT column)
for summaries - Benchmark performance on large result sets
20. Summary and What’s Next
The DISTINCT
keyword in SQL is a simple yet powerful tool to eliminate duplicates from your result set. Whether used on a single column or multiple, it ensures that your output is clean, lean, and logically precise.