Using DISTINCT to Remove Duplicates in SQL

Introduction
What is the DISTINCT Keyword?
Why and When to Use DISTINCT
Syntax of DISTINCT
How DISTINCT Works Internally
DISTINCT on a Single Column
DISTINCT on Multiple Columns
DISTINCT vs GROUP BY
Using DISTINCT with COUNT()
Combining DISTINCT with ORDER BY
Filtering Results After DISTINCT
Case Sensitivity in DISTINCT
Null Values and DISTINCT
Performance Considerations
Real-World Example: Removing Duplicate Emails
Real-World Example: Unique Customer Locations
Common Mistakes to Avoid
Alternatives to DISTINCT
Best Practices for Removing Duplicates
Summary and What’s Next

1. Introduction

When retrieving data from databases, you often encounter duplicate rows — especially when joining tables or aggregating data. SQL’s DISTINCT keyword is a simple yet powerful tool to eliminate duplicates and return only unique values.

2. What is the `DISTINCT` Keyword?

The DISTINCT keyword in SQL ensures that the result set returns only unique rows by eliminating all duplicates based on the specified columns.

3. Why and When to Use `DISTINCT`

Use DISTINCT when:

You want to remove repeated rows
You need a list of unique values from one or more columns
You’re avoiding redundancy in dropdowns, reports, or exports
You’ve performed a join or union that introduced duplicates

4. Syntax of `DISTINCT`

sqlCopyEditSELECT DISTINCT column1, column2, ...
FROM table_name;

You can apply it to one or multiple columns.

5. How `DISTINCT` Works Internally

Under the hood, SQL compares rows after projection (SELECT clause) and removes duplicate rows. The remaining result set is sorted or hashed to identify and eliminate redundancy.

6. `DISTINCT` on a Single Column

sqlCopyEditSELECT DISTINCT country
FROM customers;

Returns a list of unique countries from the customers table.

7. `DISTINCT` on Multiple Columns

sqlCopyEditSELECT DISTINCT city, state
FROM customers;

Returns unique city + state combinations.
Rows with the same city but different states are not treated as duplicates.

8. `DISTINCT` vs `GROUP BY`

Feature	`DISTINCT`	`GROUP BY`
Purpose	Remove duplicate rows	Group rows for aggregation
Aggregation	Not required	Typically used with aggregation functions
Output	Only unique rows	One row per group

Example:

sqlCopyEdit-- DISTINCT
SELECT DISTINCT department FROM employees;

-- GROUP BY
SELECT department FROM employees GROUP BY department;

Both yield similar output — but use GROUP BY for summaries, not deduplication.

9. Using `DISTINCT` with `COUNT()`

sqlCopyEditSELECT COUNT(DISTINCT department) FROM employees;

Returns number of unique departments.

10. Combining `DISTINCT` with `ORDER BY`

sqlCopyEditSELECT DISTINCT name FROM customers
ORDER BY name;

You can sort results after deduplication.
Note: ORDER BY is applied after DISTINCT.

11. Filtering Results After `DISTINCT`

You can use WHERE to filter rows before applying DISTINCT:

sqlCopyEditSELECT DISTINCT country FROM customers
WHERE active = 1;

Filters active customers first, then selects distinct countries.

12. Case Sensitivity in `DISTINCT`

In PostgreSQL, DISTINCT is case-sensitive: 'India' ≠ 'india'
In MySQL, depends on collation (utf8_general_ci is case-insensitive)

Always be aware of your DBMS collation and character set.

13. Null Values and `DISTINCT`

NULL is treated as a distinct value
Multiple NULLs are considered duplicates for deduplication

sqlCopyEditSELECT DISTINCT department FROM employees;

If 3 rows have NULL department, only one NULL is returned.

14. Performance Considerations

DISTINCT can be expensive on large datasets
Consider indexing columns involved
Avoid SELECT DISTINCT * — it scans all columns and rows
Use GROUP BY with aggregates if more appropriate

15. Real-World Example: Removing Duplicate Emails

sqlCopyEditSELECT DISTINCT email FROM users;

Removes any repeated email addresses.

16. Real-World Example: Unique Customer Locations

sqlCopyEditSELECT DISTINCT city, state
FROM customers
WHERE country = 'India';

Returns all unique city+state combinations for Indian customers.

17. Common Mistakes to Avoid

Mistake	Why It’s a Problem
`DISTINCT` with `*`	Expensive, returns unique rows based on all columns
Expecting column-wise deduplication	`DISTINCT` works row-wise, not per column
Not using aliases in complex queries	Can lead to confusion and redundancy

18. Alternatives to `DISTINCT`

Use GROUP BY if summarizing
Use ROW_NUMBER() or RANK() for first/last row per group
Use EXISTS or IN in subqueries for presence checks
Create temporary tables or views for deduped sets

19. Best Practices for Removing Duplicates

Be specific with column selection
Apply filters (WHERE) before using DISTINCT
Add ORDER BY if result sequence matters
Use COUNT(DISTINCT column) for summaries
Benchmark performance on large result sets

20. Summary and What’s Next

The DISTINCT keyword in SQL is a simple yet powerful tool to eliminate duplicates from your result set. Whether used on a single column or multiple, it ensures that your output is clean, lean, and logically precise.

Tags
SQL

Welcome to Syskool

Welcome to Syskool

Welcome to Syskool

Welcome to Syskool

Using DISTINCT to Remove Duplicates in SQL

Table of Contents

1. Introduction

2. What is the `DISTINCT` Keyword?

3. Why and When to Use `DISTINCT`

4. Syntax of `DISTINCT`

5. How `DISTINCT` Works Internally

6. `DISTINCT` on a Single Column

7. `DISTINCT` on Multiple Columns

8. `DISTINCT` vs `GROUP BY`

9. Using `DISTINCT` with `COUNT()`

10. Combining `DISTINCT` with `ORDER BY`

11. Filtering Results After `DISTINCT`

12. Case Sensitivity in `DISTINCT`

13. Null Values and `DISTINCT`

14. Performance Considerations

15. Real-World Example: Removing Duplicate Emails

16. Real-World Example: Unique Customer Locations

17. Common Mistakes to Avoid

18. Alternatives to `DISTINCT`

19. Best Practices for Removing Duplicates

20. Summary and What’s Next

LEAVE A REPLY Cancel reply

Subscribe for exclusive content

Welcome to Syskool

Welcome to Syskool

Welcome to Syskool

Subscribe to Syskool

Subscribe to Liberty Case

Welcome to Syskool

Using DISTINCT to Remove Duplicates in SQL

Table of Contents

1. Introduction

2. What is the DISTINCT Keyword?

3. Why and When to Use DISTINCT

4. Syntax of DISTINCT

5. How DISTINCT Works Internally

6. DISTINCT on a Single Column

7. DISTINCT on Multiple Columns

8. DISTINCT vs GROUP BY

9. Using DISTINCT with COUNT()

10. Combining DISTINCT with ORDER BY

11. Filtering Results After DISTINCT

12. Case Sensitivity in DISTINCT

13. Null Values and DISTINCT

14. Performance Considerations

15. Real-World Example: Removing Duplicate Emails

16. Real-World Example: Unique Customer Locations

17. Common Mistakes to Avoid

18. Alternatives to DISTINCT

19. Best Practices for Removing Duplicates

20. Summary and What’s Next

RELATED ARTICLES

Working with NoSQL-like Features in SQL: Arrays, HSTORE, and Semi-Structured Models

Version Control for SQL Scripts: Git, Database Migrations, and Change Management

SQL Performance Tuning with EXPLAIN PLAN: Understanding Query Optimization

LEAVE A REPLY Cancel reply

Subscribe for exclusive content

2. What is the `DISTINCT` Keyword?

3. Why and When to Use `DISTINCT`

4. Syntax of `DISTINCT`

5. How `DISTINCT` Works Internally

6. `DISTINCT` on a Single Column

7. `DISTINCT` on Multiple Columns

8. `DISTINCT` vs `GROUP BY`

9. Using `DISTINCT` with `COUNT()`

10. Combining `DISTINCT` with `ORDER BY`

11. Filtering Results After `DISTINCT`

12. Case Sensitivity in `DISTINCT`

13. Null Values and `DISTINCT`

18. Alternatives to `DISTINCT`