Using DISTINCT to Remove Duplicates in SQL


Table of Contents

  1. Introduction
  2. What is the DISTINCT Keyword?
  3. Why and When to Use DISTINCT
  4. Syntax of DISTINCT
  5. How DISTINCT Works Internally
  6. DISTINCT on a Single Column
  7. DISTINCT on Multiple Columns
  8. DISTINCT vs GROUP BY
  9. Using DISTINCT with COUNT()
  10. Combining DISTINCT with ORDER BY
  11. Filtering Results After DISTINCT
  12. Case Sensitivity in DISTINCT
  13. Null Values and DISTINCT
  14. Performance Considerations
  15. Real-World Example: Removing Duplicate Emails
  16. Real-World Example: Unique Customer Locations
  17. Common Mistakes to Avoid
  18. Alternatives to DISTINCT
  19. Best Practices for Removing Duplicates
  20. Summary and What’s Next

1. Introduction

When retrieving data from databases, you often encounter duplicate rows — especially when joining tables or aggregating data. SQL’s DISTINCT keyword is a simple yet powerful tool to eliminate duplicates and return only unique values.


2. What is the DISTINCT Keyword?

The DISTINCT keyword in SQL ensures that the result set returns only unique rows by eliminating all duplicates based on the specified columns.


3. Why and When to Use DISTINCT

Use DISTINCT when:

  • You want to remove repeated rows
  • You need a list of unique values from one or more columns
  • You’re avoiding redundancy in dropdowns, reports, or exports
  • You’ve performed a join or union that introduced duplicates

4. Syntax of DISTINCT

sqlCopyEditSELECT DISTINCT column1, column2, ...
FROM table_name;

You can apply it to one or multiple columns.


5. How DISTINCT Works Internally

Under the hood, SQL compares rows after projection (SELECT clause) and removes duplicate rows. The remaining result set is sorted or hashed to identify and eliminate redundancy.


6. DISTINCT on a Single Column

sqlCopyEditSELECT DISTINCT country
FROM customers;

Returns a list of unique countries from the customers table.


7. DISTINCT on Multiple Columns

sqlCopyEditSELECT DISTINCT city, state
FROM customers;

Returns unique city + state combinations.
Rows with the same city but different states are not treated as duplicates.


8. DISTINCT vs GROUP BY

FeatureDISTINCTGROUP BY
PurposeRemove duplicate rowsGroup rows for aggregation
AggregationNot requiredTypically used with aggregation functions
OutputOnly unique rowsOne row per group

Example:

sqlCopyEdit-- DISTINCT
SELECT DISTINCT department FROM employees;

-- GROUP BY
SELECT department FROM employees GROUP BY department;

Both yield similar output — but use GROUP BY for summaries, not deduplication.


9. Using DISTINCT with COUNT()

sqlCopyEditSELECT COUNT(DISTINCT department) FROM employees;

Returns number of unique departments.


10. Combining DISTINCT with ORDER BY

sqlCopyEditSELECT DISTINCT name FROM customers
ORDER BY name;

You can sort results after deduplication.
Note: ORDER BY is applied after DISTINCT.


11. Filtering Results After DISTINCT

You can use WHERE to filter rows before applying DISTINCT:

sqlCopyEditSELECT DISTINCT country FROM customers
WHERE active = 1;

Filters active customers first, then selects distinct countries.


12. Case Sensitivity in DISTINCT

  • In PostgreSQL, DISTINCT is case-sensitive: 'India''india'
  • In MySQL, depends on collation (utf8_general_ci is case-insensitive)

Always be aware of your DBMS collation and character set.


13. Null Values and DISTINCT

  • NULL is treated as a distinct value
  • Multiple NULLs are considered duplicates for deduplication
sqlCopyEditSELECT DISTINCT department FROM employees;

If 3 rows have NULL department, only one NULL is returned.


14. Performance Considerations

  • DISTINCT can be expensive on large datasets
  • Consider indexing columns involved
  • Avoid SELECT DISTINCT * — it scans all columns and rows
  • Use GROUP BY with aggregates if more appropriate

15. Real-World Example: Removing Duplicate Emails

sqlCopyEditSELECT DISTINCT email FROM users;

Removes any repeated email addresses.


16. Real-World Example: Unique Customer Locations

sqlCopyEditSELECT DISTINCT city, state
FROM customers
WHERE country = 'India';

Returns all unique city+state combinations for Indian customers.


17. Common Mistakes to Avoid

MistakeWhy It’s a Problem
DISTINCT with *Expensive, returns unique rows based on all columns
Expecting column-wise deduplicationDISTINCT works row-wise, not per column
Not using aliases in complex queriesCan lead to confusion and redundancy

18. Alternatives to DISTINCT

  • Use GROUP BY if summarizing
  • Use ROW_NUMBER() or RANK() for first/last row per group
  • Use EXISTS or IN in subqueries for presence checks
  • Create temporary tables or views for deduped sets

19. Best Practices for Removing Duplicates

  • Be specific with column selection
  • Apply filters (WHERE) before using DISTINCT
  • Add ORDER BY if result sequence matters
  • Use COUNT(DISTINCT column) for summaries
  • Benchmark performance on large result sets

20. Summary and What’s Next

The DISTINCT keyword in SQL is a simple yet powerful tool to eliminate duplicates from your result set. Whether used on a single column or multiple, it ensures that your output is clean, lean, and logically precise.