August 3, 2023
Summary: in this tutorial, you will learn how to use the PostgreSQL
SELECT DISTINCT clause to remove duplicate rows from a result set returned by a query.
Introduction to PostgreSQL
SELECT DISTINCT clause
DISTINCT clause is used in the
SELECT statement to remove duplicate rows from a result set. The
DISTINCT clause keeps one row for each group of duplicates. The
DISTINCTclause can be applied to one or more columns in the select list of the
The following illustrates the syntax of the
SELECT DISTINCT column1 FROM table_name;
In this statement, the values in the
column1 column are used to evaluate the duplicate.
If you specify multiple columns, the
DISTINCT clause will evaluate the duplicate based on the combination of values of these columns.
SELECT DISTINCT column1, column2 FROM table_name;
In this case, the combination of values in both
column2 columns will be used for evaluating the duplicate.
PostgreSQL also provides the
DISTINCT ON (expression) to keep the “first” row of each group of duplicates using the following syntax:
SELECT DISTINCT ON (column1) column_alias, column2 FROM table_name ORDER BY column1, column2;
The order of rows returned from the
SELECT statement is unspecified therefore the “first” row of each group of the duplicate is also unspecified.
It is a good practice to always use the
ORDER BY clause with the
DISTINCT ON(expression) to make the result set predictable.
Notice that the
DISTINCT ON expression must match the leftmost expression in the
ORDER BY clause.
SELECT DISTINCT examples
Note that you will learn how to create a table and insert data into a table in the subsequent tutorial. In this tutorial, you just execute the statements in psql or pgAdmin.
First, use the following
CREATE TABLE statement to create the
distinct_demo table that consists of three columns:
CREATE TABLE distinct_demo ( id serial NOT NULL PRIMARY KEY, bcolor VARCHAR, fcolor VARCHAR );
Second, insert some rows into the
distinct_demo table using the following
INSERT INTO distinct_demo (bcolor, fcolor) VALUES ('red', 'red'), ('red', 'red'), ('red', NULL), (NULL, 'red'), ('red', 'green'), ('red', 'blue'), ('green', 'red'), ('green', 'blue'), ('green', 'green'), ('blue', 'red'), ('blue', 'green'), ('blue', 'blue');
Third, query the data from the
distinct_demo table using the
SELECT id, bcolor, fcolor FROM distinct_demo ;
DISTINCT one column example
SELECT DISTINCT bcolor FROM distinct_demo ORDER BY bcolor;
DISTINCT multiple columns
The following statement demonstrates how to use the
DISTINCT clause on multiple columns:
SELECT DISTINCT bcolor, fcolor FROM distinct_demo ORDER BY bcolor, fcolor;
Because we specified both
fcolor columns in the
SELECT DISTINCT clause, PostgreSQL combined the values in both
fcolor columns to evaluate the uniqueness of the rows.
The query returns the unique combination of
fcolor from the
distinct_demo table. Notice that the
distinct_demo table has two rows with
red value in both
fcolor columns. When we applied the
DISTINCT to both columns, one row was removed from the result set because it is the duplicate.
DISTINCT ON example
The following statement sorts the result set by the
fcolor, and then for each group of duplicates, it keeps the first row in the returned result set.
SELECT DISTINCT ON (bcolor) bcolor, fcolor FROM distinct_demo ORDER BY bcolor, fcolor;
Here is the output:
In this tutorial, you have learned how to use PostgreSQL
SELECT DISTINCT statement to remove duplicate rows returned by a query.