PostgreSQL Tutorial: SELECT DISTINCT

August 3, 2023

Summary: in this tutorial, you will learn how to use the PostgreSQL SELECT DISTINCT clause to remove duplicate rows from a result set returned by a query.

Table of Contents

Introduction to PostgreSQL SELECT DISTINCT clause

The DISTINCT clause is used in the SELECT statement to remove duplicate rows from a result set. The DISTINCT clause keeps one row for each group of duplicates. The DISTINCTclause can be applied to one or more columns in the select list of the SELECT statement.

The following illustrates the syntax of the DISTINCT clause:

SELECT
   DISTINCT column1
FROM
   table_name;

In this statement, the values in the column1 column are used to evaluate the duplicate.

If you specify multiple columns, the DISTINCT clause will evaluate the duplicate based on the combination of values of these columns.

SELECT
   DISTINCT column1, column2
FROM
   table_name;

In this case, the combination of values in both column1 and column2 columns will be used for evaluating the duplicate.

PostgreSQL also provides the DISTINCT ON (expression) to keep the “first” row of each group of duplicates using the following syntax:

SELECT
   DISTINCT ON (column1) column_alias,
   column2
FROM
   table_name
ORDER BY
   column1,
   column2;

The order of rows returned from the SELECT statement is unspecified therefore the “first” row of each group of the duplicate is also unspecified.

It is a good practice to always use the ORDER BY clause with the DISTINCT ON(expression) to make the result set predictable.

Notice that the DISTINCT ON expression must match the leftmost expression in the ORDER BY clause.

PostgreSQL SELECT DISTINCT examples

Let’s create a new table called distinct_demo and insert data into it for practicing the DISTINCT clause.

Note that you will learn how to create a table and insert data into a table in the subsequent tutorial. In this tutorial, you just execute the statements in psql or pgAdmin.

First, use the following CREATE TABLE statement to create the distinct_demo table that consists of three columns: id, bcolorand fcolor.

CREATE TABLE distinct_demo (
	id serial NOT NULL PRIMARY KEY,
	bcolor VARCHAR,
	fcolor VARCHAR
);

Second, insert some rows into the distinct_demo table using the following INSERT statement:

INSERT INTO distinct_demo (bcolor, fcolor)
VALUES
	('red', 'red'),
	('red', 'red'),
	('red', NULL),
	(NULL, 'red'),
	('red', 'green'),
	('red', 'blue'),
	('green', 'red'),
	('green', 'blue'),
	('green', 'green'),
	('blue', 'red'),
	('blue', 'green'),
	('blue', 'blue');

Third, query the data from the distinct_demo table using the SELECT statement:

SELECT
	id,
	bcolor,
	fcolor
FROM
	distinct_demo ;

PostgreSQL Distinct - Sample Table

PostgreSQL DISTINCT one column example

The following statement selects unique values in the bcolor column from the t1 table and sorts the result set in alphabetical order by using the ORDER BY clause.

SELECT
	DISTINCT bcolor
FROM
	distinct_demo
ORDER BY
	bcolor;

PostgreSQL Distinct - select distinct one column

PostgreSQL DISTINCT multiple columns

The following statement demonstrates how to use the DISTINCT clause on multiple columns:

SELECT
	DISTINCT bcolor,
	fcolor
FROM
	distinct_demo
ORDER BY
	bcolor,
	fcolor;

PostgreSQL Distinct - select distinct two columns

Because we specified both bcolor and fcolor columns in the SELECT DISTINCT clause, PostgreSQL combined the values in both bcolor and fcolor columns to evaluate the uniqueness of the rows.

The query returns the unique combination of bcolor and fcolor from the distinct_demo table. Notice that the distinct_demo table has two rows with red value in both bcolor and fcolor columns. When we applied the DISTINCT to both columns, one row was removed from the result set because it is the duplicate.

PostgreSQL DISTINCT ON example

The following statement sorts the result set by the bcolor and fcolor, and then for each group of duplicates, it keeps the first row in the returned result set.

SELECT
	DISTINCT ON (bcolor) bcolor,
	fcolor
FROM
	distinct_demo 
ORDER BY
	bcolor,
	fcolor;

Here is the output:

PostgreSQL Distinct - select distinct on

In this tutorial, you have learned how to use PostgreSQL SELECT DISTINCT statement to remove duplicate rows returned by a query.