By John Doe April 8, 2023
Summary: The GROUP BY
and PARTITION BY
clauses are both used in SQL to group data and calculate aggregate values, then what’s the difference between them?
Table of Contents
Introduction
The GROUP BY and PARTITION BY clauses are used in SQL to group data and calculate aggregate values. The GROUP BY
clause divides the result set into groups and returns one row per group, while the PARTITION BY
clause divides the result set into partitions and returns aggregated columns with each record in the specified table. The GROUP BY
clause is often used with aggregate functions like SUM()
and AVG()
to calculate sums or averages for each group. The PARTITION BY
clause does not reduce the number of rows returned. The PARTITION BY
clause gives more flexibility in choosing the grouping columns and allows for different columns in each partition.
Example of GROUP BY
Here is an example that shows how to compare the average salaries of employees in each department:
SELECT depname, avg(salary) FROM empsalary GROUP BY depname;
depname | avg
-----------+-----------------------
develop | 5020.0000000000000000
personnel | 3700.0000000000000000
sales | 4866.6666666666666667
(3 rows)
The first output column comes directly from the table empsalary
, and the column should be specified as a column in GROUP BY
clause. The second column represents the average salary value in the group.
Example of PARTITION BY
Here is an example that shows how to compare each employee’s salary with the average salary in his or her department:
SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;
depname | empno | salary | avg
-----------+-------+--------+-----------------------
develop | 11 | 5200 | 5020.0000000000000000
develop | 7 | 4200 | 5020.0000000000000000
develop | 9 | 4500 | 5020.0000000000000000
develop | 8 | 6000 | 5020.0000000000000000
develop | 10 | 5200 | 5020.0000000000000000
personnel | 5 | 3500 | 3700.0000000000000000
personnel | 2 | 3900 | 3700.0000000000000000
sales | 3 | 4800 | 4866.6666666666666667
sales | 1 | 5000 | 4866.6666666666666667
sales | 4 | 4800 | 4866.6666666666666667
(10 rows)
The first three output columns come directly from the table empsalary
, and there is one output row for each row in the table. The fourth column represents an average taken across all the table rows that have the same depname
value as the current row. (This actually is the same function as the non-window avg
aggregate, but the OVER
clause causes it to be treated as a window function and computed across the window frame.)
Summary
Although we use a GROUP BY
most of the time, there are numerous cases when a PARTITION BY
would be a better choice. In some cases, you could use a GROUP BY
using subqueries to simulate a PARTITION BY
, but these can end up with very complex queries.
Let’s wrap everything up with the most important similarities and differences:
- Similarity: Both are used to return aggregated values.
- Difference: Using a
GROUP BY
clause collapses original rows, then you cannot access the original values later in the query. On the other hand, using aPARTITION BY
clause keeps original values while also allowing us to produce aggregated values. - Difference: The
PARTITION BY
is combined withOVER()
and windows functions to add a lot more functionalities.