By John Doe April 8, 2023
GROUP BY and
PARTITION BY clauses are both used in SQL to group data and calculate aggregate values, then what’s the difference between them?
The GROUP BY and PARTITION BY clauses are used in SQL to group data and calculate aggregate values. The
GROUP BY clause divides the result set into groups and returns one row per group, while the
PARTITION BY clause divides the result set into partitions and returns aggregated columns with each record in the specified table. The
GROUP BY clause is often used with aggregate functions like
AVG() to calculate sums or averages for each group. The
PARTITION BY clause does not reduce the number of rows returned. The
PARTITION BY clause gives more flexibility in choosing the grouping columns and allows for different columns in each partition.
Example of GROUP BY
Here is an example that shows how to compare the average salaries of employees in each department:
SELECT depname, avg(salary) FROM empsalary GROUP BY depname;
depname | avg -----------+----------------------- develop | 5020.0000000000000000 personnel | 3700.0000000000000000 sales | 4866.6666666666666667 (3 rows)
The first output column comes directly from the table
empsalary, and the column should be specified as a column in
GROUP BY clause. The second column represents the average salary value in the group.
Example of PARTITION BY
Here is an example that shows how to compare each employee’s salary with the average salary in his or her department:
SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;
depname | empno | salary | avg -----------+-------+--------+----------------------- develop | 11 | 5200 | 5020.0000000000000000 develop | 7 | 4200 | 5020.0000000000000000 develop | 9 | 4500 | 5020.0000000000000000 develop | 8 | 6000 | 5020.0000000000000000 develop | 10 | 5200 | 5020.0000000000000000 personnel | 5 | 3500 | 3700.0000000000000000 personnel | 2 | 3900 | 3700.0000000000000000 sales | 3 | 4800 | 4866.6666666666666667 sales | 1 | 5000 | 4866.6666666666666667 sales | 4 | 4800 | 4866.6666666666666667 (10 rows)
The first three output columns come directly from the table
empsalary, and there is one output row for each row in the table. The fourth column represents an average taken across all the table rows that have the same
depname value as the current row. (This actually is the same function as the non-window
avg aggregate, but the
OVER clause causes it to be treated as a window function and computed across the window frame.)
Although we use a
GROUP BY most of the time, there are numerous cases when a
PARTITION BY would be a better choice. In some cases, you could use a
GROUP BY using subqueries to simulate a
PARTITION BY, but these can end up with very complex queries.
Let’s wrap everything up with the most important similarities and differences:
- Similarity: Both are used to return aggregated values.
- Difference: Using a
GROUP BYclause collapses original rows, then you cannot access the original values later in the query. On the other hand, using a
PARTITION BYclause keeps original values while also allowing us to produce aggregated values.
- Difference: The
PARTITION BYis combined with
OVER()and windows functions to add a lot more functionalities.