# Difference Between GROUP BY and PARTITION BY

By John Doe April 8, 2023

Summary: The `GROUP BY` and `PARTITION BY` clauses are both used in SQL to group data and calculate aggregate values, then what’s the difference between them?

## Introduction

The GROUP BY and PARTITION BY clauses are used in SQL to group data and calculate aggregate values. The `GROUP BY` clause divides the result set into groups and returns one row per group, while the `PARTITION BY` clause divides the result set into partitions and returns aggregated columns with each record in the specified table. The `GROUP BY` clause is often used with aggregate functions like `SUM()` and `AVG()` to calculate sums or averages for each group. The `PARTITION BY` clause does not reduce the number of rows returned. The `PARTITION BY` clause gives more flexibility in choosing the grouping columns and allows for different columns in each partition.

## Example of GROUP BY

Here is an example that shows how to compare the average salaries of employees in each department:

``````SELECT depname, avg(salary) FROM empsalary GROUP BY depname;
``````
``````  depname  |          avg
-----------+-----------------------
develop   | 5020.0000000000000000
personnel | 3700.0000000000000000
sales     | 4866.6666666666666667
(3 rows)
``````

The first output column comes directly from the table `empsalary`, and the column should be specified as a column in `GROUP BY` clause. The second column represents the average salary value in the group.

## Example of PARTITION BY

Here is an example that shows how to compare each employee’s salary with the average salary in his or her department:

``````SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;
``````
``````  depname  | empno | salary |          avg
-----------+-------+--------+-----------------------
develop   |    11 |   5200 | 5020.0000000000000000
develop   |     7 |   4200 | 5020.0000000000000000
develop   |     9 |   4500 | 5020.0000000000000000
develop   |     8 |   6000 | 5020.0000000000000000
develop   |    10 |   5200 | 5020.0000000000000000
personnel |     5 |   3500 | 3700.0000000000000000
personnel |     2 |   3900 | 3700.0000000000000000
sales     |     3 |   4800 | 4866.6666666666666667
sales     |     1 |   5000 | 4866.6666666666666667
sales     |     4 |   4800 | 4866.6666666666666667
(10 rows)
``````

The first three output columns come directly from the table `empsalary`, and there is one output row for each row in the table. The fourth column represents an average taken across all the table rows that have the same `depname` value as the current row. (This actually is the same function as the non-window `avg` aggregate, but the `OVER` clause causes it to be treated as a window function and computed across the window frame.)

## Summary

Although we use a `GROUP BY` most of the time, there are numerous cases when a `PARTITION BY` would be a better choice. In some cases, you could use a `GROUP BY` using subqueries to simulate a `PARTITION BY`, but these can end up with very complex queries.

Let’s wrap everything up with the most important similarities and differences:

• Similarity: Both are used to return aggregated values.
• Difference: Using a `GROUP BY` clause collapses original rows, then you cannot access the original values later in the query. On the other hand, using a `PARTITION BY` clause keeps original values while also allowing us to produce aggregated values.
• Difference: The `PARTITION BY` is combined with `OVER()` and windows functions to add a lot more functionalities.