PostgreSQL Tutorial: RANGE Clause with Window Functions

September 18, 2024

Summary: in this tutorial, you will learn some practical examples of leveraging the RANGE clause with window functions in PostgreSQL.

Table of Contents

RANGE Clause Syntax

When I’m talking about the RANGE clause, I’m talking about the one used in SQL window functions, which have the following syntax:

OVER (  
  [ <PARTITION BY clause> ]
  [ <ORDER BY clause> ]
  [ <ROW or RANGE clause> ]
)

When you look at the syntax above, you see that both ROW or RANGE can be part of the window function. Their syntax is as follows:

[<ROWS or RANGE clause> BETWEEN <Start expr> AND <End expr>]

The default window frame without the ORDER BY is the whole partition. But when you use the ORDER BY, the default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

Are the RANGE and the ROW Clauses the Same?

Nope, they’re not. They do, however, have the same purpose: to specify the starting and ending points within the partition, with the goal of limiting rows. However, each clause does it differently. The ROW clause does it by specifying a fixed number of rows that precede or follow the current row.

The RANGE clause, on the other hand, limits the rows logically; it specifies the range of values in relation to the value of the current row.

Which Data Types Can I Use in the RANGE clause?

You can use two data types with the RANGE clause: numeric and date/time types.

Let’s now go to the examples and see how RANGE works in practice! If you’re new to SQL window functions, maybe you should check what window functions are and how they compare to aggregate functions first.

Example 1 – Calculate the Running Total

The data I’ll be working with is in the table revenue. The columns are:

  • id – The date’s ID and the table’s primary key (PK).
  • date – The revenue’s date.
  • revenue_amount – The amount of the revenue.

Your task is to calculate running revenue totals using the RANGE clause. Let’s do it first without SQL. If you have the following data, what will be the running total (also known as a cumulative sum)?

id date revenue_amount running_total
1 2021-05-01 12,573.25 12,573.25
2 2021-05-02 11,348.22 23,921.47
3 2021-05-02 14,895.13 38,816.60
4 2021-05-03 14,388.14 53,204.74
5 2021-05-04 18,847.54 72,052.28

The running total for 2021-05-01 equals the revenue for that date: 12,573.25. That’s because there are no previous rows to include in the calculation. Then comes 2021-05-02. The running total is today’s revenue added to the previous day’s revenue: 11,348.22 + 12,573.25 = 23,921.47.

Notice there is another row with a different amount of revenue for 2021-05-02. Maybe this is for another branch, country, product or whatever. It works the same way: 14,895.13 + 23,921.47 = 38,816.60. (The RANGE clause will work even if there are multiple rows with the same date.) Next comes 2021-05-03. The running total for this date will be 14,388.14 + 38,816.60 = 53,204.74. Finally, the running total for 2021-05-04 will be 18,847.54 + 53,204 = 72,052.28.

How do you do the same using the RANGE clause? It could be done like this:

SELECT
  id,
  date,
  revenue_amount,
  SUM(revenue_amount) OVER (
    ORDER BY date
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM revenue;

I’ve used the SUM() function on the column revenue_amount; this is the operation required to get the running total. For the SUM() function to become a window function, you need the OVER() clause. The window function calculation is done in ascending order; that’s because I want to make sure the revenue is being summed from the oldest to the newest date. Then comes the RANGE clause. It limits the window to the dates preceding the current date (BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) and the current date. Those are the rows that will be included in the running total calculation.

Here’s what you get when you run the code:

id date revenue_amount running_total
1 2021-05-01 12,573.25 12,573.25
2 2021-05-02 11,348.22 38,816.60
3 2021-05-02 14,895.13 38,816.60
4 2021-05-03 14,388.14 53,204.74
5 2021-05-04 18,847.54 72,052.28
6 2021-05-05 9,845.29 81,897.57
7 2021-05-06 14,574.56 96,472.13
8 2021-05-07 11,500.63 107,972.76
9 2021-05-08 16,897.21 124,869.97
10 2021-05-09 9,634.56 134,504.53
11 2021-05-10 14,255.87 148,760.40
12 2021-05-11 11,248.33 160,008.73
13 2021-05-12 21,489.22 181,497.95
14 2021-05-13 14,448.65 195,946.60
15 2021-05-14 15,517.22 211,463.82
16 2021-05-15 6,874.23 218,338.05
17 2021-05-16 12,500.00 230,838.05
18 2021-05-17 9,784.33 240,622.38
19 2021-05-18 15,321.89 255,944.27
20 2021-05-19 12,235.50 268,179.77
21 2021-05-20 22,222.22 290,401.99
21 2021-05-21 14,800.65 305,202.64
22 2021-05-22 5,894.12 311,096.76
23 2021-05-23 18,845.69 329,942.45
24 2021-05-24 9,966.66 339,909.11
25 2021-05-25 13,250.69 353,159.80
26 2021-05-26 4,987.56 358,147.36
27 2021-05-27 17,784.25 375,931.61
28 2021-05-28 12,567.45 388,499.06
29 2021-05-29 19,874.26 408,373.32
30 2021-05-30 15,489.36 423,862.68
31 2021-05-31 14,987.55 438,850.23

Notice that where there are multiple values for one date (2021-05-02) the code will include both rows in the running total calculation for that date. That’s why there’s 38,816.60 in the running_total column for that date.

Example 2 – Calculate the Moving Average

Let’s now see how you use the RANGE clause with date/time data types – and without the id column to help! We’ll use a slightly modified table, which is now called revenue_per_shop. It contains the revenue data for two shops. The columns are:

  • date – The revenue’s date.
  • shop – The shop’s name.
  • revenue_amount – The amount of revenue for that shop on that date.

First of all, let’s make sure you understand what a moving average is. A two-day moving average includes the current day and the previous day. Here’s some example data to show you how a moving average works:

date revenue_amount moving_avg
2021-05-01 12,573.25 12,573.25
2021-05-02 11,348.22 11,960.74
2021-05-03 14,388.14 12,868.18
2021-05-04 18,847.54 16,617.84

The two-day moving average for 2021-05-01 is the daily revenue itself: 12,573.25. That’s because there are no other rows to include in the calculation. The 2021-05-02 calculation includes two dates: (12,573.25 + 11,348.22)/2 = 11,960.74. The other rows follow the same two-date logic – the current date and the previous date.

So how do you calculate the same metric for every shop separately? Like this:

SELECT
  shop,
  date,
  revenue_amount,
  AVG(revenue_amount) OVER (
    PARTITION BY shop
    ORDER BY date ASC
    RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND CURRENT ROW
  ) AS moving_avg
FROM revenue_per_shop;

The code first selects some columns from the table. Then comes the fun part. I’m using the AVG() function on the column revenue_amount because I want the average revenue. Again, this is a window function, so it must have the OVER() clause. I use the PARTITION BY to specify the column on which I want to aggregate data; it’s the column shop because I want the moving average for every shop separately. The operation is again ordered by date. In the RANGE clause, I merely specify which rows to include in the calculation. Since I’m working with dates, I’ll get the previous date by stating: BETWEEN INTERVAL '1' DAY PRECEDING. The output of the query is as follows:

shop date revenue_amount moving_avg
Shop 1 2021-05-01 12,573.25 12,573.25
Shop 1 2021-05-02 14,388.14 13,480.70
Shop 1 2021-05-03 9,845.29 12,116.72
Shop 1 2021-05-04 11,500.63 10,672.96
Shop 1 2021-05-05 9,634.56 10,567.60
Shop 1 2021-05-06 11,248.33 10,441.45
Shop 1 2021-05-07 14,448.65 12,848.49
Shop 1 2021-05-08 6,874.23 10,661.44
Shop 1 2021-05-09 9,784.33 8,329.28
Shop 1 2021-05-10 12,235.50 11,009.92
Shop 1 2021-05-11 14,800.65 13,518.08
Shop 1 2021-05-12 18,845.69 16,823.17
Shop 1 2021-05-13 13,250.69 16,048.19
Shop 1 2021-05-14 17,784.25 15,517.47
Shop 1 2021-05-15 19,874.26 18,829.26
Shop 2 2021-05-01 11,348.22 11,348.22
Shop 2 2021-05-02 18,847.54 15,097.88
Shop 2 2021-05-03 14,574.56 16,711.05
Shop 2 2021-05-04 16,897.21 15,735.89
Shop 2 2021-05-05 14,255.87 15,576.54
Shop 2 2021-05-06 21,489.22 17,872.55
Shop 2 2021-05-07 15,517.22 18,503.22
Shop 2 2021-05-08 12,500.00 14,008.61
Shop 2 2021-05-09 15,321.89 13,910.95
Shop 2 2021-05-10 22,222.22 18,772.06
Shop 2 2021-05-11 5,894.12 14,058.17
Shop 2 2021-05-12 9,966.66 7,930.39
Shop 2 2021-05-13 4,987.56 7,477.11
Shop 2 2021-05-14 12,567.45 8,777.51
Shop 2 2021-05-15 15,489.36 14,028.41

Example 3 – Find the Last Value Within a Range

This use of the RANGE clause allows you to find the last value within a defined range. For example, using the table revenue_by_shop, I can get the last value for every shop separately. The last value, in this case, means the last data available, which is the revenue for 2021-05-15. How do you get that data?

By using the RANGE clause, of course:

SELECT
  shop,
  date,
  revenue_amount,
  LAST_VALUE(revenue_amount) OVER (
    PARTITION BY shop
    ORDER BY date
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS last_value
FROM revenue_per_shop;

The window function I’ve used this time is LAST_VALUE(). Once again, I’m using it on the column revenue_amount. I’ve partitioned the data by shop, same as before. And I ordered it by date, again the same as before. To get the last value, I used RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. Remember, the default range with the ORDER BY clause is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. If you don’t change it, you’ll get the wrong result. The right result is:

shop date revenue_amount last_value
Shop 1 2021-05-01 12,573.25 19,874.26
Shop 1 2021-05-02 14,388.14 19,874.26
Shop 1 2021-05-03 9,845.29 19,874.26
Shop 1 2021-05-04 11,500.63 19,874.26
Shop 1 2021-05-05 9,634.56 19,874.26
Shop 1 2021-05-06 11,248.33 19,874.26
Shop 1 2021-05-07 14,448.65 19,874.26
Shop 1 2021-05-08 6,874.23 19,874.26
Shop 1 2021-05-09 9,784.33 19,874.26
Shop 1 2021-05-10 12,235.50 19,874.26
Shop 1 2021-05-11 14,800.65 19,874.26
Shop 1 2021-05-12 18,845.69 19,874.26
Shop 1 2021-05-13 13,250.69 19,874.26
Shop 1 2021-05-14 17,784.25 19,874.26
Shop 1 2021-05-15 19,874.26 19,874.26
Shop 2 2021-05-01 11,348.22 15,489.36
Shop 2 2021-05-02 18,847.54 15,489.36
Shop 2 2021-05-03 14,574.56 15,489.36
Shop 2 2021-05-04 16,897.21 15,489.36
Shop 2 2021-05-05 14,255.87 15,489.36
Shop 2 2021-05-06 21,489.22 15,489.36
Shop 2 2021-05-07 15,517.22 15,489.36
Shop 2 2021-05-08 12,500.00 15,489.36
Shop 2 2021-05-09 15,321.89 15,489.36
Shop 2 2021-05-10 22,222.22 15,489.36
Shop 2 2021-05-11 5,894.12 15,489.36
Shop 2 2021-05-12 9,966.66 15,489.36
Shop 2 2021-05-13 4,987.56 15,489.36
Shop 2 2021-05-14 12,567.45 15,489.36
Shop 2 2021-05-15 15,489.36 15,489.36

Example 4 – Find the Number of Items Within a Range

Here’s another fun and useful way to use the RANGE clause. How would you find the number of times the daily revenue for any shop was between 1,000 (dollars, euros…) below and above its current value?

This code might help:

SELECT
  shop,
  date,
  revenue_amount,
  COUNT(*) OVER (
    ORDER BY revenue_amount ASC
    RANGE BETWEEN 1000 PRECEDING AND 1000 FOLLOWING
  ) AS number_of_days
FROM revenue_per_shop;

I’m using the COUNT() window function. Since I’m not interested in separating the revenue by shops, there’s no PARTITION BY. The counting will be performed in ascending order according to the revenue amount. The range is defined by RANGE BETWEEN 1000 PRECEDING AND 1000 FOLLOWING.

Here’s what the code will return:

shop date revenue_amount number_of_days
Shop 2 2021-05-13 4,987.56 2
Shop 2 2021-05-11 5,894.12 3
Shop 1 2021-05-08 6,874.23 2
Shop 1 2021-05-05 9,634.56 4
Shop 1 2021-05-09 9,784.33 4
Shop 1 2021-05-03 9,845.29 4
Shop 2 2021-05-12 9,966.66 4
Shop 1 2021-05-06 11,248.33 4
Shop 2 2021-05-01 11,348.22 4
Shop 1 2021-05-04 11,500.63 5
Shop 1 2021-05-10 12,235.50 7
Shop 2 2021-05-08 12,500.00 6
Shop 2 2021-05-14 12,567.45 5
Shop 1 2021-05-01 12,573.25 5
Shop 1 2021-05-13 13,250.69 4
Shop 2 2021-05-05 14,255.87 5
Shop 1 2021-05-02 14,388.14 6
Shop 1 2021-05-07 14,448.65 6
Shop 2 2021-05-03 14,574.56 8
Shop 1 2021-05-11 14,800.65 8
Shop 2 2021-05-09 15,321.89 7
Shop 2 2021-05-15 15,489.36 5
Shop 2 2021-05-07 15,517.22 5
Shop 2 2021-05-04 16,897.21 2
Shop 1 2021-05-14 17,784.25 2
Shop 1 2021-05-12 18,845.69 2
Shop 2 2021-05-02 18,847.54 2
Shop 1 2021-05-15 19,874.26 1
Shop 2 2021-05-06 21,489.22 2
Shop 2 2021-05-10 22,222.22 2

Let me explain what this result tells you. If you take the first row, the result in the column number_of_days is 2. There are two instances where the revenue is between 3,987.56 and 5,987.56. Why this range? The revenue for 2021-05-13 is 4,987.56. So 4,987.56 - 1,000 = 3,987.56 and 4,987.56 + 1,000 = 5,987.56. Do you want to check the result? Which two instances are between this range? Obviously, the first two:

shop date revenue_amount number_of_days
Shop 2 2021-05-13 4,987.56 2
Shop 2 2021-05-11 5,894.12 3

Do you want to check the second line? It says there are three instances between 4,894.12 and 6,894.12 – these three:

shop date revenue_amount number_of_days
Shop 2 2021-05-13 4,987.56 2
Shop 2 2021-05-11 5,894.12 3
Shop 1 2021-05-08 6,874.23 2

The same logic applies to the rest of the data.

Example 5 – Find the Maximum Value

This is not as simple as it may seem; I’m not talking about the common maximum value. To find that, you wouldn’t need the RANGE clause. But how about finding the maximum value (or revenue, in this case) across five days? Those five days will include the current date, up to three days before that, and one day after the current date. You probably already know the logic after all these examples of RANGE usage. Here’s my solution:

SELECT
  shop,
  date,
  revenue_amount,
  MAX(revenue_amount) OVER (
    ORDER BY DATE
    RANGE BETWEEN INTERVAL '3' DAY PRECEDING AND INTERVAL '1' DAY FOLLOWING
  ) AS max_revenue
FROM revenue_per_shop;

I’m using the MAX() function as a window function. Yet again, I’m using it with the column revenue_amount. There is no PARTITION BY in the OVER() clause because I’m not interested in separating data on any level. Defining the range is not that difficult: RANGE BETWEEN INTERVAL '3' DAY PRECEDING AND INTERVAL '1' DAY FOLLOWING. This will include the current date, three days before it, and one day after it. Here’s the result:

shop date revenue_amount max_revenue
Shop 1 2021-05-01 12,573.25 18,847.54
Shop 2 2021-05-01 11,348.22 18,847.54
Shop 1 2021-05-02 14,388.14 18,847.54
Shop 2 2021-05-02 18,847.54 18,847.54
Shop 1 2021-05-03 9,845.29 18,847.54
Shop 2 2021-05-03 14,574.56 18,847.54
Shop 1 2021-05-04 11,500.63 18,847.54
Shop 2 2021-05-04 16,897.21 18,847.54
Shop 1 2021-05-05 9,634.56 21,489.22
Shop 2 2021-05-05 14,255.87 21,489.22
Shop 1 2021-05-06 11,248.33 21,489.22
Shop 2 2021-05-06 21,489.22 21,489.22
Shop 2 2021-05-07 15,517.22 21,489.22
Shop 1 2021-05-07 14,448.65 21,489.22
Shop 2 2021-05-08 12,500.00 21,489.22
Shop 1 2021-05-08 6,874.23 21,489.22
Shop 2 2021-05-09 15,321.89 22,222.22
Shop 1 2021-05-09 9,784.33 22,222.22
Shop 1 2021-05-10 12,235.50 22,222.22
Shop 2 2021-05-10 22,222.22 22,222.22
Shop 1 2021-05-11 14,800.65 22,222.22
Shop 2 2021-05-11 5,894.12 22,222.22
Shop 2 2021-05-12 9,966.66 22,222.22
Shop 1 2021-05-12 18,845.69 22,222.22
Shop 1 2021-05-13 13,250.69 22,222.22
Shop 2 2021-05-13 4,987.56 22,222.22
Shop 1 2021-05-14 17,784.25 19,874.26
Shop 2 2021-05-14 12,567.45 19,874.26
Shop 2 2021-05-15 15,489.36 19,874.26
Shop 1 2021-05-15 19,874.26 19,874.26

Let’s check the result for 2021-05-05. Its range is from 2021-05-02 to 2021-05-06 by date. To get the maximum revenue in that range, SQL will compare values: 14,388.14, 18,847.54, 9,845.29, 14,574.56, 11,500.63, 16,897.21, 9,634.56, 14,255.87, 11,248.33, 21,489.22. Which one is the highest? It’s 21,489.22.