PostgreSQL Tutorial: Paginated SELECT

March 25, 2024

Summary: in this tutorial, you will learn how to paginate the result set returned from the SELECTstatement in PostgreSQL.

Background

The basics of pagination are covered in detail elsewhere, but offset pagination like the following has a performance cost proportional to the size of the offset, which can be considerable for large offsets. This is because PostgreSQL has to construct the rowset up to and including the desired page which can be large.

select * from transactions
order by created_at desc
limit <page_size> offset 100

One way around this is to use keyset pagination; instead of indexing pages based on an offset, if the rowset is ordered on a unique column we can use values from that column to index a page. As long as there’s a sortable index on the column in the rowset, given a value from the column we can access the corresponding page in constant time.

We do lose out on the ability to jump to page p of the results, but for most applications this isn’t an issue.

Keyset pagination

Keyset pagination works by generating a total ordering over the table by sorting over the column(s) of interest and a unique column. In PostgreSQL the order by statements isn’t stable and this means that we need to guarantee a total ordering the result set to ensure users see a stable ordering. We do this by including a unique column as the lowest-priority column in all order by statements. The primary key makes a good choice here.

To fetch a page of results we do something like:

select * from transactions
    where created_at, id >= '<cursor_created_at>', '<cursor_id>'
order by created_at, id
limit <page_size>

So the new question arises, how to retrieve cursors for the next and previous pages (in order to power the forward and back buttons on the UI)? and how to implemente multi-column sorting?

There are several points need to consider:

1. Can I quickly retrieve the current page, the cursor position for the next page and the cursor position for the previous page, all in one query?

2. Is it easy to extend the query to support arbitrary sorting and filtering?

The basic idea is to fetch the current page and cursor in one CTE (which are adjacent in the rowset), the previous page cursor in another CTE, union the results, label them accordingly (previous_cursor, current_page, next_cursor) and return them, all as part of one query.

The first step is to filter the data in the table. We do not materialize the filtered_rows expression which forces the expression to be merged with dependent queries (see the documentation for more on this). In doing so we incur the additional cost of re-calculating the filtered_rows query for every dependent query but gain the ability to make use of the indicies on the table. This greatly speeds up the coming operations.

To fetch the cursor we select the id column along with any columns involved in the sort. In this example we’re ordering on created_at but it’s trivial to sort on multiple columns if required. Note that we sort by the id column as lowest priority.

cursor as
    (select created_at, id from rows where id = ? order by created_at, id limit 1)

Fetching the current page and cursor for the next page is straightforward. The key thing to note is the where clause that selects only the rows that succeed the cursor row in the total ordering and the order and limit clauses which combine to select only the rows immediately subsequent to the cursor, forming the page and the next cursor:

current_page_and_next_cursor as
    (select *
        from rows
    where (created_at, id) >= (select * from cursor)
    order by created_at, id
    limit <page_size + 1>
)

Fetching the previous page is similarly straightforward. The main things to notice are the reversal of the equality condition in the where clause (as we want to select the rows that precede the cursor) and the reversal of the order by clause to ensure that we select the n rows that precede the cursor (instead of the first n rows in the result set). We’ll select just the first row from this set when we combine the results to yield the cursor for the previous page.

previous_page as
    (select *
        from rows
    where (created_at, id) < (select * from cursor)
    order by created_at desc, id desc
    limit <page_size>
),

With the relevant CTEs constructed the final task is to union the rows together and label them:

(select *, 'previous_cursor' as label from previous_page order by created_at, id limit 1)
union all
(select *, 'current_page' as label from current_page_and_next_cursor order by created_at, id limit <page_size>)
union all
(select *, 'next_cursor' as label from current_page_and_next_cursor order by created_at, id limit 1 offset <page_size>);

A few things to mention here. Firstly the use of union all to preserve the ordering of the result set. Secondly I’m again using offsets and limits to select the single row from the previous_page and current_page_and_next_cursor CTEs that represents the previous page cursor and next page cursor respectively. Finally a label is added to identify the relevant rows to the application. Cursors are not returned in cases where either the previous page doesn’t exist (i.e. when the cursor is pointing to the first row in the result set) or the next page doesn’t exist (i.e. when the cursor is pointing at a row within pageSize distance of the last row).

Summary

The Limit-Offset pagination has two big problems, result inconsistency and offset inefficiency. Consistency refers to the intention that traversing a resultset should retrieve every item exactly once, without omissions or duplication. Offset inefficiency refers to the delay incurred by shifting the results by a large offset.

Keyset pagination is fast, and it is consistent too. Any insertions/deletions before the current page will leave the results unaffected. The two downsides to this method are lack of random access and possible coupling between client and server.

comments powered by Disqus