PostgreSQL Tutorial: Check which indexes to rebuild

January 23, 2024

Summary: in this tutorial, you will learn how to check which indexes to rebuild in PostgreSQL.


While it may be rare to rebuild an Btree index for performance reasons, there are some databases that will get a measurable performance boost from rebuilding indexes. These workloads have these characteristics:

  • High index fragmentation: The SQL workload has lots of table DML causing lots of deleted leaf blocks.
  • High index scan access plans: The SQL workload is rich with index scans (index range scans and index only scans).

The rules for identification of candidates for index coalescing/rebuilding depend on your specific index state.

Which indexes are bad?

You cannot generalize to say that index rebuilding for performance is rare, or even medium rare, it depends on many factors, most importantly the characteristics of the application.

  • In scientific applications (clinical, laboratory) where large datasets are added and removed, the need to rebuild indexes is “common”.
  • Conversely, in system that never update or delete rows, index rebuilding rarely improves performance.
  • In systems that do batch DML jobs, index rebuilding “often” improves SQL performance.

Generally, the indexes need to rebuild when these conditions are true:

  • The percentage of bloat in the index exceeds 50%.
  • the index depth is more than 4 levels.

When an index becomes skewed, parts of an index are accessed more frequently than others. As a result of this skew, disk contention may occur, creating a bottleneck in SQL performance. Hence, it is important to periodically examine your indexes to determine if they have become skewed and might need to be rebuilt.

Remember, Btree index nodes are not physically deleted when table rows are deleted, nor are the entries removed from the index. Rather, PostgreSQL “logically” deletes the index entry and leaves “dead” nodes in the index tree where that may be reused if another adjacent entry is required.

However, when large numbers of adjacent rows are deleted, it is highly unlikely that PostgreSQL will have an opportunity to reuse the deleted leaf rows, and these represent wasted space in the index. In addition to wasting space, large volumes of deleted leaf nodes will make index only scans run for longer periods.

Btree index height

The height of the index refers to the number of levels that are spawned by the index as a result in row inserts. When a large amount of rows are added to a table, PostgreSQL may spawn additional levels of an index to accommodate the new rows.

Btree indexes can support many millions of entries in three levels. Any Btree index that has spawned to a 4th level followed by a large delete job might benefit from rebuilding to restore the index to it’s pristine state.

The pageinspect module provides functions that allow you to inspect the contents of database pages at a low level, which is useful for obtaining some internal information. All of these functions may be used only by superusers.

bt_metap() will point you to the root node, and bt_page_stats() for the root node will show you the level btpo_level. By using bt_page_items() on a couple of index pages, you can see how many children each has.

bt_metap returns information about a B-tree index’s metapage. For example:

test=# SELECT * FROM bt_metap('large_table_index');
-[ RECORD 1 ]-------------+-------
magic                     | 340322
version                   | 4
root                      | 1
level                     | 0
fastroot                  | 1
fastlevel                 | 0
last_cleanup_num_delpages | 0
last_cleanup_num_tuples   | 230
allequalimage             | f

bt_page_stats returns summary information about single pages of B-tree indexes. For example:

test=# SELECT * FROM bt_page_stats('large_table_index', 1);
-[ RECORD 1 ]-+-----
blkno         | 1
type          | l
live_items    | 224
dead_items    | 0
avg_item_size | 16
page_size     | 8192
free_size     | 3668
btpo_prev     | 0
btpo_next     | 0
btpo_level    | 0
btpo_flags    | 3

See more

PostgreSQL Monitoring

comments powered by Disqus