PostgreSQL Tutorial: Multiple Indexes vs. Multicolumn Indexes

January 16, 2025

Summary: in this tutorial, we will discuss that multiple indexes vs. multicolumn indexes in PostgreSQL.

Table of Contents

Multiple indexes and multicolumn indexes

It is one of the most common question about indexing: is it better to create one index for each column or a single index for all columns of a WHERE clause? The answer is very simple in most cases: one index with multiple columns is better—that is, a multicolumn or compound index. Multicolumn Indexes explains them in detail.

Nevertheless there are queries where a single index cannot do a perfect job, no matter how you define the index; e.g., queries with two or more independent range conditions as in the following example:

SELECT first_name, last_name, date_of_birth
  FROM employees
 WHERE UPPER(last_name) < ?
   AND date_of_birth    < ?

It is impossible to define a B-tree index that would support this query without filter predicates. For an explanation, you just need to remember that an index is a linked list.

If you define the index as UPPER(LAST_NAME), DATE_OF_BIRTH (in that order), the list begins with A and ends with Z. The date of birth is considered only when there are two employees with the same name. If you define the index the other way around, it will start with the eldest employees and end with the youngest. In that case, the names only have a minor impact on the sort order.

No matter how you twist and turn the index definition, the entries are always arranged along a chain. At one end, you have the small entries and at the other end the big ones. An index can therefore only support one range condition as an access predicate. Supporting two independent range conditions requires a second axis, for example like a chessboard. The query above would then match all entries from one corner of the chessboard, but an index is not like a chessboard — it is like a chain. There is no corner.

You can of course accept the filter predicate and use a multi-column index nevertheless. That is the best solution in many cases anyway. The index definition should then mention the more selective column first so it can be used with an access predicate. Every time a compound index is created, the order of the columns must be chosen wisely. However, there is the myth that you should always put the most selective column to the first position; that is just wrong.

Important: The most important consideration when defining a concatenated index is how to choose the column order so it can be used as often as possible.

The other option is to use two separate indexes, one for each column. Then the database must scan both indexes first and then combine the results. The duplicate index lookup alone already involves more effort because the database has to traverse two index trees. Additionally, the database needs a lot of memory and CPU time to combine the intermediate results.

Note: One index scan is faster than two.

Combining multiple indexes

PostgreSQL has the ability to combine multiple indexes to handle cases that cannot be implemented by single index scans. Combining Multiple Indexes explains the related algorithms in detail.

In the world of a data warehouse, there will be many unpredictable ad-hoc queries. It just needs a few clicks to combine arbitrary conditions into the query of your choice. It is impossible to predict the column combinations that might appear in the WHERE clause and that makes indexing almost impossible.

The advantage of multiple indexes is that they can be combined rather easily. That means you get decent performance when indexing each column individually. Conversely if you know the query in advance, so that you can create a tailored multi-column B-tree index, it will still be faster than combining multiple indexes.

In the absence of a better access path, PostgreSQL convert the results of several B-tree scans into in-memory bitmap structures. Those can be combined efficiently. The bitmap structures are not stored persistently but discarded after statement execution, thus bypassing the problem of the poor write scalability. The downside is that it needs a lot of memory and CPU time. This method is, after all, an optimizer’s act of desperation.

See more

PostgreSQL Optimization