PostgreSQL Tutorial: Indexing Partitioned Table

January 21, 2025

Summary: In this tutorial, you will learn how to index partitioned table in PostgreSQL.

Table of Contents

Preparing the data set

When you are looking for free data sets “The home of the U.S. Government’s open data” is a great source. It provides free data sets for research, development or just data you can play with for whatever reason. In this tutorial, we will use the Traffic violations data set. It contains about 2 million of rows and is a good starting point for diving into PostgreSQL partitioning.

The initial import goes into a table which only contains text columns and from there we can load our partitions later on. The structure of the table is:

create table traffic_violations
( seqid text
, date_of_stop text
, time_of_stop text
, agency text
, subagency text
, description text
, location text
, latitude text
, longitude text
, accident text
, belts text
, personal_injury text
, property_damage text
, fatal text
, commercial_license text
, hazmat text
, commercial_vehicle text
, alcohol text
, work_zone text
, search_conducted text
, search_disposition text
, search_outcome text
, search_reason text
, search_reason_for_stop text
, search_type text
, search_arrest_reason text
, state text
, vehicletype text
, year text
, make text
, model text
, color text
, violation_type text
, charge text
, article text
, contributed_to_accident text
, race text
, gender text
, driver_city text
, driver_state text
, dl_state text
, arrest_type text
, geolocation text
);

For loading the table the easiest way is to use PostgreSQL’s COPY command:

COPY traffic_violations FROM '/home/postgres/Traffic_Violations.csv'
  WITH ( format csv, header true, null 'null', delimiter ',');

So we have around 2 millions of rows with real data. As this table contains only text columns we will create a list partitioned table with data types that fit the data.

List partitioning

Instead of partitioning by a range (typically based on day, year, month) list partitioning is used to partition on an explicit list with key values that define the partitions.

The application might want to ask for all traffic violations that match a specific violation type:

select distinct violation_type from traffic_violations;
 violation_type
----------------
 Citation
 ESERO
 SERO
 Warning
(4 rows)

Partitioning by that would result in 4 partitions (plus one default partition for any other potential violation type in the future). The definition of the table and it’s partitions is pretty much the same as for range partitioning but here we go for list partitioning:

create table traffic_violations_p_list
( seqid text
, date_of_stop date not null
, time_of_stop time
, agency text
, subagency text
, description text 
, location text
, latitude numeric
, longitude numeric
, accident text
, belts boolean
, personal_injury boolean 
, property_damage boolean
, fatal boolean
, commercial_license boolean
, hazmat boolean
, commercial_vehicle boolean
, alcohol boolean
, work_zone boolean
, state text
, vehicletype text 
, year smallint
, make text
, model text
, color text
, violation_type text 
, charge text
, article text
, contributed_to_accident boolean
, race text
, gender text
, driver_city text
, driver_state text
, dl_state text
, arrest_type text
, geolocation point)
partition by list (violation_type);

Now we can create the partitions:

create table traffic_violations_p_list_warning
partition of traffic_violations_p_list
for values in ('Warning');

create table traffic_violations_p_list_sero
partition of traffic_violations_p_list
for values in ('SERO');

create table traffic_violations_p_list_Citation
partition of traffic_violations_p_list
for values in ('Citation');

create table traffic_violations_p_list_ESERO
partition of traffic_violations_p_list
for values in ('ESERO');

create table traffic_violations_p_list_default
partition of traffic_violations_p_list DEFAULT;

We now have a partition for each violation type. As our partitioned table setup is now complete we can load the data:

insert into traffic_violations_p_list
select seqid
     , to_date(date_of_stop,'MM/DD/YYYY')
     , time_of_stop::time
     , agency
     , subagency
     , description
     , location
     , latitude::numeric
     , longitude::numeric
     , accident
     , belts::boolean
     , personal_injury::boolean
     , property_damage::boolean
     , fatal::boolean
     , commercial_license::boolean
     , hazmat::boolean
     , commercial_vehicle::boolean
     , alcohol::boolean
     , work_zone::boolean
     , state
     , vehicletype
     , case year
         when '' then null
         else year::smallint
       end
     , make
     , model
     , color
     , violation_type
     , charge
     , article
     , contributed_to_accident::boolean
     , race
     , gender
     , driver_city
     , driver_state
     , dl_state
     , arrest_type
     , geolocation::point
  from traffic_violations;

Indexing

When declarative partitioning was introduced in PostgreSQL 10 there were quite some limitations. PostgreSQL partitioning feature improved quite much since then.

We will use the list partitioned table. There is not a single constraint or index and the same is true for the partitions. We can not create a primary key or unique index because there are duplicate rows in the partitioned table. We can, however, create a standard btree index:

CREATE INDEX i1 on traffic_violations_p_list ( model );

This is a so called partitioned index and you can verify that with:

select * from pg_partition_tree('i1');
                    relid                     | parentrelid | isleaf | level
----------------------------------------------+-------------+--------+-------
 i1                                           |             | f      |     0
 traffic_violations_p_list_citation_model_idx | i1          | t      |     1
 traffic_violations_p_list_esero_model_idx    | i1          | t      |     1
 traffic_violations_p_list_sero_model_idx     | i1          | t      |     1
 traffic_violations_p_list_warning_model_idx  | i1          | t      |     1
 traffic_violations_p_list_default_model_idx  | i1          | t      |     1
(6 rows)

Indeed the index cascaded down to all the partitions. As soon as you add another partition it will be indexed automatically:

create table traffic_violations_p_list_demo
partition of traffic_violations_p_list
for values in ('demo');

You can as well create an index on a specific partition only (maybe because you know that the application is searching on a specific column on that partition):

CREATE INDEX i2 on traffic_violations_p_list_citation (make);

What is not working right now, is creating a partitioned index concurrently:

CREATE INDEX CONCURRENTLY i_con on traffic_violations_p_list (zip_codes);
ERROR:  cannot create index on partitioned table "traffic_violations_p_list" concurrently

This implies that there will be locking when you create a partitioned index in one session:

CREATE INDEX i_mun on traffic_violations_p_list (municipalities);

… and at the same time insert something in another session it will block until the index got created in the first session:

insert into traffic_violations_p_list ( seqid, date_of_stop ) values ( 'xxxxx', date('01.01.2023'));
-- blocks until index above is created

You can limit locking time when you create the partitioned index on the partitioned table only but do not cascade down to the partitions:

create index i_demo on only traffic_violations_p_list (accident);

This will leave the index in an invalid state:

select indisvalid from pg_index where indexrelid = 'i_demo'::regclass;
 indisvalid
------------
 f
(1 row)

Now you can create the index concurrently on all the partitions:

CREATE INDEX CONCURRENTLY i_demo_citation on traffic_violations_p_list_citation (accident);
CREATE INDEX CONCURRENTLY i_demo_demo on traffic_violations_p_list_demo (accident);
CREATE INDEX CONCURRENTLY i_demo_esero on traffic_violations_p_list_esero(accident);
CREATE INDEX CONCURRENTLY i_demo_sero on traffic_violations_p_list_sero(accident);
CREATE INDEX CONCURRENTLY i_demo_warning on traffic_violations_p_list_warning(accident);
CREATE INDEX CONCURRENTLY i_demo_default on traffic_violations_p_list_default(accident);

Once you have that you can attach all the indexes to the partitioned index:

ALTER INDEX i_demo ATTACH PARTITION i_demo_citation;
ALTER INDEX i_demo ATTACH PARTITION i_demo_demo;
ALTER INDEX i_demo ATTACH PARTITION i_demo_esero;
ALTER INDEX i_demo ATTACH PARTITION i_demo_sero;
ALTER INDEX i_demo ATTACH PARTITION i_demo_warning;
ALTER INDEX i_demo ATTACH PARTITION i_demo_default;

This makes the partitioned index valid automatically:

select indisvalid from pg_index where indexrelid = 'i_demo'::regclass;
 indisvalid 
------------
 t
(1 row)

See more

PostgreSQL Tutorial