PostgreSQL Tutorial: Sub-Partitioning

January 20, 2025

Summary: In this tutorial, you will learn how to use sub-partitioning in PostgreSQL.

Table of Contents

Preparing the data set

When you are looking for free data sets “The home of the U.S. Government’s open data” is a great source. It provides free data sets for research, development or just data you can play with for whatever reason. In this tutorial, we will use the Traffic violations data set. It contains about 2 million of rows and is a good starting point for diving into PostgreSQL partitioning.

The initial import goes into a table which only contains text columns and from there we can load our partitions later on. The structure of the table is:

create table traffic_violations
( seqid text
, date_of_stop text
, time_of_stop text
, agency text
, subagency text
, description text
, location text
, latitude text
, longitude text
, accident text
, belts text
, personal_injury text
, property_damage text
, fatal text
, commercial_license text
, hazmat text
, commercial_vehicle text
, alcohol text
, work_zone text
, search_conducted text
, search_disposition text
, search_outcome text
, search_reason text
, search_reason_for_stop text
, search_type text
, search_arrest_reason text
, state text
, vehicletype text
, year text
, make text
, model text
, color text
, violation_type text
, charge text
, article text
, contributed_to_accident text
, race text
, gender text
, driver_city text
, driver_state text
, dl_state text
, arrest_type text
, geolocation text
);

For loading the table the easiest way is to use PostgreSQL’s COPY command:

COPY traffic_violations FROM '/home/postgres/Traffic_Violations.csv'
  WITH ( format csv, header true, null 'null', delimiter ',');

So we have around 2 millions of rows with real data. As this table contains only text columns we will create a range partitioned table with data types that fit the data.

Range partitioning

Usually range partitioning is used to partition a table by days, months or years although you can partition by other data types as well. Time/date based range partitioning is probably the most common use case, so we will partition our traffic violation data by year, and then sub-partition a table partition by month.

Lets assume our most common queries against the data set restrict the data for one or more years. Using partitioned tables we can range partition on the “date_of_stop” column and break down the large data set into smaller chunks per year. How would that work in PostgreSQL?

Currently we have this min and max values for the “date_of_stop” column:

SELECT min(to_date(date_of_stop, 'MM/DD/YYYY')),
       max(to_date(date_of_stop, 'MM/DD/YYYY'))
  FROM traffic_violations;
    min     |    max
------------+------------
 2012-01-01 | 2025-01-16
(1 row)

If we want to partition that by year we need at least 14 partitions (2012 to 2025) and maybe already add a partition for 2026 so we are on the safe side when a more recent data set brings data for next year. The first step is to create a partitioned table:

create table traffic_violations_p
( seqid text
, date_of_stop date not null
, time_of_stop time
, agency text
, subagency text
, description text 
, location text
, latitude numeric
, longitude numeric
, accident text
, belts boolean
, personal_injury boolean 
, property_damage boolean
, fatal boolean
, commercial_license boolean
, hazmat boolean
, commercial_vehicle boolean
, alcohol boolean
, work_zone boolean
, state text
, vehicletype text 
, year smallint
, make text
, model text
, color text
, violation_type text 
, charge text
, article text
, contributed_to_accident boolean
, race text
, gender text
, driver_city text
, driver_state text
, dl_state text
, arrest_type text
, geolocation point)
partition by range (date_of_stop);

Now we have a partitioned table with no partitions. We need partitions and this is how you can add partitions to that partitioned table:

create table traffic_violations_p_2020
partition of traffic_violations_p
for values from ('2020-01-01') to ('2021-01-01');

create table traffic_violations_p_2021
partition of traffic_violations_p
for values from ('2021-01-01') to ('2022-01-01');

create table traffic_violations_p_2022
partition of traffic_violations_p
for values from ('2022-01-01') to ('2023-01-01');

create table traffic_violations_p_2023
partition of traffic_violations_p
for values from ('2023-01-01') to ('2024-01-01');

create table traffic_violations_p_2024
partition of traffic_violations_p
for values from ('2024-01-01') to ('2025-01-01');

create table traffic_violations_p_2025
partition of traffic_violations_p
for values from ('2025-01-01') to ('2026-01-01');

We now have a partition for each year from 2020 to 2025. As our partitioned table setup is now complete we can load the data:

insert into traffic_violations_p
select seqid
     , to_date(date_of_stop,'MM/DD/YYYY')
     , time_of_stop::time
     , agency
     , subagency
     , description
     , location
     , latitude::numeric
     , longitude::numeric
     , accident
     , belts::boolean
     , personal_injury::boolean
     , property_damage::boolean
     , fatal::boolean
     , commercial_license::boolean
     , hazmat::boolean
     , commercial_vehicle::boolean
     , alcohol::boolean
     , work_zone::boolean
     , state
     , vehicletype
     , case year
         when '' then null
         else year::smallint
       end
     , make
     , model
     , color
     , violation_type
     , charge
     , article
     , contributed_to_accident::boolean
     , race
     , gender
     , driver_city
     , driver_state
     , dl_state
     , arrest_type
     , geolocation::point
  from traffic_violations
  WHERE to_date(date_of_stop, 'MM/DD/YYYY') >= '2020-01-01'::date;

Demonstration

Lets assume that you expect that traffic violations will grow exponentially in 2026 because more and more cars will be on the road and when there will be more cars there will be more traffic violations. To be prepared for that you do not only want to partition by year but also by month. In other words: Add a new partition for 2026 but sub partition that by month. First of all you need a new partition for 2026 that itself is partitioned as well:

create table traffic_violations_p_2026
partition of traffic_violations_p
for values from ('2026-01-01') to ('2027-01-01') partition by range(date_of_stop);

Now we can add partitions to the just created partitioned partition:

create table traffic_violations_p_2026_jan
partition of traffic_violations_p_2026
for values from ('2026-01-01') to ('2026-02-01');
 
create table traffic_violations_p_2026_feb
partition of traffic_violations_p_2026
for values from ('2026-02-01') to ('2026-03-01');
 
create table traffic_violations_p_2026_mar
partition of traffic_violations_p_2026
for values from ('2026-03-01') to ('2026-04-01');
 
create table traffic_violations_p_2026_apr
partition of traffic_violations_p_2026
for values from ('2026-04-01') to ('2026-05-01');
 
create table traffic_violations_p_2026_may
partition of traffic_violations_p_2026
for values from ('2026-05-01') to ('2026-06-01');
 
create table traffic_violations_p_2026_jun
partition of traffic_violations_p_2026
for values from ('2026-06-01') to ('2026-07-01');
 
create table traffic_violations_p_2026_jul
partition of traffic_violations_p_2026
for values from ('2026-07-01') to ('2026-08-01');
 
create table traffic_violations_p_2026_aug
partition of traffic_violations_p_2026
for values from ('2026-08-01') to ('2026-09-01');
 
create table traffic_violations_p_2026_sep
partition of traffic_violations_p_2026
for values from ('2026-09-01') to ('2026-10-01');
 
create table traffic_violations_p_2026_oct
partition of traffic_violations_p_2026
for values from ('2026-10-01') to ('2026-11-01');
 
create table traffic_violations_p_2026_nov
partition of traffic_violations_p_2026
for values from ('2026-11-01') to ('2026-12-01');
 
create table traffic_violations_p_2026_dec
partition of traffic_violations_p_2026
for values from ('2026-12-01') to ('2027-01-01');

Looking at psql’s output when we describe the partitioned table not very much changed, just the keyword “PARTITIONED” is showing up beside our new partition for 2026:

postgres=# \d+ traffic_violations_p
                                             Partitioned table "public.traffic_violations_p"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
-------------------------+------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 seqid                   | text                   |           |          |         | extended |             |              |
 date_of_stop            | date                   |           | not null |         | plain    |             |              |
 time_of_stop            | time without time zone |           |          |         | plain    |             |              |
 agency                  | text                   |           |          |         | extended |             |              |
 subagency               | text                   |           |          |         | extended |             |              |
 description             | text                   |           |          |         | extended |             |              |
 location                | text                   |           |          |         | extended |             |              |
 latitude                | numeric                |           |          |         | main     |             |              |
 longitude               | numeric                |           |          |         | main     |             |              |
 accident                | text                   |           |          |         | extended |             |              |
 belts                   | boolean                |           |          |         | plain    |             |              |
 personal_injury         | boolean                |           |          |         | plain    |             |              |
 property_damage         | boolean                |           |          |         | plain    |             |              |
 fatal                   | boolean                |           |          |         | plain    |             |              |
 commercial_license      | boolean                |           |          |         | plain    |             |              |
 hazmat                  | boolean                |           |          |         | plain    |             |              |
 commercial_vehicle      | boolean                |           |          |         | plain    |             |              |
 alcohol                 | boolean                |           |          |         | plain    |             |              |
 work_zone               | boolean                |           |          |         | plain    |             |              |
 state                   | text                   |           |          |         | extended |             |              |
 vehicletype             | text                   |           |          |         | extended |             |              |
 year                    | smallint               |           |          |         | plain    |             |              |
 make                    | text                   |           |          |         | extended |             |              |
 model                   | text                   |           |          |         | extended |             |              |
 color                   | text                   |           |          |         | extended |             |              |
 violation_type          | text                   |           |          |         | extended |             |              |
 charge                  | text                   |           |          |         | extended |             |              |
 article                 | text                   |           |          |         | extended |             |              |
 contributed_to_accident | boolean                |           |          |         | plain    |             |              |
 race                    | text                   |           |          |         | extended |             |              |
 gender                  | text                   |           |          |         | extended |             |              |
 driver_city             | text                   |           |          |         | extended |             |              |
 driver_state            | text                   |           |          |         | extended |             |              |
 dl_state                | text                   |           |          |         | extended |             |              |
 arrest_type             | text                   |           |          |         | extended |             |              |
 geolocation             | point                  |           |          |         | plain    |             |              |
Partition key: RANGE (date_of_stop)
Partitions: traffic_violations_p_2020 FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'),
            traffic_violations_p_2021 FOR VALUES FROM ('2021-01-01') TO ('2022-01-01'),
            traffic_violations_p_2022 FOR VALUES FROM ('2022-01-01') TO ('2023-01-01'),
            traffic_violations_p_2023 FOR VALUES FROM ('2023-01-01') TO ('2024-01-01'),
            traffic_violations_p_2024 FOR VALUES FROM ('2024-01-01') TO ('2025-01-01'),
            traffic_violations_p_2025 FOR VALUES FROM ('2025-01-01') TO ('2026-01-01'),
            traffic_violations_p_2026 FOR VALUES FROM ('2026-01-01') TO ('2027-01-01'), PARTITIONED

Here the new functions in PostgreSQL 12 become very handy:

select * from pg_partition_tree('traffic_violations_p');
             relid             |        parentrelid        | isleaf | level
-------------------------------+---------------------------+--------+-------
 traffic_violations_p          |                           | f      |     0
 traffic_violations_p_2020     | traffic_violations_p      | t      |     1
 traffic_violations_p_2021     | traffic_violations_p      | t      |     1
 traffic_violations_p_2022     | traffic_violations_p      | t      |     1
 traffic_violations_p_2023     | traffic_violations_p      | t      |     1
 traffic_violations_p_2024     | traffic_violations_p      | t      |     1
 traffic_violations_p_2025     | traffic_violations_p      | t      |     1
 traffic_violations_p_2026     | traffic_violations_p      | f      |     1
 traffic_violations_p_2026_jan | traffic_violations_p_2026 | t      |     2
 traffic_violations_p_2026_feb | traffic_violations_p_2026 | t      |     2
 traffic_violations_p_2026_mar | traffic_violations_p_2026 | t      |     2
 traffic_violations_p_2026_apr | traffic_violations_p_2026 | t      |     2
 traffic_violations_p_2026_may | traffic_violations_p_2026 | t      |     2
 traffic_violations_p_2026_jun | traffic_violations_p_2026 | t      |     2
 traffic_violations_p_2026_jul | traffic_violations_p_2026 | t      |     2
 traffic_violations_p_2026_aug | traffic_violations_p_2026 | t      |     2
 traffic_violations_p_2026_sep | traffic_violations_p_2026 | t      |     2
 traffic_violations_p_2026_oct | traffic_violations_p_2026 | t      |     2
 traffic_violations_p_2026_nov | traffic_violations_p_2026 | t      |     2
 traffic_violations_p_2026_dec | traffic_violations_p_2026 | t      |     2
(20 rows)

To verify if data is routed correctly to the sub partitions let’s add some data for 2026:

insert into traffic_violations_p (date_of_stop)
       select * from generate_series ( date('2026-01-01')
                                     , date('2026-12-30')
                                     , interval '1 day' );

If we did the partitioning correctly we should see data in the new partitions:

select count(*) from traffic_violations_p_2026_jan;
 count
-------
    31
(1 row)
 
select count(*) from traffic_violations_p_2026_dec;
 count
-------
    30
(1 row)
 
select count(*) from traffic_violations_p_2026_feb;
 count
-------
    28
(1 row)

Of course you could go even further and sub-partition the monthly partitions further by day or week. You can also partition by list and then sub-partition the list partitions by range. Or partition by range and then sub-partition by list. That’s it for sub-partitioning.

See more

PostgreSQL Tutorial

Preparing the data set

Range partitioning

Demonstration

See more

Search

Categories

Tags