June 12, 2024
Summary: In this tutorial, you will learn how to split wide tables in PostgreSQL.
Table of Contents
Introduction
Consider all the attributes of an entity that has rows that are too wide for good performance. Look for some theme or principle to divide them into two groups. Then split the table into two tables, a primary table and a companion table, repeating the primary key in each one.
The shorter rows allow you to query or update each table quickly. For example, we have a table defined as following:
CREATE TABLE orders (
order_num integer primary key,
order_date date,
customer_num integer,
ship_instruct varchar(40),
ship_date date,
ship_weight decimal(8,2),
ship_charge money(6,2),
paid_date date);
Division by Bulk
One principle on which you can divide an entity table is bulk. Move the bulky attributes, which are usually character strings, to the companion table. Keep the numeric and other small attributes in the primary table. In the above example, you can split the ship_instruct column from the orders table. You can call the companion table orders_ship. It has two columns, a primary key that is a copy of orders.order_num and the original ship_instruct column.
Division by Frequency of Use
Another principle for division of an entity is frequency of use. If a few attributes are rarely queried, move them to a companion table. In the above table, for example, perhaps only one program queries the ship_instruct, ship_weight, and ship_charge columns. In that case, you can move them to a companion table.
Division by Frequency of Update
Updates take longer than queries, and updating programs lock data pages and rows of data during the update process, preventing querying programs from accessing the tables. If you can separate one table into two companion tables, one with the most-updated entities and the other with the most-queried entities, you can often improve overall response time.
PostgreSQL MVCC works by saving an internal copy of updated or deleted rows (also called tuples) until a transaction is either committed or rolled back. This saved internal copy is invisible to users. However, table bloat can occur when those invisible copies aren’t cleaned up regularly by the VACUUM or AUTOVACUUM utilities. You can separate the frequently updated columns into an individual table, to save storage costs and speed up your database system.
Performance Costs of Splitting Tables
Splitting a table uses extra disk space and adds complexity. Two copies of the primary key occur for each row, one copy in each table. Two primary-key indexes also exist.
You must modify existing programs, reports, and forms that use SELECT * because fewer columns are returned. Programs, reports, and forms that use attributes from both tables must perform a join to bring the tables together.
In this case, when you insert or delete a row, two tables are altered instead of one. If you do not coordinate the alteration of the two tables (by making them within a single transaction, for example), you lose semantic integrity.