December 11, 2024
Summary: in this tutorial, you will understand freezing in PostgreSQL.
Table of Contents
Transaction ID wraparound
PostgreSQL uses 32-bit transaction IDs. This is a pretty large number (about 4 billion), but with intensive work of the server, this number is not unlikely to get exhausted. For example: with the workload of 1000 transactions a second, this will happen as early as in one month and a half of continuous work.
But the multiversion concurrency control relies on the sequential numbering, which means that of two transactions the one with a smaller number can be considered to have started earlier. Therefore, it is clear that it is not an option to just reset the counter and start the numbering from scratch.
But why not use 64-bit transaction IDs - wouldn’t it completely eliminate the issue? The thing is that the header of each tuple stores two transaction IDs: xmin
and xmax
. The header is pretty large as it is - 23 bytes at a minimum, and the increase of the bit size would entail the increase of the header by extra 8 bytes. And this is out of all reason.
So what’s to be done? Instead of ordering transaction IDs sequentially (as numbers), imagine a circle or a clock dial. Transaction IDs are compared in the same sense as clock readings are compared. That is, for each transaction, the “counterclockwise” half of transaction IDs is regarded to pertain to the past, while the “clockwise” part is regarded to pertain to the future.
The age of the transaction is defined as the number of transactions that ran since the time when the transaction occurred in the system (regardless of the transaction ID wraparound). To figure out whether one transaction is older than the other, we compare their ages rather than IDs. (By the way, it’s for this reason that the operations “greater” and “less” are not defined for the xid
data type.)
But this looped arrangement is troublesome. A transaction that was in the distant past (transaction 1 in the figure), some time later will get into the half of the circle pertinent to the future. This, certainly, breaks visibility rules and would cause issues: changes done by the transaction 1 would just fall out of sight.
Tuple freezing and visibility rules
To prevent such “travels” from the past to future, vacuuming does one more task (in addition to freeing page space). It finds pretty old and “cold” tuples (which are visible in all snapshots and are unlikely to change) and marks them in a special way, that is, “freezes” them. A frozen tuple is considered older than any normal data and is always visible in all snapshots. And it is no longer needed to look at the xmin
transaction number, and this number can be safely reused. So, frozen tuples always remain in the past.
To track the xmin
transaction as frozen, both hint bits are set: committed
and aborted
.
Note that the xmax
transaction does not need freezing. Its existence indicates that the tuple is not live anymore. When it gets no longer visible in data snapshots, this tuple will be vacuumed away.
Let’s create a table for experiments. And let’s specify the minimum fillfactor for it such that only two rows fit on each page - this makes it more convenient for us to watch what is happening. Let’s also turn autovacuum off to control the vacuuming time on our own.
CREATE TABLE tfreeze(
id integer,
s char(300)
) WITH (fillfactor = 10, autovacuum_enabled = off);
We’ve already created a few variants of the function that uses the “pageinspect” extension to show the tuples located on a page. We will now create one more variant of this function: it will show several pages at once and output the age of the xmin
transaction (using the age
system function):
CREATE FUNCTION heap_page(relname text, pageno_from integer, pageno_to integer)
RETURNS TABLE(ctid tid, state text, xmin text, xmin_age integer, xmax text, t_ctid tid)
AS $$
SELECT (pageno,lp)::text::tid AS ctid,
CASE lp_flags
WHEN 0 THEN 'unused'
WHEN 1 THEN 'normal'
WHEN 2 THEN 'redirect to '||lp_off
WHEN 3 THEN 'dead'
END AS state,
t_xmin || CASE
WHEN (t_infomask & 256+512) = 256+512 THEN ' (f)'
WHEN (t_infomask & 256) > 0 THEN ' (c)'
WHEN (t_infomask & 512) > 0 THEN ' (a)'
ELSE ''
END AS xmin,
age(t_xmin) xmin_age,
t_xmax || CASE
WHEN (t_infomask & 1024) > 0 THEN ' (c)'
WHEN (t_infomask & 2048) > 0 THEN ' (a)'
ELSE ''
END AS xmax,
t_ctid
FROM generate_series(pageno_from, pageno_to) p(pageno),
heap_page_items(get_raw_page(relname, pageno))
ORDER BY pageno, lp;
$$ LANGUAGE SQL;
Note that both committed
and aborted
hint bits set indicate freezing (which we denote with a parenthesized “f”). There is a specialized ID that indicates frozen transactions: FrozenTransactionId = 2. This method was in place in PostgreSQL versions earlier than 9.4, and now it is replaced with hint bits. This permits to retain the initial transaction number in a tuple, which is convenient for maintenance and debugging. However, you can still come across transactions with ID = 2 in old systems, even upgraded to latest versions.
We will also need the “pg_visibility” extension, which enables us to look into the visibility map:
CREATE EXTENSION pg_visibility;
In PostgreSQL versions earlier than 9.6, the visibility map contained one bit per page; the map tracked only pages with “pretty old” row versions, which are visible in all data snapshots for sure. The idea behind this is that if a page is tracked in the visibility map, the visibility rules for its tuples do not need to be checked.
Starting with version 9.6, the all-frozen bit for each page was added to the visibility map. The all-frozen bit tracks pages where all tuples are frozen.
Let’s insert a few rows into the table and immediately do vacuuming for the visibility map to be created:
INSERT INTO tfreeze(id, s)
SELECT g.id, 'FOO' FROM generate_series(1,100) g(id);
VACUUM tfreeze;
And we can see that both pages are known to be all-visible, but not to be all-frozen:
SELECT * FROM generate_series(0,1) g(blkno),
pg_visibility_map('tfreeze',g.blkno)
ORDER BY g.blkno;
blkno | all_visible | all_frozen
-------+-------------+------------
0 | t | f
1 | t | f
(2 rows)
The age of the transaction that created the rows (xmin_age
) equals one - this is the last transaction performed in the system:
SELECT * FROM heap_page('tfreeze',0,1);
ctid | state | xmin | xmin_age | xmax | t_ctid
-------+--------+---------+----------+-------+--------
(0,1) | normal | 697 (c) | 1 | 0 (a) | (0,1)
(0,2) | normal | 697 (c) | 1 | 0 (a) | (0,2)
(1,1) | normal | 697 (c) | 1 | 0 (a) | (1,1)
(1,2) | normal | 697 (c) | 1 | 0 (a) | (1,2)
(4 rows)