Index access methods must handle concurrent updates
of the index by multiple processes.
The core PostgreSQL system obtains
AccessShareLock
on the index during an index scan, and
RowExclusiveLock
when updating the index (including plain
VACUUM
). Since these lock types do not conflict, the access
method is responsible for handling any fine-grained locking it might need.
An ACCESS EXCLUSIVE
lock on the index as a whole will be
taken only during index creation, destruction, or REINDEX
(SHARE UPDATE EXCLUSIVE
is taken instead with
CONCURRENTLY
).
Building an index type that supports concurrent updates usually requires
extensive and subtle analysis of the required behavior. For the b-tree
and hash index types, you can read about the design decisions involved in
src/backend/access/nbtree/README
and
src/backend/access/hash/README
.
Aside from the index's own internal consistency requirements, concurrent updates create issues about consistency between the parent table (the heap) and the index. Because PostgreSQL separates accesses and updates of the heap from those of the index, there are windows in which the index might be inconsistent with the heap. We handle this problem with the following rules:
A new heap entry is made before making its index entries. (Therefore a concurrent index scan is likely to fail to see the heap entry. This is okay because the index reader would be uninterested in an uncommitted row anyway. But see Section 62.5.)
When a heap entry is to be deleted (by VACUUM
), all its
index entries must be removed first.
An index scan must maintain a pin
on the index page holding the item last returned by
amgettuple
, and ambulkdelete
cannot delete
entries from pages that are pinned by other backends. The need
for this rule is explained below.
Without the third rule, it is possible for an index reader to
see an index entry just before it is removed by VACUUM
, and
then to arrive at the corresponding heap entry after that was removed by
VACUUM
.
This creates no serious problems if that item
number is still unused when the reader reaches it, since an empty
item slot will be ignored by heap_fetch()
. But what if a
third backend has already re-used the item slot for something else?
When using an MVCC-compliant snapshot, there is no problem because
the new occupant of the slot is certain to be too new to pass the
snapshot test. However, with a non-MVCC-compliant snapshot (such as
SnapshotAny
), it would be possible to accept and return
a row that does not in fact match the scan keys. We could defend
against this scenario by requiring the scan keys to be rechecked
against the heap row in all cases, but that is too expensive. Instead,
we use a pin on an index page as a proxy to indicate that the reader
might still be “in flight” from the index entry to the matching
heap entry. Making ambulkdelete
block on such a pin ensures
that VACUUM
cannot delete the heap entry before the reader
is done with it. This solution costs little in run time, and adds blocking
overhead only in the rare cases where there actually is a conflict.
This solution requires that index scans be “synchronous”: we have to fetch each heap tuple immediately after scanning the corresponding index entry. This is expensive for a number of reasons. An “asynchronous” scan in which we collect many TIDs from the index, and only visit the heap tuples sometime later, requires much less index locking overhead and can allow a more efficient heap access pattern. Per the above analysis, we must use the synchronous approach for non-MVCC-compliant snapshots, but an asynchronous scan is workable for a query using an MVCC snapshot.
In an amgetbitmap
index scan, the access method does not
keep an index pin on any of the returned tuples. Therefore
it is only safe to use such scans with MVCC-compliant snapshots.
When the ampredlocks
flag is not set, any scan using that
index access method within a serializable transaction will acquire a
nonblocking predicate lock on the full index. This will generate a
read-write conflict with the insert of any tuple into that index by a
concurrent serializable transaction. If certain patterns of read-write
conflicts are detected among a set of concurrent serializable
transactions, one of those transactions may be canceled to protect data
integrity. When the flag is set, it indicates that the index access
method implements finer-grained predicate locking, which will tend to
reduce the frequency of such transaction cancellations.