August 26, 2024
Summary: In this tutorial, you will understand the internals of sequential scans.
In every database product, sequential scans (full table scans) can be resource-intensive, so many developers and DBAs see them as a performance killer. This is often a false or biased perception: in many cases, a sequential scan is actually the faster choice. For lack of understanding, however, many practitioners demonize sequential scans and try to eliminate them from their queries' execution plans. This article focuses on that point and tries to shed some light on the internals of sequential scans.
What is a sequential scan?
It is a method of accessing data in which every record in a table is scanned to see whether it meets the search criteria. To answer a query, the database needs to identify which records satisfy the WHERE clause filter. To do this, it sometimes has to scan all the records in the table, collecting the matching ones for further processing.
The scan proceeds through the table's pages in sequential order. Every page is read into the "shared buffers" area of shared memory, and the resulting rows are then handed to the per-backend user area (work memory) for further operations, such as joins, sorting, and so on.
Process
Sequential scans happen in two phases:
- Copying pages into memory
- Fetching data from memory pages
Copying pages into memory
When storing data on a file system, every database has to follow the conventions of the operating system and of the database itself. On disk, data is stored in pages of uniform size, so in order to read data, the database first needs to read these blocks, also known as pages. In the case of PostgreSQL, the page size is 8 kB. Whenever PostgreSQL needs to read data, the relevant pages are loaded into a dedicated section of memory known as shared buffers.
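As a rough sketch of the arithmetic involved (assuming the default 8 kB block size and a hypothetical table size), the number of pages a scan must read can be estimated like this:

```python
import math

BLOCK_SIZE = 8 * 1024  # PostgreSQL's default page size: 8 kB

def pages_for_table(table_bytes: int) -> int:
    """Number of 8 kB pages needed to hold a table of the given size."""
    return math.ceil(table_bytes / BLOCK_SIZE)

# A hypothetical 121 MB table occupies 15,488 pages of 8 kB each.
print(pages_for_table(121 * 1024 * 1024))  # 15488
```

In a real installation, the actual page count is maintained by PostgreSQL itself and exposed in pg_class.relpages, as used in the cost calculation below.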
Fetching data from memory pages
After the required pages are loaded into shared buffers, PostgreSQL scans those blocks in memory and extracts rows from them. During this operation, each page is read and every record is compared against the filter condition.
This is where CPU work comes in. The CPU reads pages from shared buffers, extracts rows from them, and applies the filter logic, iterating from page to page and from record to record until the last record has been read. As a result, a processor stays busy for the duration of a sequential scan.
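The two phases can be pictured with a toy model (this is an illustration of the access pattern only, not PostgreSQL's actual implementation): pages are first copied into a buffer, then every tuple on every page is tested against the predicate.

```python
# Illustrative model of a sequential scan: every page is visited in order,
# and every tuple on every page is tested against the WHERE predicate.
# (A sketch of the access pattern, not PostgreSQL's C implementation.)

disk = [  # each inner list stands for one 8 kB page of tuples
    [{"id": 1, "city": "Pune"}, {"id": 2, "city": "Oslo"}],
    [{"id": 3, "city": "Pune"}, {"id": 4, "city": "Lima"}],
]

def seq_scan(pages, predicate):
    shared_buffers = []          # phase 1: pages are copied into memory
    for page in pages:
        shared_buffers.append(page)
    result = []                  # phase 2: CPU extracts matching tuples
    for page in shared_buffers:
        for row in page:         # every record is examined, match or not
            if predicate(row):
                result.append(row)
    return result

print(seq_scan(disk, lambda t: t["city"] == "Pune"))
# [{'id': 1, 'city': 'Pune'}, {'id': 3, 'city': 'Pune'}]
```

Note that the cost of phase 2 grows with the total number of tuples, not with the number of matches, since every record is examined either way.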
Cost calculation
As described in the process section, the operation happens in two phases. The cost of each phase is calculated separately, and the total is presented as their sum. The cost of the first phase is the I/O cost, which estimates the effort required to perform the disk reads. The other is the CPU cost.
Disk run cost
In a sequential scan, every tuple is read from disk. Data pages are stored on disk in a sequential manner and are accessed in the same fashion, so the disk reads blocks in sequence; these are called sequential reads. In PostgreSQL, the estimated cost of fetching one page this way is set by the seq_page_cost parameter.
disk_run_cost = the number of pages * seq_page_cost
(The number of pages in a table is stored in the relpages column of pg_class.)
CPU run cost
When pages are loaded into memory (shared buffers), they are still raw PostgreSQL blocks: the data inside them is of no use until the tuples are extracted. This is where the CPU's job starts. The CPU has to identify the tuples within each page and extract them to perform further operations. In PostgreSQL, the estimated cost of processing one record is set by the cpu_tuple_cost parameter; multiplying it by the total number of rows in the table gives cpu_run_cost.
cpu_run_cost = Total number of records * cpu_tuple_cost
(The number of records in a table is stored in the reltuples column of pg_class.)
The total cost required to perform a sequential scan is the sum of the CPU run cost and the disk run cost.
total_cost = disk_run_cost + cpu_run_cost
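Plugging PostgreSQL's default cost parameters (seq_page_cost = 1.0, cpu_tuple_cost = 0.01) and the page/row counts of the example table into these formulas, the estimate can be reproduced with a few lines of arithmetic:

```python
# PostgreSQL's sequential-scan cost model (default parameter values shown):
seq_page_cost = 1.0    # estimated cost to fetch one page sequentially
cpu_tuple_cost = 0.01  # estimated cost to process one tuple

relpages = 14803       # pages in the example table (pg_class.relpages)
reltuples = 126728     # rows in the example table (pg_class.reltuples)

disk_run_cost = relpages * seq_page_cost    # 14803.0
cpu_run_cost = reltuples * cpu_tuple_cost   # ~1267.28
total_cost = disk_run_cost + cpu_run_cost

print(round(total_cost, 2))  # 16070.28
```

This matches the total cost reported by the planner for the example table below.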
The example below calculates this value and validates it with EXPLAIN ANALYZE. The query gathers the necessary statistics and cost parameters and computes each component.
SELECT relation.no_of_pages AS "No. of Pages",
       seq_page_cost.value AS "seq_page_cost",
       relation.no_of_rows AS "No. of records",
       cpu_tuple_cost.value AS "cpu_tuple_cost",
       (relation.no_of_pages * seq_page_cost.value) AS "Disk run cost",
       (relation.no_of_rows * cpu_tuple_cost.value) AS "CPU run cost",
       (relation.no_of_pages * seq_page_cost.value + relation.no_of_rows * cpu_tuple_cost.value) AS "Total cost"
FROM
    (SELECT setting::float value FROM pg_settings WHERE name = 'cpu_tuple_cost') cpu_tuple_cost,
    (SELECT setting::float value FROM pg_settings WHERE name = 'seq_page_cost') seq_page_cost,
    (SELECT reltuples::int no_of_rows, relpages::int no_of_pages FROM pg_class WHERE relname = 'sequential_scan_test_tbl') relation;
No. of Pages | seq_page_cost | No. of records | cpu_tuple_cost | Disk run cost | CPU run cost | Total cost
--------------+---------------+----------------+----------------+---------------+--------------+------------
14803 | 1 | 126728 | 0.01 | 14803 | 1267.28 | 16070.28
(1 row)
The EXPLAIN ANALYZE output below shows the planner's estimated cost for the query. We can see it is the same as the "Total cost" computed above.
EXPLAIN ANALYZE SELECT * FROM sequential_scan_test_tbl;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on sequential_scan_test_tbl (cost=0.00..16070.28 rows=126728 width=1438) (actual time=6.202..114.818 rows=126728 loops=1)
Planning Time: 0.330 ms
Execution Time: 120.151 ms
(3 rows)
Sequential-scan-prone conditions
Missing indexes
For any query returning a relatively small data set, an index speeds up data retrieval. An index maps particular values to the physical locations (tuple identifiers) of their rows, so when the table is queried, the planner tries to use an index on the columns or expressions mentioned in the WHERE predicate. If there is no index on a column used in the WHERE clause, the executor has no way to locate the matching records directly, and a sequential scan becomes inevitable.
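The difference is easy to picture outside the database: an index is essentially a precomputed map from key values to row locations, so a selective lookup touches only the matching rows instead of all of them. A toy Python analogy (not PostgreSQL's B-tree implementation):

```python
rows = [("alice", 30), ("bob", 25), ("carol", 41), ("dave", 25)]

# Without an index: every row is compared (a sequential scan).
def scan(rows, age):
    return [name for name, a in rows if a == age]

# With an "index": a precomputed map from key value to row positions,
# so the lookup touches only the matching entries.
index = {}
for pos, (_, a) in enumerate(rows):
    index.setdefault(a, []).append(pos)

def index_lookup(rows, index, age):
    return [rows[pos][0] for pos in index.get(age, [])]

print(scan(rows, 25))                 # ['bob', 'dave']
print(index_lookup(rows, index, 25))  # ['bob', 'dave']
```

Both return the same answer, but the scan's work grows with the table size while the indexed lookup's work grows only with the number of matches.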
Stale statistics
PostgreSQL's cost-based optimizer is designed to choose the most efficient execution plan, and it relies entirely on table statistics, which are collected by ANALYZE. The optimizer calculates the cost of each candidate operation from the available statistics, all the costs are compared, and the execution engine goes with the method with the lowest cost. For example, if the sequential scan's cost is 3000 and the index scan's cost is 150, the index scan is chosen. If statistics are not refreshed regularly, the estimates can be badly wrong, leading to poor execution plans and unnecessary sequential scans.
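The planner's decision in the example above amounts to picking the plan with the lowest estimated cost; stale statistics feed wrong row counts into those estimates. A deliberately simplified sketch of cost-based plan selection:

```python
# Simplified cost-based plan selection: the cheapest estimate wins.
# (Costs here are the illustrative figures from the text, not real output.)
plan_costs = {"Seq Scan": 3000.0, "Index Scan": 150.0}
chosen = min(plan_costs, key=plan_costs.get)
print(chosen)  # Index Scan

# With stale statistics, the row-count inputs are wrong, so the computed
# costs (and therefore the chosen plan) can be wrong too.
```

This is why routine ANALYZE (or autovacuum's analyze) matters: the cost comparison is only as good as the statistics behind it.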
Large result set
From the above two points, it is clear that up-to-date statistics and proper indexing help avoid sequential scans. However, there are scenarios where data is still accessed sequentially even though both requirements are met. This happens when the query is expected to return a large portion of the table; in such a case, going through an index adds overhead and delays execution.
Conclusion
A sequential scan goes through all the records in a table in a two-stage operation: copying blocks into shared memory and reading data out of those blocks. Sequential scans may be caused by stale statistics or missing indexes; however, when a query returns a large share of a table, a sequential scan is the expected, and often fastest, plan.