By John Doe December 20, 2024
Summary: In this article, we will learn how to skip rows with type cast error when using COPY FROM in PostgreSQL.
Table of Contents
Introduction
There is a great feature released in PostgreSQL 17. You probably already know COPY. Whenever you want to load something into or unload something out of PostgreSQL, this is the tool to use. When loading into PostgreSQL there is something to consider up to PostgreSQL 16 which might be very annoying: Whenever there is a line copy cannot write into the target table the whole process will be aborted and nothing will be loaded at all. Maybe this is something you want to have, but for other use cases the preferred behavior would be either to just ignore those lines and/or write the error messages somewhere else. In both cases you would not lose the rest of the data and don’t need to start from scratch after fixing the data to be loaded.
Example
As usual lets start with a little test case: Suppose we have this text file:
$ cat data.txt
1 aaaa 1 aaaa
2 bbbb 2 bbbb
3 cccc 3 cccc
4 dddd 4 dddd
5 eeee 5 eeee
6 ffff 6 ffff
7 gggg 7 gggg
8 hhhh 8 hhhh
A table which can hold this data might look like this:
CREATE TABLE import (a int, b varchar(4), c int, d varchar(4));
postgres=# \d import
Table "public.import"
Column | Type | Collation | Nullable | Default
--------+----------------------+-----------+----------+---------
a | integer | | |
b | character varying(4) | | |
c | integer | | |
d | character varying(4) | | |
All you need to do, to get that data from the text file into the table is this:
COPY import FROM '/home/postgres/data.txt' with (delimiter ' ');
SELECT * FROM import;
a | b | c | d
---+------+---+------
1 | aaaa | 1 | aaaa
2 | bbbb | 2 | bbbb
3 | cccc | 3 | cccc
4 | dddd | 4 | dddd
5 | eeee | 5 | eeee
6 | ffff | 6 | ffff
7 | gggg | 7 | gggg
8 | hhhh | 8 | hhhh
(8 rows)
Pretty simple. The trouble might start if the file to be loaded does look like this:
TRUNCATE import;
$ cat data.txt2
1 aaaa 1 aaaa
2 bbbb 2 bbbb
3 cccc 3 cccc
4 dddd 4 dddd
5 eeee 5 eeee
6 ffff 6 ffff
7 gggg 7 gggg
8 hhhh 8 hhhh
9 jjjjjjjjj 9 jjjj
10 kkkk 10 kkkk
11 llll 11 lllllll
12 mmmm 12 mmmm
13 nnnn 13 nnnn
Lines 9 and 11 will definitely not fit into the table and when you try to load this the whole process will fail and nothing will be loaded at all:
COPY import FROM '/home/postgres/data.txt2' with (delimiter ' ');
ERROR: value too long for type character varying(4)
CONTEXT: COPY import, line 9, column b: "jjjjjjjjj"
SELECT * FROM import;
a | b | c | d
---+---+---+---
(0 rows)
Starting with PostgreSQL 17 you will have another option:
postgres=# \h copy
Command: COPY
Description: copy data between a file and a table
Syntax:
COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | PROGRAM 'command' | STDIN }
[ [ WITH ] ( option [, ...] ) ]
[ WHERE condition ]
COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
TO { 'filename' | PROGRAM 'command' | STDOUT }
[ [ WITH ] ( option [, ...] ) ]
where option can be one of:
FORMAT format_name
FREEZE [ boolean ]
DELIMITER 'delimiter_character'
NULL 'null_string'
DEFAULT 'default_string'
HEADER [ boolean | MATCH ]
QUOTE 'quote_character'
ESCAPE 'escape_character'
FORCE_QUOTE { ( column_name [, ...] ) | * }
FORCE_NOT_NULL { ( column_name [, ...] ) | * }
FORCE_NULL { ( column_name [, ...] ) | * }
ON_ERROR error_action
ENCODING 'encoding_name'
LOG_VERBOSITY verbosity
Using the option “ON_ERROR” you can ask PostgreSQL to just ignore any data which does not fit into the target table:
COPY import FROM '/home/postgres/data.txt2' with (delimiter ' ', ON_ERROR ignore);
SELECT * FROM import;
a | b | c | d
----+------+----+------
1 | aaaa | 1 | aaaa
2 | bbbb | 2 | bbbb
3 | cccc | 3 | cccc
4 | dddd | 4 | dddd
5 | eeee | 5 | eeee
6 | ffff | 6 | ffff
7 | gggg | 7 | gggg
8 | hhhh | 8 | hhhh
10 | kkkk | 10 | kkkk
12 | mmmm | 12 | mmmm
13 | nnnn | 13 | nnnn
(11 rows)
Everything is there, except lines 9 and 11. For now you can only ignore the errors or stop on the first error (which is the default, as before).