By John Doe March 6, 2025
Summary: In this article, we will learn some caveats for some data types in PostgreSQL.
Table of Contents
Date/Time storage
Don’t use timestamp (without time zone)
Don’t use the timestamp
type to store timestamps, use timestamptz
(also known as timestamp with time zone
) instead.
Why not?
timestamptz
records a single moment in time. Despite what the name says it doesn’t store a timestamp, just a point in time described as the number of microseconds since January 1st, 2000 in UTC. You can insert values in any timezone and it’ll store the point in time that value describes. By default it will display times in your current timezone, but you can use at time zone
to display it in other time zones.
Because it stores a point in time it will do the right thing with arithmetic involving timestamps entered in different timezones - including between timestamps from the same location on different sides of a daylight savings time change.
timestamp
(also known as timestamp without time zone
) doesn’t do any of that, it just stores a date and time you give it. You can think of it being a picture of a calendar and a clock rather than a point in time. Without additional information - the timezone - you don’t know what time it records. Because of that, arithmetic between timestamps from different locations or between timestamps from summer and winter may give the wrong answer.
So if what you want to store is a point in time, rather than a picture of a clock, use timestamptz.
When should you?
If you’re dealing with timestamps in an abstract way, or just saving and retrieving them from an app, where you aren’t going to be doing arithmetic with them then timestamp might be suitable.
Don’t use timestamp (without time zone) to store UTC times
Storing UTC values in a timestamp without time zone
column is, unfortunately, a practice commonly inherited from other databases that lack usable timezone support.
Use timestamp with time zone
instead.
Why not?
Because there is no way for the database to know that UTC is the intended timezone for the column values.
This complicates many otherwise useful time calculations. For example, “last midnight in the timezone given by u.timezone” becomes this:
date_trunc('day', now() AT TIME ZONE u.timezone) AT TIME ZONE u.timezone AT TIME ZONE 'UTC'
And “the midnight prior to x.datecol
in u.timezone” becomes this:
date_trunc('day', x.datecol AT TIME ZONE 'UTC' AT TIME ZONE u.timezone)
AT TIME ZONE u.timezone AT TIME ZONE 'UTC'
When should you?
If compatibility with non-timezone-supporting databases trumps all other considerations.
Don’t use timetz
Don’t use the timetz
type. You probably want timestamptz
instead.
Why not?
Even the manual tells you it’s only implemented for SQL compliance.
The type time with time zone is defined by the SQL standard, but the definition exhibits properties which lead to questionable usefulness. In most cases, a combination of date, time, timestamp without time zone, and timestamp with time zone should provide a complete range of date/time functionality required by any application.
When should you?
Never.
Text storage
Don’t use char(n)
Don’t use the type char(n)
. You probably want text
.
Why not?
Any string you insert into a char(n)
field will be padded with spaces to the declared width. That’s probably not what you actually want.
The manual says:
Values of type character are physically padded with spaces to the specified width n, and are stored and displayed that way. However, trailing spaces are treated as semantically insignificant and disregarded when comparing two values of type character. In collations where whitespace is significant, this behavior can produce unexpected results; for example
SELECT 'a '::CHAR(2) collate "C" < E'a\n'::CHAR(2)
returns true, even though C locale would consider a space to be greater than a newline. Trailing spaces are removed when converting a character value to one of the other string types. Note that trailing spaces are semantically significant in character varying and text values, and when using pattern matching, that is LIKE and regular expressions.
The space-padding does waste space, but doesn’t make operations on it any faster; in fact the reverse, thanks to the need to strip spaces in many contexts.
It’s important to note that from a storage point of view char(n)
is not a fixed-width type. The actual number of bytes varies since characters may take more than one byte, and the stored values are therefore treated as variable-length anyway (even though the space padding is included in the storage).
When should you?
When you’re porting very, very old software that uses fixed width fields. Or when you read the snippet from the manual above and think “yes, that makes perfect sense and is a good match for my requirements” rather than gibbering and running away.
Don’t use char(n) even for fixed-length identifiers
Sometimes people respond to “don’t use char(n)
” with “but my values must always be exactly N characters long” (e.g. country codes, hashes, or identifiers from some other system). It is still a bad idea to use char(n)
even in these cases.
Use text
, or a domain over text, with CHECK(length(VALUE)=3)
or CHECK(VALUE ~ '^[[:alpha:]]{3}$')
or similar.
Why not?
Because char(n)
doesn’t reject values that are too short, it just silently pads them with spaces. So there’s no actual benefit over using text
with a constraint that checks for the exact length. As a bonus, such a check can also verify that the value is in the correct format.
Remember, there is no performance benefit whatsoever to using char(n)
over varchar(n)
. In fact the reverse is true. One particular problem that comes up is that if you try and compare a char(n)
field against a parameter where the driver has explicitly specified a type of text
or varchar
, you may be unexpectedly unable to use an index for the comparison. This can be hard to debug since it doesn’t show up on manual queries.
When should you?
Never.
Don’t use varchar(n) by default
Don’t use the type varchar(n)
by default. Consider varchar
(without the length limit) or text
instead.
Why not?
varchar(n)
is a variable width text field that will throw an error if you try and insert a string longer than n characters (not bytes) into it.
varchar
(without the (n)
) or text
are similar, but without the length limit. If you insert the same string into the three field types they will take up exactly the same amount of space, and you won’t be able to measure any difference in performance.
If what you really need is a text field with an length limit then varchar(n) is great, but if you pick an arbitrary length and choose varchar(20) for a surname field you’re risking production errors in the future when a person with a very long name signs up for your service.
Some databases don’t have a type that can hold arbitrary long text, or if they do it’s not as convenient or efficient or well-supported as varchar(n). Users from those databases will often use something like varchar(255)
when what they really want is text
.
If you need to constrain the value in a field you probably need something more specific than a maximum length - maybe a minimum length too, or a limited set of characters - and a check constraint can do all of those things as well as a maximum string length.
When should you?
When you want to, really. If what you want is a text field that will throw an error if you insert too long a string into it, and you don’t want to use an explicit check constraint then varchar(n) is a perfectly good type. Just don’t use it automatically without thinking about it.
Also, the varchar type is in the SQL standard, unlike the text type, so it might be the best choice for writing super-portable applications.
Other data types
Don’t use money
The money
data type isn’t actually very good for storing monetary values. Numeric, or (rarely) integer may be better.
Why not?
It’s a fixed-point type, implemented as a machine int, so arithmetic with it is fast. But it doesn’t handle fractions of a cent (or equivalents in other currencies), it’s rounding behaviour is probably not what you want.
It doesn’t store a currency with the value, rather assuming that all money columns contain the currency specified by the database’s lc_monetary locale setting. If you change the lc_monetary setting for any reason, all money columns will contain the wrong value. That means that if you insert ‘$10.00’ while lc_monetary is set to ’en_US.UTF-8’ the value you retrieve may be ‘10,00 Lei’ or ‘¥1,000’ if lc_monetary is changed.
Storing a value as a numeric, possibly with the currency being used in an adjacent column, might be better.
When should you?
If you’re only working in a single currency, aren’t dealing with fractional cents and are only doing addition and subtraction then money might be the right thing.
Don’t use serial
For new applications, identity columns should be used instead.
Why not?
The serial types have some weird behaviors that make schema, dependency, and permission management unnecessarily cumbersome.
When should you?
- If you need support to PostgreSQL older than version 10.
- In certain combinations with table inheritance.
- More generally, if you somehow use the same sequence for multiple tables, although in those cases an explicit declaration might be preferable over the serial types.