ChatGPT解决这个技术问题 Extra ChatGPT

Which timestamp type should I choose in a PostgreSQL database?

I would like to define a best practice for storing timestamps in my Postgres database in the context of a multi-timezone project.

I can

choose TIMESTAMP WITHOUT TIME ZONE and remember which timezone was used at insertion time for this field choose TIMESTAMP WITHOUT TIME ZONE and add another field which will contain the name of the timezone that was used at insertion time choose TIMESTAMP WITH TIME ZONE and insert the timestamps accordingly

I have a slight preference for option 3 (timestamp with time zone) but would like to have an educated opinion on the matter.


F
Freedom_Ben

First off, PostgreSQL’s time handling and arithmetic is fantastic and Option 3 is fine in the general case. It is, however, an incomplete view of time and timezones and can be supplemented:

Store the name of a user’s time zone as a user preference (e.g. America/Los_Angeles, not -0700). Have user events/time data submitted local to their frame of reference (most likely an offset from UTC, such as -0700). In application, convert the time to UTC and stored using a TIMESTAMP WITH TIME ZONE column. Return time requests local to a user's time zone (i.e. convert from UTC to America/Los_Angeles). Set your database's timezone to UTC.

This option doesn’t always work because it can be hard to get a user’s time zone and hence the hedge advice to use TIMESTAMP WITH TIME ZONE for lightweight applications. That said, let me explain some background aspects of this this Option 4 in more detail.

Like Option 3, the reason for the WITH TIME ZONE is because the time at which something happened is an absolute moment in time. WITHOUT TIME ZONE yields a relative time zone. Don't ever, ever, ever mix absolute and relative TIMESTAMPs.

From a programmatic and consistency perspective, ensure all calculations are made using UTC as the time zone. This isn’t a PostgreSQL requirement, but it helps when integrating with other programming languages or environments. Setting a CHECK on the column to make sure the write to the time stamp column has a time zone offset of 0 is a defensive position that prevents a few classes of bugs (e.g. a script dumps data to a file and something else sorts the time data using a lexical sort). Again, PostgreSQL doesn’t need this to do date calculations correctly or to convert between time zones (i.e. PostgreSQL is very adept at converting times between any two arbitrary time zones). To ensure data going in to the database is stored with an offset of zero:

CREATE TABLE my_tbl (
  my_timestamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
  CHECK(EXTRACT(TIMEZONE FROM my_timestamp) = '0')
);
test=> SET timezone = 'America/Los_Angeles';
SET
test=> INSERT INTO my_tbl (my_timestamp) VALUES (NOW());
ERROR:  new row for relation "my_tbl" violates check constraint "my_tbl_my_timestamp_check"
test=> SET timezone = 'UTC';
SET
test=> INSERT INTO my_tbl (my_timestamp) VALUES (NOW());
INSERT 0 1

It's not 100% perfect, but it provides a strong enough anti-footshooting measure that makes sure the data is already converted to UTC. There are lots of opinions on how to do this, but this seems to be the best in practice from my experience.

Criticisms of database time zone handling is largely justified (there are plenty of databases that handle this with great incompetence), however PostgreSQL’s handling of timestamps and timezones is pretty awesome (despite a few "features" here and there). For example, one such feature:

-- Make sure we're all working off of the same local time zone
test=> SET timezone = 'America/Los_Angeles';
SET
test=> SELECT NOW();
              now              
-------------------------------
 2011-05-27 15:47:58.138995-07
(1 row)

test=> SELECT NOW() AT TIME ZONE 'UTC';
          timezone          
----------------------------
 2011-05-27 22:48:02.235541
(1 row)

Note that AT TIME ZONE 'UTC' strips time zone info and creates a relative TIMESTAMP WITHOUT TIME ZONE using your target’s frame of reference (UTC).

When converting from an incomplete TIMESTAMP WITHOUT TIME ZONE to a TIMESTAMP WITH TIME ZONE, the missing time zone is inherited from your connection:

test=> SET timezone = 'America/Los_Angeles';
SET
test=> SELECT EXTRACT(TIMEZONE_HOUR FROM NOW());
 date_part 
-----------
        -7
(1 row)
test=> SELECT EXTRACT(TIMEZONE_HOUR FROM TIMESTAMP WITH TIME ZONE '2011-05-27 22:48:02.235541');
 date_part 
-----------
        -7
(1 row)

-- Now change to UTC    
test=> SET timezone = 'UTC';
SET
-- Create an absolute time with timezone offset:
test=> SELECT NOW();
              now              
-------------------------------
 2011-05-27 22:48:40.540119+00
(1 row)

-- Creates a relative time in a given frame of reference (i.e. no offset)
test=> SELECT NOW() AT TIME ZONE 'UTC';
          timezone          
----------------------------
 2011-05-27 22:48:49.444446
(1 row)

test=> SELECT EXTRACT(TIMEZONE_HOUR FROM NOW());
 date_part 
-----------
         0
(1 row)

test=> SELECT EXTRACT(TIMEZONE_HOUR FROM TIMESTAMP WITH TIME ZONE '2011-05-27 22:48:02.235541');
 date_part 
-----------
         0
(1 row)

The bottom line:

store a user’s time zone as a named label (e.g. America/Los_Angeles) and not an offset from UTC (e.g. -0700)

use UTC for everything unless there is a compelling reason to store a non-zero offset

treat all non-zero UTC times as an input error

never mix and match relative and absolute timestamps

also use UTC as the timezone in the database if possible

Random programming language note: Python's datetime data type is very good at maintaining the distinction between absolute vs relative times (albeit frustrating at first until you supplement it with a library like PyTZ).

EDIT

Let me explain the difference between relative vs absolute a bit more.

Absolute time is used to record an event. Examples: "User 123 logged in" or "a graduation ceremonies start at 2011-05-28 2pm PST." Regardless of your local time zone, if you could teleport to where the event occurred, you could witness the event happening. Most time data in a database is absolute (and therefore should be TIMESTAMP WITH TIME ZONE, ideally with a +0 offset and a textual label representing the rules governing the particular timezone - not an offset).

A relative event would be to record or schedule the time of something from the perspective of a yet-to-be-determined time zone. Examples: "our business's doors open at 8am and close at 9pm", "let's meet every Monday at 7am for a weekly breakfast meeting," or "every Halloween at 8pm." In general, relative time is used in a template or factory for events, and absolute time is used for almost everything else. There is one rare exception that’s worth pointing out which should illustrate the value of relative times. For future events that are far enough in the future where there could be uncertainty about the absolute time at which something could occur, use a relative timestamp. Here’s a real world example:

Suppose it’s the year 2004 and you need to schedule a delivery on October 31st in 2008 at 1pm on the West Coast of the US (i.e. America/Los_Angeles/PST8PDT). If you stored that using absolute time using ’2008-10-31 21:00:00.000000+00’::TIMESTAMP WITH TIME ZONE , the delivery would have shown up at 2pm because the US Government passed the Energy Policy Act of 2005 that changed the rules governing daylight savings time. In 2004 when the delivery was scheduled, the date 10-31-2008 would have been Pacific Standard Time (+8000), but starting in year 2005+ timezone databases recognized that 10-31-2008 would have been Pacific Daylight Savings time (+0700). Storing a relative timestamp with the time zone would have resulted in a correct delivery schedule because a relative timestamp is immune to Congress’ ill-informed tampering. Where the cutoff between using relative vs absolute times for scheduling things is, is a fuzzy line, but my rule of thumb is that scheduling for anything in the future further than 3-6mo should make use of relative timestamps (scheduled = absolute vs planned = relative ???).

The other/last type of relative time is the INTERVAL. Example: "the session will time out 20 minutes after a user logs in". An INTERVAL can be used correctly with either absolute timestamps (TIMESTAMP WITH TIME ZONE) or relative timestamps (TIMESTAMP WITHOUT TIME ZONE). It is equally correct to say, "a user session expires 20min after a successful login (login_utc + session_duration)" or "our morning breakfast meeting can only last 60 minutes (recurring_start_time + meeting_length)".

Last bits of confusion: DATE, TIME, TIME WITHOUT TIME ZONE and TIME WITH TIME ZONE are all relative data types. For example: '2011-05-28'::DATE represents a relative date since you have no time zone information which could be used to identify midnight. Similarly, '23:23:59'::TIME is relative because you don't know either the time zone or the DATE represented by the time. Even with '23:59:59-07'::TIME WITH TIME ZONE, you don't know what the DATE would be. And lastly, DATE with a time zone is not in fact a DATE, it is a TIMESTAMP WITH TIME ZONE:

test=> SET timezone = 'America/Los_Angeles';
SET
test=> SELECT '2011-05-11'::DATE AT TIME ZONE 'UTC';
      timezone       
---------------------
 2011-05-11 07:00:00
(1 row)

test=> SET timezone = 'UTC';
SET
test=> SELECT '2011-05-11'::DATE AT TIME ZONE 'UTC';
      timezone       
---------------------
 2011-05-11 00:00:00
(1 row)

Putting dates and time zones in databases is a good thing, but it is easy to get subtly incorrect results. Minimal additional effort is required to store time information correctly and completely, however that doesn’t mean the extra effort is always required.


If you accurately tell postgresql the correct timezone the user's timestamp is in, postgresql will do the heavy lifting behind the scenes. Converting it yourself is just borrowing trouble.
@Sean - with your check constraint, how do you ever insert a timestamp without set timezone to 'UTC'? You know that all timezone-aware dates are stored internally in UTC?
The point of the check is to make sure that data is stored with zero offset from UTC. Sorting and retrieval of information and comparison of times with non-zero offsets is error prone. By enforcing a zero UTC offset, you can consistently interact with the data from a single perspective in a nearly zero-risk way that behaves predictably in all scenarios. If it were practical for timestamps to support textual representations of timezones, my thoughts on the subject would be different. :~]
@Sean: But, as Jack indicates, all timezone-aware timestamps fundamentally are stored internally in UTC and are converted to your local timezone when used; effectively, extract (timezone from ...) will then always return whatever the local timezone of the connection is: it has no relation to how the timestamp was "stored". Put differently, the time zone is not part of the type at all, and cannot be stored: the "with time zone" is just a property of how the data will be converted when interacting with other types. The data thereby has no representation of timezones at all, textual or otherwise.
@JayFreeman-saurik-: you're absolutely correct. The ''CHECK()'' is there as an anti-footshooting measure to protect against possibly dodgy code. Ensuring that data is UTC on write provides a modest guarantee that the code was thought through or the execution environment is setup correctly.
J
Jay

Sean's answer is overly complex and misleading.

The fact is that both "WITH TIME ZONE" and "WITHOUT TIME ZONE" store the value as a unix-like absolute UTC timestamp. The difference is all in how the timestamp is displayed. When "WITH time zone" then the displayed value is the UTC stored value translated to the user's zone. When "WITHOUT time zone" the UTC stored value is twisted so as to show the same clock face no matter what zone the user has set".

The only situation where a "WITHOUT time zone" is usable is when a clock face value is applicable regardless of actual zone. For example, when a timestamp indicates when voting booths might close (ie. they close at 20:00 regardless of a person's timezone).

Use choice 3. Always use "WITH time zone" unless there is a very specific reason not to.


David E. Wheeler, a major Postgres expert, would agree with your assessment according to his posting, Always Use TIMESTAMP WITH TIME ZONE.
What if you will have the browser convert the UTC time stamp to local time zone? So, the db will never do the conversion and only contain UTC. Would "WITHOUT time zone" be acceptable?
G
GordonM

My preference is towards option 3, as Postgres can then do al ot of the work recalculating timestamps relative to timezone for you, whereas with the other two you'll have to do that yourself. The extra storage overhead of storing the timestamp with a timezone is really negligible unless you're talking millions of records, in which case you probably already have pretty meaty storage requirements anyway.


Incorrect. There is no overhead… Postgres does not store the time zone ('offset' is the correct term, not time zone, by the way). The TIMESTAMP WITH TIME ZONE name is misleading. It really means "pay attention to any specified offset when inserting/updating and use that offset to adjust the date-time to UTC". The TIMESTAMP WITHOUT TIME ZONE name means "ignore any offset that may be present during insert/update, consider the date and time portions as being in UTC with no need for adjustment". Read the doc carefully.
@BasilBourque thank you for this piece of information. Incredibly useful. For others reading this the line from the doc says, "In a literal that has been determined to be timestamp without time zone, PostgreSQL will silently ignore any time zone indication. That is, the resulting value is derived from the date/time fields in the input value, and is not adjusted for time zone. "