TIL: Pandas Timestamp Limitations

Don’t predict too far into the future.

Vincent Warmerdam koaning.io
2021-10-18

I was exploring the pandas docs while preparing a pandas course for calmcode.io when I stumbled on an interesting fact: there are bounds for timestamps in pandas.

To quote the docs:

Since pandas represents timestamps in nanosecond resolution, the time span that can be represented using a 64-bit integer is limited to approximately 584 years:

import pandas as pd

pd.Timestamp.min
# Timestamp('1677-09-21 00:12:43.145224193')

pd.Timestamp.max
# Timestamp('2262-04-11 23:47:16.854775807')
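As a quick sanity check on where those numbers come from, here is a small sketch that uses only the standard library. It assumes, as the docs describe, that a Timestamp is a signed 64-bit count of nanoseconds since 1970-01-01, with the most negative value reserved for NaT. Python's datetime only goes down to microseconds, so the bounds below come out rounded to the microsecond.

```python
from datetime import datetime, timedelta

# Largest signed 64-bit integer; the smallest one (-2**63) is used for NaT,
# so the usable range is symmetric around the epoch.
INT64_MAX = 2**63 - 1
NS_PER_YEAR = 365.2425 * 24 * 60 * 60 * 1e9  # average Gregorian year in ns

span_years = 2 * INT64_MAX / NS_PER_YEAR
print(f"{span_years:.1f} years")  # 584.6 years

EPOCH = datetime(1970, 1, 1)
# Convert nanoseconds to microseconds; floor division rounds the negative
# bound towards the past, matching how the pandas value truncates.
upper = EPOCH + timedelta(microseconds=INT64_MAX // 1000)
lower = EPOCH + timedelta(microseconds=-INT64_MAX // 1000)
print(lower)  # 1677-09-21 00:12:43.145224
print(upper)  # 2262-04-11 23:47:16.854775
```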

It makes sense when you consider that pandas handles nanoseconds and there’s only so much information you can store in a 64-bit integer. If you have a use case outside of this span of time, pandas does have a trick up its sleeve: you can use a Period, a date-like object that can often stand in for a datetime.
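A minimal sketch of the idea, assuming a reasonably recent pandas: a daily Period for a date centuries before pd.Timestamp.min still supports the usual date fields, arithmetic and comparisons. As far as I understand it, a Period is stored as an integer count of its frequency (days here) rather than of nanoseconds, which is why the 584-year window doesn’t apply to it.

```python
import pandas as pd

# A daily Period well before pd.Timestamp.min (1677-09-21)
p = pd.Period("1111-01-01", freq="D")

print(p.year, p.month, p.day)  # 1111 1 1
print(p + 30)                  # 1111-01-31 (adding an int moves by 30 days)

# Periods with the same freq compare like dates
print(p < pd.Period("1677-09-21", freq="D"))  # True
```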

The Period class

Here’s how to generate a daily range of periods with pd.period_range.

span = pd.period_range("1215-01-01", "1381-01-01", freq="D")
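To get a feel for what period_range returns, here is a sketch that inspects span. The cross-check against Python’s datetime.date and the partial string indexing with "1300-01" are my own additions for illustration, assuming pandas’ usual PeriodIndex behavior.

```python
from datetime import date

import pandas as pd

span = pd.period_range("1215-01-01", "1381-01-01", freq="D")

# Both endpoints are included, one Period per day
print(span[0], span[-1])  # 1215-01-01 1381-01-01
print(len(span) == (date(1381, 1, 1) - date(1215, 1, 1)).days + 1)  # True

# A PeriodIndex works as a regular index, including partial string indexing
ts = pd.Series(range(len(span)), index=span)
print(len(ts.loc["1300-01"]))  # 31 days in January 1300
print(span.year.min(), span.year.max())  # 1215 1381
```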

You can also convert date strings to periods manually, as an alternative to pd.to_datetime.

s = pd.Series(['1111-01-01', '1212-12-12'])

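def convert(item):
    # Slice a "YYYY-MM-DD" string into its year, month and day parts
    year = int(item[:4])
    month = int(item[5:7])
    day = int(item[8:10])
    # Build a daily Period, which isn't limited to the Timestamp bounds
    return pd.Period(year=year, month=month, day=day, freq="D")
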
s.apply(convert)

#  0    1111-01-01
#  1    1212-12-12
#  dtype: period[D]
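If the strings are already in ISO format, the manual slicing may not be necessary: pd.Period can parse such strings itself, and pd.PeriodIndex applies that parsing to a whole collection. A sketch, assuming the same s as above; wrapping the result back into a Series gives a period[D] column with a .dt accessor.

```python
import pandas as pd

s = pd.Series(["1111-01-01", "1212-12-12"])

# PeriodIndex parses each ISO string into a daily Period
periods = pd.PeriodIndex(s, freq="D")

# Back into a Series (dtype period[D]) to use the .dt accessor
as_series = pd.Series(periods, index=s.index)
print(as_series.dt.year.tolist())   # [1111, 1212]
print(as_series.dt.month.tolist())  # [1, 12]
```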