In my opinion, you should always prefer a Timestamp: it can easily be converted back into a numpy datetime64 if that is ever needed.
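As a minimal sketch of that round trip (the example date is arbitrary, not taken from the question), `Timestamp.to_datetime64()` goes one way and the `pd.Timestamp` constructor goes the other:

```python
import numpy as np
import pandas as pd

# Arbitrary example value, chosen only for illustration
ts = pd.Timestamp("2011-01-02 03:04:05")

# Timestamp -> numpy.datetime64
dt64 = ts.to_datetime64()

# numpy.datetime64 -> Timestamp
ts_again = pd.Timestamp(dt64)

print(type(dt64).__name__)  # datetime64
print(ts_again == ts)       # True
```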
numpy.datetime64 is essentially a thin wrapper around int64. It has almost no date/time-specific functionality.
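To illustrate the "thin wrapper" point, here is a small sketch (the date and the explicit `"ns"` unit are my own choices for the example) showing that a datetime64 scalar is just an integer count since the epoch, with no calendar attributes of its own:

```python
import numpy as np

# Explicit nanosecond unit so the integer below is unambiguous
dt64 = np.datetime64("2011-01-02T00:00:00", "ns")

# The underlying value: nanoseconds since 1970-01-01
as_int = dt64.astype("int64")
print(as_int)  # 1293926400000000000

# No datetime-style accessors such as .year on the numpy scalar
print(hasattr(dt64, "year"))  # False
```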
pd.Timestamp is a wrapper around a numpy.datetime64. It is backed by the same int64 value, but supports the entire datetime.datetime interface, along with useful pandas-specific functionality.
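A short sketch of what that buys you in practice, using an arbitrary example date: the Timestamp is a real `datetime.datetime` subclass, still exposes the integer through `.value`, and adds pandas helpers on top.

```python
import datetime
import pandas as pd

ts = pd.Timestamp("2011-01-02")  # arbitrary example date

# It is a genuine datetime.datetime, so the standard interface works
print(isinstance(ts, datetime.datetime))  # True
print(ts.year, ts.strftime("%Y-%m-%d"))   # 2011 2011-01-02

# The backing integer (nanoseconds since the epoch) is still accessible
print(ts.value)  # 1293926400000000000

# pandas-specific extras
print(ts.day_name())  # Sunday
next_day = ts + pd.Timedelta(days=1)
```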
The in-array representation of these two is identical: it is a contiguous array of int64s. pd.Timestamp is a scalar box that makes working with individual values easier.
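One way to see this for yourself, as a sketch on a made-up three-element index: the array behind a DatetimeIndex can be reinterpreted as int64 without copying, and only scalar access through pandas produces the Timestamp box.

```python
import numpy as np
import pandas as pd

# Made-up index purely for illustration
idx = pd.date_range("2011-01-01", periods=3, freq="D")

raw = idx.values          # numpy array with a datetime64 dtype
ints = raw.view("int64")  # same memory, reinterpreted as integers
print(raw.dtype, ints.dtype)

# Scalar access: pandas boxes the value, numpy does not
print(type(idx[0]).__name__)  # Timestamp
print(type(raw[0]).__name__)  # datetime64
```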
Going back to the linked answer, you could write it like this, which is shorter and happens to be faster.
%timeit (df.index.values >= pd.Timestamp('2011-01-02').to_datetime64()) & \
        (df.index.values < pd.Timestamp('2011-01-03').to_datetime64())
192 µs ± 6.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
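For anyone who wants to try this outside IPython, here is a self-contained sketch. The `df` below is a hypothetical stand-in (hourly rows over four days), since the original frame from the linked answer is not shown here. It only checks that the raw numpy comparison selects the same rows as comparing the index against Timestamps directly; it makes no timing claim.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the df in the linked answer: 96 hourly rows
index = pd.date_range("2011-01-01", periods=96, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"x": np.arange(96)}, index=index)

start = pd.Timestamp("2011-01-02")
stop = pd.Timestamp("2011-01-03")

# Raw numpy comparison, as in the snippet above
mask_np = (df.index.values >= start.to_datetime64()) & \
          (df.index.values < stop.to_datetime64())

# The same filter written against the index with Timestamps
mask_pd = (df.index >= start) & (df.index < stop)

selected = df[mask_np]
print(len(selected))                    # 24
print(np.array_equal(mask_np, mask_pd)) # True
```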