How to calculate time difference by group using pandas?

You can use sort_values with groupby and aggregating diff:

df['diff'] = df.sort_values(['id','time']).groupby('id')['time'].diff()
print (df)
  id                time     diff
0  A 2016-11-25 16:32:17      NaT
1  A 2016-11-25 16:36:04 00:00:35
2  A 2016-11-25 16:35:29 00:03:12
3  B 2016-11-25 16:35:24      NaT
4  B 2016-11-25 16:35:46 00:00:22

If need remove rows with NaT in column diff use dropna:

df = df.dropna(subset=['diff'])
print (df)
  id                time     diff
2  A 2016-11-25 16:35:29 00:03:12
1  A 2016-11-25 16:36:04 00:00:35
4  B 2016-11-25 16:35:46 00:00:22

You can also overwrite column:

df.time = df.sort_values(['id','time']).groupby('id')['time'].diff()
print (df)
  id     time
0  A      NaT
1  A 00:00:35
2  A 00:03:12
3  B      NaT
4  B 00:00:22

df.time = df.sort_values(['id','time']).groupby('id')['time'].diff()
df = df.dropna(subset=['time'])
print (df)
  id     time
1  A 00:00:35
2  A 00:03:12
4  B 00:00:22

Leave a Comment