pandas – get most recent value of a particular column indexed by another column (get maximum value of a particular column indexed by another column)

If the number of distinct “obj_id” values is very high, you’ll want to sort the entire dataframe by date and then drop duplicates, keeping the last row for each “obj_id”.

# sort_index(by=...) has been removed from pandas; sort_values is the replacement
df_sorted = df.sort_values('data_date')
result = df_sorted.drop_duplicates('obj_id', keep='last').values

This should be faster (sorry, I didn’t test it) because you avoid a custom agg function, which is slow when there is a large number of keys. You might think it’s worse to sort the entire dataframe, but in practice pandas sorts run in fast compiled code, while a custom agg function gets called in a slow Python-level loop, once per key.
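Here is a minimal sketch of the sort-then-drop-duplicates approach on a toy dataframe. The sample rows and the “value” column are made up for illustration; only “obj_id” and “data_date” come from the answer. It also cross-checks the result against a groupby-based lookup, without making any claim about relative speed.

```python
import pandas as pd

# Hypothetical sample data; only the obj_id and data_date column names
# come from the answer above.
df = pd.DataFrame({
    "obj_id": [1, 2, 1, 3, 2],
    "data_date": pd.to_datetime(
        ["2024-01-01", "2024-01-05", "2024-01-03", "2024-01-02", "2024-01-04"]
    ),
    "value": [10, 20, 30, 40, 50],
})

# Sort by date, then keep the last (most recent) row for each obj_id.
latest = df.sort_values("data_date").drop_duplicates("obj_id", keep="last")

# Cross-check: pick the row holding the maximum date within each group.
expected = df.loc[df.groupby("obj_id")["data_date"].idxmax()]

# Index both results by obj_id so they can be compared directly.
latest_by_id = latest.set_index("obj_id").sort_index()
expected_by_id = expected.set_index("obj_id").sort_index()
print(latest_by_id)
```

Both approaches should pick the same rows as long as no “obj_id” has two rows with the same latest date; with ties, which row survives depends on the sort order.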
