Finding running maximum by group

One option is data.table. Convert the ‘data.frame’ to a ‘data.table’ by reference with setDT(df1), then, grouped by ‘group’, compute the cummax of ‘var’ and assign (:=) it to a new column (‘curMax’):

library(data.table)
setDT(df1)[, curMax := cummax(var), by = group]
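As a quick check of what this does, here is a minimal sketch on made-up data. The values in df1 are hypothetical; only the column names (‘group’, ‘time’, ‘var’) follow the code above.

```r
library(data.table)

# Hypothetical sample data; only the column names come from the answer
df1 <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  time  = c(1, 2, 3, 1, 2, 3),
  var   = c(2, 5, 3, 7, 1, 9)
)

# Running maximum of 'var' within each group, added by reference
setDT(df1)[, curMax := cummax(var), by = group]

print(df1)
# curMax is 2, 5, 5 for group "a" and 7, 7, 9 for group "b"
```

The running maximum restarts at the first row of each group, so the 7 from group "b" never leaks into group "a".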

As @Michael Chirico commented, if the data is not already ordered by ‘time’, we can do the ordering in ‘i’:

setDT(df1)[order(time), curMax:=cummax(var), by = group]
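To see why the ordering matters, the sketch below uses a small hypothetical table (df2, with made-up values) whose rows are deliberately out of time order. The running maximum is computed along ‘time’, and the result is written back to the matching rows, so the table keeps its original row order.

```r
library(data.table)

# Hypothetical data, deliberately NOT sorted by time
df2 <- data.frame(
  group = c("a", "a", "a"),
  time  = c(3, 1, 2),
  var   = c(3, 2, 5)
)

# Walk the rows in time order: var is seen as 2, 5, 3 -> cummax 2, 5, 5
setDT(df2)[order(time), curMax := cummax(var), by = group]

print(df2)
# Rows stay in their original order, so curMax reads 5, 2, 5
# (time 3 -> 5, time 1 -> 2, time 2 -> 5)
```

Without order(time) in ‘i’, the same data would give 3, 3, 5, which is the running maximum in row order rather than in time order.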

Or with dplyr

library(dplyr)
df1 %>% 
    group_by(group) %>%
    mutate(curMax = cummax(var)) 
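A minimal sketch of the dplyr version on a local data frame, again with hypothetical values and only the column names taken from the code above. The ungroup() at the end is an addition for illustration, so that the result is no longer a grouped tibble.

```r
library(dplyr)

# Hypothetical sample data; only the column names come from the answer
df1 <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  time  = c(1, 2, 3, 1, 2, 3),
  var   = c(2, 5, 3, 7, 1, 9)
)

res <- df1 %>%
    group_by(group) %>%
    mutate(curMax = cummax(var)) %>%
    ungroup()   # added here so later steps are not grouped

print(res)
# curMax is 2, 5, 5 for group "a" and 7, 7, 9 for group "b"
```

Unlike the data.table version, this does not modify df1; the result has to be assigned to a name (here res).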

If df1 is a tbl_sql (a remote database table), explicit ordering might be required, either using arrange

df1 %>% 
    group_by(group) %>%
    arrange(time, .by_group=TRUE) %>%
    mutate(curMax = cummax(var)) 

or using dbplyr::window_order

library(dbplyr)

df1 %>% 
    group_by(group) %>%
    window_order(time) %>%
    mutate(curMax = cummax(var)) 
