Subset rows corresponding to max value by group using data.table

Here’s the fast data.table way:

bdt[bdt[, .I[g == max(g)], by = id]$V1]

This avoids constructing .SD, which is the bottleneck in your expressions.
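To make the one-liner concrete, here is a small sketch on made-up data. The table contents are hypothetical. Only the `id` and `g` column names come from the expression above, and `x` is an extra column added just to show which rows survive. The inner call returns, for each `id`, the row numbers (`.I`) where `g` equals that group's maximum, in an unnamed column that data.table calls `V1`. The outer call then subsets `bdt` once by those row numbers.

```r
library(data.table)

# Hypothetical toy data: `id` is the grouping column, `g` the value to maximise,
# `x` an extra column so we can see which rows are kept.
bdt <- data.table(id = c(1, 1, 2, 2, 2, 3),
                  g  = c(5, 7, 3, 9, 9, 4),
                  x  = letters[1:6])

# Inner step: per id, .I holds that group's row numbers in bdt;
# keep those where g hits the group maximum. The result column is V1.
idx <- bdt[, .I[g == max(g)], by = id]

# Outer step: a single subset of the original table by those row numbers.
res <- bdt[idx$V1]
print(res)
```

Ties are kept: both rows with `g == 9` for `id == 2` are returned, so `res$x` is `"b" "d" "e" "f"`.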

edit: Actually, the main reason the OP's code is slow is not just that it uses .SD, but how it uses it: it calls [.data.table on .SD, which at the moment has a large per-call overhead. Because a by evaluates that call once per group, the overhead accumulates into a very large penalty.
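A rough way to see the per-group cost is to compare a `.SD`-subsetting version with the `.I` version on a table with many small groups. The `.SD[g == max(g)]` form below is an assumed stand-in for the kind of expression described, not necessarily the OP's exact code, and the table sizes are arbitrary. Timings depend heavily on the data.table version and the machine, so no numbers are claimed here. The check at the end only confirms that both versions return the same rows.

```r
library(data.table)

set.seed(1)
# Hypothetical table with many small groups, where per-group overhead adds up.
n <- 5e4
bdt <- data.table(id = sample(5e3, n, replace = TRUE),
                  g  = sample(100, n, replace = TRUE))

# Assumed .SD-based form: calls [.data.table on .SD once per group.
slow <- function(dt) dt[, .SD[g == max(g)], by = id]

# .I-based form: only collects row numbers per group, then subsets once.
fast <- function(dt) dt[dt[, .I[g == max(g)], by = id]$V1]

print(system.time(r_slow <- slow(bdt)))
print(system.time(r_fast <- fast(bdt)))

# Same rows either way (compared after sorting, as plain data.frames).
same <- identical(as.data.frame(r_slow[order(id, g)]),
                  as.data.frame(r_fast[order(id, g)]))
print(same)
```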
