Extract row corresponding to minimum value of a variable by group

Slightly more elegant:

    library(data.table)
    DT[ , .SD[which.min(Employees)], by = State]

       State Company Employees
    1:    AK       D        24
    2:    RI       E        19

Slightly less elegant than using .SD, but a bit faster (for data with many groups):

    DT[DT[ , .I[which.min(Employees)], by = State]$V1]

Also, just replace the expression which.min(Employees) with Employees == min(Employees), if your data … Read more
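A minimal, self-contained sketch of the calls quoted above. The table DT here is made up for illustration; only the AK/D/24 and RI/E/19 rows are taken from the printed output, and the tie in RI is an added assumption to show how which.min() and == min() differ.

```r
library(data.table)

# Hypothetical example data (not from the original answer)
DT <- data.table(
  State     = c("AK", "AK", "RI", "RI", "RI"),
  Company   = c("A", "D", "C", "E", "F"),
  Employees = c(82, 24, 104, 19, 19)
)

# One row per group: which.min() returns only the first minimum
first_min <- DT[, .SD[which.min(Employees)], by = State]

# Same rows, selected through row numbers (.I) instead of .SD
first_min_I <- DT[DT[, .I[which.min(Employees)], by = State]$V1]

# Every row that equals the group minimum, so ties are kept
all_min <- DT[, .SD[Employees == min(Employees)], by = State]

print(first_min)
print(all_min)
```

With this data, which.min() keeps a single row for RI, while the == min() variant returns both tied RI rows.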

Overlap join with start and end positions

Overlap joins were implemented with commit 1375 in data.table v1.9.3 and are available in the current stable release, v1.9.4. The function is called foverlaps. From NEWS: 29) Overlap joins #528 is now here, finally!! Except for type="equal" and maxgap and minoverlap arguments, everything else is implemented. Check out ?foverlaps and the examples there on its … Read more
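As a rough illustration of foverlaps, here is a small sketch with invented interval data. The tables x and y and the id column are assumptions for the example; the call uses type = "any", one of the types the NEWS entry describes as implemented.

```r
library(data.table)

# Hypothetical query intervals
x <- data.table(start = c(5L, 31L, 22L), end = c(8L, 50L, 25L))

# Hypothetical subject intervals with a label
y <- data.table(start = c(10L, 20L, 30L),
                end   = c(15L, 35L, 45L),
                id    = c("a", "b", "c"))

# foverlaps() needs y to be keyed; the last two key columns define the interval
setkey(y, start, end)

# Keep only the x intervals that overlap some y interval
res <- foverlaps(x, y, type = "any", nomatch = 0L)
print(res)
```

Here the interval [5, 8] overlaps nothing and is dropped by nomatch = 0L, [31, 50] overlaps both b and c, and [22, 25] overlaps b.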

Understanding exactly when a data.table is a reference to (vs a copy of) another data.table

Yes, it’s subassignment in R using <- (or = or ->) that makes a copy of the whole object. You can trace that using tracemem(DT) and .Internal(inspect(DT)), as below. The data.table features := and set() assign by reference to whatever object they are passed. So if that object was previously copied (by a subassigning <- … Read more
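The distinction can be seen in a short sketch. The object names sub, ref and cp are made up for the example; copy() is data.table's explicit copy function.

```r
library(data.table)

DT <- data.table(a = 1:3)

# Sub-assignment with <- works on a copy, so DT is left untouched
sub <- DT
sub$a[2L] <- 99L

# Plain <- only binds a second name to the same object ...
ref <- DT
# ... so := through either name changes the one shared table
ref[, b := a * 2L]

# copy() makes an independent table; := on it does not affect DT
cp <- copy(DT)
cp[, c := 0L]

print(names(DT))   # DT gained b (via ref) but not c
print(DT$a)        # unchanged by the sub-assignment on sub
```

Calling tracemem(DT) before the sub-assignment is one way to watch the copy being reported.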

Naming columns in a data table in R

I don’t work with data tables, but this is a solution that would work for data frames, and should hopefully generalize. The strategy is to use the fact that you can fill one vector with another vector, without ever having to use a loop.

    # make the example data sets
    D1 <- as.data.frame(matrix(data=(1:(20*181)), nrow=20, ncol=181))
    … Read more
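Because the excerpt is cut off, the following is only a guess at what "filling one vector with another" might look like for column names, not necessarily the answer's actual code. The vector new_names and the choice of columns 2 to 181 are hypothetical.

```r
# Example data frame as in the excerpt: 20 rows, 181 columns named V1..V181
D1 <- as.data.frame(matrix(data = 1:(20 * 181), nrow = 20, ncol = 181))

# Hypothetical replacement names for columns 2..181
new_names <- paste0("day_", 1:180)

# names(D1) is a character vector, so a slice of it can be
# overwritten with another vector in one step, with no loop
names(D1)[2:181] <- new_names

print(head(names(D1), 3))
```

The first column keeps its default name, and the remaining 180 names are replaced in a single vectorized assignment.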