From a data.table, randomly select one row per group

The OP's example has only a single column. Assuming the original dataset has multiple columns, we group by 'z' and, within each group, draw one position at random from the group's rows with sample(.N, 1). Indexing .I with that position gives the row's index in the full table. The grouped result stores these indices in an automatically named column V1, which we extract with $V1 and use to subset the rows of 'dt'.

dt[dt[, .I[sample(.N, 1)], by = z]$V1]
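To make the two steps easier to follow, here is a minimal sketch on a small made-up table. The column names x and y and the values are assumptions for illustration only; the original question shows just the grouping column 'z'.

```r
library(data.table)

set.seed(42)  # only so the illustration is reproducible

# Hypothetical data: 'z' is the grouping column, 'x' and 'y' are made-up extras
dt <- data.table(
  z = c("a", "a", "a", "b", "b", "c"),
  x = 1:6,
  y = letters[1:6]
)

# Step 1: per group, pick one row index into the full table.
# The unnamed result column is called V1 by default.
idx <- dt[, .I[sample(.N, 1)], by = z]

# Step 2: subset the original table with those row indices
res <- dt[idx$V1]

# Sanity check: exactly one row per distinct value of 'z'
stopifnot(nrow(res) == uniqueN(dt$z))
```

A shorter form, dt[, .SD[sample(.N, 1)], by = z], should give the same kind of result, but on large tables with many groups the .I version is usually the faster of the two.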
