R: Split unbalanced list in data.frame column

    # Split by ; as before
    allJobs <- strsplit(df$b, ";", fixed = TRUE)
    # Replicate a by the number of jobs in each case
    n <- sapply(allJobs, length)
    id <- rep(df$a, times = n)
    # Turn allJobs into a vector
    job <- unlist(allJobs)
    # Retrieve position of each job
    jobNum <- unlist(lapply(n, seq_len))
    # Combine into a data frame
    df2 <- data.frame(id …
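The excerpt is cut off before the final data frame is built. As a rough, self-contained sketch of the same split-and-replicate idea (the data frame `df`, its values, and the result name `long` are hypothetical, and this is not necessarily how the truncated answer finishes), one could write:

```r
# Hypothetical example data: column b holds semicolon-separated jobs
df <- data.frame(a = c("x", "y", "z"),
                 b = c("cook;clean", "drive", "paint;plumb;weld"),
                 stringsAsFactors = FALSE)

# Split each string on a literal ";" to get a list of character vectors
allJobs <- strsplit(df$b, ";", fixed = TRUE)

# Number of jobs in each row
n <- lengths(allJobs)

# One output row per job: repeat the id, number the jobs within each id
long <- data.frame(id = rep(df$a, times = n),
                   jobNum = sequence(n),
                   job = unlist(allJobs),
                   stringsAsFactors = FALSE)
print(long)
```

Here `lengths()` and `sequence()` are base R shorthands for the `sapply(allJobs, length)` and `unlist(lapply(n, seq_len))` calls used in the excerpt.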

Split Data Frame into Rows of Fixed Size

I don’t understand why a plyr solution is needed. split works perfectly well, and even hadley himself didn’t suggest a plyr/reshape2 solution when he looked at the earlier question:

    split(dfrm, (0:nrow(dfrm) %/% 300))  # integer division

This does produce a warning, but the split itself is correct, so you can ignore it. (The warning arises because 0:nrow(dfrm) is one element longer than the number of rows.)
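If the warning is unwanted, one option is to build a grouping vector with exactly one entry per row. The following is a minimal sketch; the data frame `dfrm` and its columns are made up for illustration, and the chunk size of 300 follows the excerpt:

```r
# Hypothetical data frame with 650 rows, so the last chunk is short
dfrm <- data.frame(id = seq_len(650), value = seq_len(650) * 2)

# Rows 1-300 map to group 0, rows 301-600 to group 1, rows 601-650 to group 2
chunk <- (seq_len(nrow(dfrm)) - 1) %/% 300

# One data frame per group, returned as a named list
pieces <- split(dfrm, chunk)

print(sapply(pieces, nrow))
```

Because `chunk` has the same length as the number of rows, `split` has no length mismatch to warn about.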

Returning first row of group

By reproducing the example data frame and testing it, I found a way of getting the needed result:

Order the data by the relevant columns (ID, Start):

    ordered_data <- data[order(data$ID, data$Start), ]

Find the first row for each new ID:

    final <- ordered_data[!duplicated(ordered_data$ID), ]
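To see the two steps at work, here is a small worked example. The values in `data` are invented for illustration; only the column names ID and Start come from the excerpt:

```r
# Hypothetical data: several rows per ID, with Start values out of order
data <- data.frame(ID = c(2, 1, 2, 1, 3),
                   Start = c(5, 9, 3, 4, 7))

# Sort by ID, then by Start within each ID
ordered_data <- data[order(data$ID, data$Start), ]

# duplicated() is FALSE only for the first occurrence of each ID,
# so this keeps the row with the smallest Start per ID
final <- ordered_data[!duplicated(ordered_data$ID), ]
print(final)
```

The sort has to come first: `duplicated` only marks repeats in the order they appear, so sorting decides which row counts as the "first" for each ID.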

R: speeding up “group by” operations

Instead of the normal R data frame, you can use an immutable data frame, which returns pointers to the original when you subset and can be much faster:

    idf <- idata.frame(myDF)
    system.time(aggregateDF <- ddply(idf, c("year", "state", "group1", "group2"),
                                     function(df) wtd.mean(df$myFact, weights = df$weights)))
    #    user  system elapsed
    #  18.032   0.416  19.250

If I was to write a …

Why is plyr so slow?

Why is it so slow? A little research located a mail group posting from Aug. 2011 in which @hadley, the package author, states:

"This is a drawback of the way that ddply always works with data frames. It will be a bit faster if you use summarise instead of data.frame (because data.frame is very slow), …"

Create columns from factors and count [duplicate]

You only need to make a slight modification to your code. You should use .(Name) instead of c("Name"):

    ddply(df1, .(Name), summarise,
          Score_1 = sum(Score == 1),
          Score_2 = sum(Score == 2),
          Score_3 = sum(Score == 3))

gives:

      Name Score_1 Score_2 Score_3
    1  Ben       1       1       0
    2 John       1       1       1

Other possibilities include: 1. …