Apply function conditionally

There are many ways to do this. If you need a function other than sum, just change the FUN argument: for example, FUN=mean, FUN=var, or FUN=length. Let’s explore some alternatives:

The aggregate function in base R:

> aggregate(results ~ experiment, FUN=sum, data=DF)
  experiment results
1          A    86.3
2          B   986.0
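The DF used in these examples is not shown, so here is a minimal sketch of one that would work. The values and the `time` column are assumptions, made up so that the group sums match the output above (the `DF[, -2]` call further down suggests there is a second column that is not needed). It also shows the point about swapping FUN:

```r
# Hypothetical data: the original DF is not shown, so these values are
# invented so that the sums per experiment come out as 86.3 and 986.
DF <- data.frame(
  experiment = c("A", "A", "A", "B", "B"),
  time       = c(11, 23, 15, 23, 15),   # assumed extra column, not used here
  results    = c(40.0, 33.5, 12.8, 555.0, 431.0)
)

# Same aggregate call, with only the FUN argument changed
sums   <- aggregate(results ~ experiment, data = DF, FUN = sum)
means  <- aggregate(results ~ experiment, data = DF, FUN = mean)
counts <- aggregate(results ~ experiment, data = DF, FUN = length)

print(sums)
print(means)
print(counts)
```

Any function that takes a vector and returns a single value can be plugged in the same way.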

Or maybe tapply:

> with(DF, tapply(results, experiment, FUN=sum))
    A     B 
 86.3 986.0 

Also ddply from the plyr package:

> # library(plyr)
> ddply(DF[, -2], .(experiment), numcolwise(sum))
  experiment results
1          A    86.3
2          B   986.0

> ## Alternative syntax
> ddply(DF, .(experiment), summarize, sumResults = sum(results))
  experiment sumResults
1          A       86.3
2          B      986.0

Also the dplyr package:

> library(dplyr)
> DF %>% group_by(experiment) %>% summarise(sumResults = sum(results))
Source: local data frame [2 x 2]

  experiment  sumResults
1          A        86.3
2          B       986.0

Using sapply and split, which is equivalent to tapply:

> with(DF, sapply(split(results, experiment), sum))
    A     B 
 86.3 986.0 
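To see what "equivalent" means here, the two results can be compared directly. This is only a sketch on a hypothetical DF (the values are made up, since the original data is not shown):

```r
# Hypothetical data standing in for the DF used in the answer
DF <- data.frame(
  experiment = c("A", "A", "A", "B", "B"),
  results    = c(40.0, 33.5, 12.8, 555.0, 431.0)
)

# split() breaks results into one vector per experiment,
# sapply() then applies sum to each piece
by_split  <- with(DF, sapply(split(results, experiment), sum))

# tapply() does the grouping and the summing in one step
by_tapply <- with(DF, tapply(results, experiment, FUN = sum))

# Same numbers and same group labels; only the container differs
# (a named vector versus a one-dimensional array)
same_values <- isTRUE(all.equal(unname(by_split), as.vector(by_tapply)))
same_names  <- identical(names(by_split), names(by_tapply))
print(c(same_values = same_values, same_names = same_names))
```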

If you are concerned about speed, data.table is your friend:

> # library(data.table)
> DT <- data.table(DF)
> DT[, sum(results), by=experiment]
   experiment    V1
1:          A  86.3
2:          B 986.0

Less popular, but the doBy package is nice (its summaryBy is equivalent to aggregate, even in syntax!):

> # library(doBy)
> summaryBy(results~experiment, FUN=sum, data=DF)
  experiment results.sum
1          A        86.3
2          B       986.0

The by function also helps in this situation:

> (Aggregate.sums <- with(DF, by(results, experiment, sum)))
experiment: A
[1] 86.3
------------------------------------------------------------------------- 
experiment: B
[1] 986

If you want the result to be a matrix, use either cbind or rbind:

> cbind(results=Aggregate.sums)
  results
A    86.3
B   986.0

sqldf from the sqldf package could also be a good option:

> library(sqldf)
> sqldf("select experiment, sum(results) `sum.results`
      from DF group by experiment")
  experiment sum.results
1          A        86.3
2          B       986.0

xtabs also works (but only when the function you want is sum):

> xtabs(results ~ experiment, data=DF)
experiment
    A     B 
 86.3 986.0
