Apply several summary functions (sum, mean, etc.) on several variables by group in one call

You can do it all in one step and get proper labeling:

> aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) )
#   id1 id2 val1.mn val1.n val2.mn val2.n
# 1   a   x     1.5    2.0     6.5    2.0
# 2   b   x     2.0    2.0     8.0    2.0
# 3   a   y     3.5    2.0     7.0    2.0
# 4   b   y     3.0    2.0     6.0    2.0

This creates a dataframe with two id columns and two matrix columns:

str( aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) ) )
'data.frame':   4 obs. of  4 variables:
 $ id1 : Factor w/ 2 levels "a","b": 1 2 1 2
 $ id2 : Factor w/ 2 levels "x","y": 1 1 2 2
 $ val1: num [1:4, 1:2] 1.5 2 3.5 3 2 2 2 2
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr  "mn" "n"
 $ val2: num [1:4, 1:2] 6.5 8 7 6 2 2 2 2
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr  "mn" "n"

As pointed out by @lord.garbage below, this can be converted to a dataframe with “simple” columns by using do.call(data.frame, ...)

str( do.call(data.frame, aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) ) ) 
    )
'data.frame':   4 obs. of  6 variables:
 $ id1    : Factor w/ 2 levels "a","b": 1 2 1 2
 $ id2    : Factor w/ 2 levels "x","y": 1 1 2 2
 $ val1.mn: num  1.5 2 3.5 3
 $ val1.n : num  2 2 2 2
 $ val2.mn: num  6.5 8 7 6
 $ val2.n : num  2 2 2 2

This is the syntax for multiple variables on the LHS:

aggregate(cbind(val1, val2) ~ id1 + id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) )

Leave a Comment