Sum by two variables

In base R, xtabs sums the left-hand variable of the formula over every combination of the right-hand variables:

> xtabs(sales ~ Date + area, mydf)
        area
Date     beijing shanghai
  201204      41       23
  201205      17       71

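To get the result as a data.frame in the same wide layout, wrap the call in as.data.frame.matrix. (Plain as.data.frame would return the long format, with one row per Date/area pair.)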
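Since the question's data isn't reproduced here, the following is only a sketch on a made-up mydf (the individual sales values are hypothetical, chosen so the totals match the table above). It shows the as.data.frame.matrix wrapping described in the previous paragraph:

```r
# Hypothetical input: the real mydf is not shown in this answer,
# so these rows are invented to reproduce the totals printed above.
mydf <- data.frame(
  Date  = c(201204, 201204, 201204, 201205, 201205, 201205),
  area  = c("beijing", "beijing", "shanghai", "beijing", "shanghai", "shanghai"),
  sales = c(20, 21, 23, 17, 30, 41)
)

# Cross-tabulate, summing sales for each Date/area combination.
tab <- xtabs(sales ~ Date + area, data = mydf)

# Convert the table to a wide data.frame: one column per area,
# with the Date values kept as row names.
wide <- as.data.frame.matrix(tab)
wide
```

Note that Date ends up in the row names rather than in a column of its own, so you may want to copy it back with something like wide$Date <- rownames(wide) if you need it as a regular variable.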


A more recent approach is to combine “dplyr” (for aggregation) and “tidyr” (for reshaping), like this:

library(tidyr)
library(dplyr)
mydf %>% 
  group_by(Date, area) %>% 
  summarise(sales = sum(sales)) %>% 
  spread(area, sales)
# Source: local data frame [2 x 3]
# 
#     Date beijing shanghai
# 1 201204      41       23
# 2 201205      17       71
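In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), so on a current installation the same pipeline would more likely be written as below. This is a sketch rather than a drop-in replacement: the mydf here is hypothetical (invented rows that add up to the totals shown above), and the .groups argument assumes dplyr 1.0.0 or newer.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: made-up rows whose sums match the output above.
mydf <- data.frame(
  Date  = c(201204, 201204, 201204, 201205, 201205, 201205),
  area  = c("beijing", "beijing", "shanghai", "beijing", "shanghai", "shanghai"),
  sales = c(20, 21, 23, 17, 30, 41)
)

wide <- mydf %>%
  group_by(Date, area) %>%
  # Sum sales per Date/area pair and drop the grouping afterwards.
  summarise(sales = sum(sales), .groups = "drop") %>%
  # Spread the area values into columns, filled with the summed sales.
  pivot_wider(names_from = area, values_from = sales)

wide
```

Unlike the xtabs version, Date stays an ordinary column here, which is usually more convenient for further processing.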
