Take Sum of a Variable if Combination of Values in Two Other Columns are Unique [duplicate]

One option is base R. We first sort the first two columns within each row, so that a pair such as (Mable, Beth) and (Beth, Mable) ends up as the same key. We use apply with MARGIN=1 to do that, transpose the output, and combine it with the third column in a ‘data.frame’ to create ‘df1’. The formula method of aggregate then gives the sum of ‘num_email’ grouped by the first two columns of the transformed dataset.

df1 <- data.frame(t(apply(df[1:2], 1, sort)), df[3])
aggregate(num_email~., df1, FUN=sum)

#      X1    X2 num_email
# 1  Beth Mable         2
# 2  Beth Susan         3
# 3 Mable Susan         1
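
To make the base R steps reproducible end to end, here is a minimal sketch. The input ‘df’ below is hypothetical: the original data is not shown in this answer, so the rows are made up to be consistent with the output above, and only the column names (senders, receivers, num_email) are taken from the code in this answer.

```r
# Hypothetical input; the real data from the question is not shown here.
df <- data.frame(
  senders   = c("Beth", "Mable", "Susan", "Beth", "Susan"),
  receivers = c("Mable", "Beth", "Beth", "Susan", "Mable"),
  num_email = c(1, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Sort the two names within each row so (a, b) and (b, a) become identical.
# apply() returns one column per input row, hence the transpose.
pairs <- t(apply(df[1:2], 1, sort))

# Re-attach the counts and sum them per unordered pair.
df1 <- data.frame(pairs, df[3])
res <- aggregate(num_email ~ ., df1, FUN = sum)
res
```

With this made-up input, the result should have the same three pairs and totals as the output shown above.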

Or using data.table, we convert the first two columns to character class, unname them so that the columns get the default names ‘V1’, ‘V2’, and build a ‘data.table’. Using the lexicographic ordering of character columns, we select in i the rows where V1 > V2, swap the two values in those rows by assigning (:=) .(V2, V1) to c("V1", "V2"), and then get the sum of ‘num_email’ grouped by ‘V1’, ‘V2’.

library(data.table)
dt = do.call(data.table, c(lapply(unname(df[1:2]), as.character), df[3]))
dt[V1 > V2, c("V1", "V2") := .(V2, V1)]
dt[, .(num_email = sum(num_email)), by= .(V1, V2)]

#       V1    V2 num_email
# 1:  Beth Mable         2
# 2:  Beth Susan         3
# 3: Mable Susan         1

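Or using dplyr, we convert the two name columns to character class with mutate and across (the replacement for mutate_each and funs, which are deprecated in current dplyr; across needs dplyr 1.0.0 or later). We then take the smaller and larger name in each row with pmin and pmax, so that every pair is stored in a fixed order, group by ‘V1’, ‘V2’ and get the sum of ‘num_email’.

library(dplyr)
df %>%
  # make sure the names are character, not factor, before comparing them
  mutate(across(c(senders, receivers), as.character)) %>%
  # V1 is the alphabetically smaller name of the pair, V2 the larger
  mutate(V1 = pmin(senders, receivers),
         V2 = pmax(senders, receivers)) %>%
  group_by(V1, V2) %>%
  summarise(num_email = sum(num_email))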

#      V1    V2 num_email
#   (chr) (chr)     (dbl)
# 1  Beth Mable         2
# 2  Beth Susan         3
# 3 Mable Susan         1
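
To see why pmin and pmax put each pair into a fixed order, here is a small base R sketch. The two vectors are made up for illustration and stand in for the ‘senders’ and ‘receivers’ columns.

```r
# Hypothetical name vectors standing in for the two columns.
senders   <- c("Mable", "Beth", "Susan")
receivers <- c("Beth", "Susan", "Mable")

# pmin/pmax compare element by element; on character vectors the
# comparison is alphabetical, so V1 <= V2 holds in every row.
V1 <- pmin(senders, receivers)
V2 <- pmax(senders, receivers)

data.frame(V1, V2)
```

Note that pmin and pmax work on character vectors but not on factors, which is why the columns are converted to character first.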

NOTE: The data.table solution was updated by @Frank.
