Cumulatively paste (concatenate) values grouped by another variable

You could define a “cumulative paste” function using Reduce:

cumpaste = function(x, .sep = " ") 
          Reduce(function(x1, x2) paste(x1, x2, sep = .sep), x, accumulate = TRUE)

cumpaste(letters[1:3], "; ")
#[1] "a"       "a; b"    "a; b; c"

Reduce's loop avoids re-concatenating elements from the start: each step extends the previous concatenation by the next element.
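To make that step-by-step behaviour concrete, here is a rough sketch of the same idea as an explicit loop. The name cumpaste_loop is made up for illustration, and it is only meant to match cumpaste on non-empty character input:

```r
# Hypothetical explicit-loop equivalent of cumpaste (illustration only)
cumpaste_loop = function(x, .sep = " ")
{
    out = character(length(x))
    if (length(x) == 0L) return(out)
    out[1L] = x[1L]
    for (i in seq_along(x)[-1L])
        out[i] = paste(out[i - 1L], x[i], sep = .sep)  # extend the previous result
    out
}

cumpaste_loop(letters[1:3], "; ")
#[1] "a"       "a; b"    "a; b; c"
```

Each iteration reuses out[i - 1L] rather than pasting x[1:i] again, which is what Reduce does with accumulate = TRUE.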

Applying it by group:

ave(as.character(testdf$content), testdf$id, FUN = cumpaste)
#[1] "A"     "A B"   "A B A" "B"     "B C"   "B C B"
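For a self-contained run, the snippet below rebuilds a small testdf. The data is an assumption, chosen only so that it reproduces the output above (the real testdf is the one from the question), and the cum column name is likewise made up:

```r
# Assumed example data, not the original testdf
testdf = data.frame(id = c(1, 1, 1, 2, 2, 2),
                    content = c("A", "B", "A", "B", "C", "B"))

cumpaste = function(x, .sep = " ")
    Reduce(function(x1, x2) paste(x1, x2, sep = .sep), x, accumulate = TRUE)

# ave() applies cumpaste within each id and returns results in the original row order
testdf$cum = ave(as.character(testdf$content), testdf$id, FUN = cumpaste)
testdf$cum
#[1] "A"     "A B"   "A B A" "B"     "B C"   "B C B"
```

The as.character() call matters if content is a factor, since ave would otherwise try to assign the pasted strings back into factor levels.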

Another idea could be to concatenate the whole vector up front and then progressively substring it:

cumpaste2 = function(x, .sep = " ")
{
    concat = paste(x, collapse = .sep)
    substring(concat, 1L, cumsum(c(nchar(x[[1L]]), nchar(x[-1L]) + nchar(.sep))))
}
cumpaste2(letters[1:3], " ;@-")
#[1] "a"           "a ;@-b"      "a ;@-b ;@-c"
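The end positions passed to substring can be inspected on their own. This is just the body of cumpaste2 unrolled for the example above, with the intermediate value stored in a variable called ends (a name introduced here for illustration):

```r
x = letters[1:3]
.sep = " ;@-"                        # 4 characters
concat = paste(x, collapse = .sep)   # "a ;@-b ;@-c"

# the first element ends at its own width; every later one adds separator + element width
ends = cumsum(c(nchar(x[[1L]]), nchar(x[-1L]) + nchar(.sep)))
ends
#[1]  1  6 11

substring(concat, 1L, ends)
#[1] "a"           "a ;@-b"      "a ;@-b ;@-c"
```

substring is vectorised over its end argument, so a single call cuts out every prefix of the full string.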

This seems to be somewhat faster, too:

set.seed(077)
X = replicate(1e3, paste(sample(letters, sample(0:5, 1), TRUE), collapse = ""))
identical(cumpaste(X, " --- "), cumpaste2(X, " --- "))
#[1] TRUE
microbenchmark::microbenchmark(cumpaste(X, " --- "), cumpaste2(X, " --- "), times = 30)
#Unit: milliseconds
#                  expr      min       lq     mean   median       uq      max neval cld
#  cumpaste(X, " --- ") 21.19967 21.82295 26.47899 24.83196 30.34068 39.86275    30   b
# cumpaste2(X, " --- ") 14.41291 14.92378 16.87865 16.03339 18.56703 23.22958    30  a

…which makes it the cumpaste_faster.
