Identify groups of linked episodes that chain together

The Bioconductor package RBGL (an R interface to the Boost Graph Library) contains
a function, connectedComp(), which identifies the connected components in a graph.
That is exactly what you want here: each chain of linked ids forms one connected component.

(To use the function, you will first need to install the graph and RBGL packages from Bioconductor, for example with BiocManager::install(c("graph", "RBGL")).)

library(RBGL)
test <- data.frame(id1=c(10,10,1,1,24,8),id2=c(1,36,24,45,300,11))

## Convert your 'from-to' data to a 'node and edge-list' representation  
## used by the 'graph' & 'RBGL' packages 
g <- ftM2graphNEL(as.matrix(test))

## Extract the connected components
cc <- connectedComp(g)

## Massage results into the format you're after 
ld <- lapply(seq_along(cc), 
             function(i) data.frame(group = names(cc)[i], id = cc[[i]]))
do.call(rbind, ld)
#   group  id
# 1     1  10
# 2     1   1
# 3     1  24
# 4     1  36
# 5     1  45
# 6     1 300
# 7     2   8
# 8     2  11
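
If installing the Bioconductor packages is not an option, the same grouping can be sketched in base R. The following is only an illustration of what "connected components" means for this kind of from-to data, not how RBGL implements it. The function name connected_groups() and its arguments are hypothetical, it treats every link as undirected, and it uses a simple union-find without any performance tuning.

```r
# Hypothetical helper (not part of RBGL or graph): label the connected
# components of an undirected edge list with a simple union-find.
connected_groups <- function(from, to) {
  nodes  <- unique(c(from, to))   # every id that appears in either column
  parent <- seq_along(nodes)      # each node starts as its own root

  # Follow parent pointers until reaching a root (a node that is its own parent)
  find_root <- function(i) {
    while (parent[[i]] != i) i <- parent[[i]]
    i
  }

  # For each link, merge the two trees its endpoints belong to
  for (k in seq_along(from)) {
    a <- find_root(match(from[k], nodes))
    b <- find_root(match(to[k], nodes))
    if (a != b) parent[[max(a, b)]] <- min(a, b)
  }

  # Nodes that share a root belong to the same group
  roots <- vapply(seq_along(nodes), find_root, integer(1))
  out <- data.frame(group = match(roots, unique(roots)), id = nodes)
  out[order(out$group), ]         # list each group's members together
}

test <- data.frame(id1 = c(10, 10, 1, 1, 24, 8), id2 = c(1, 36, 24, 45, 300, 11))
res  <- connected_groups(test$id1, test$id2)
res
```

On the example data this should put 10, 1, 24, 36, 45 and 300 in one group and 8 and 11 in another, matching the connectedComp() grouping above, although the row order may differ.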
