Find start and end positions/indices of runs/consecutive values

Core logic:

# Example vector and rle object
x = rev(rep(6:10, 1:5))
rle_x = rle(x)

# Compute endpoints of run
end = cumsum(rle_x$lengths)
start = c(1, lag(end)[-1] + 1)

# Display results
data.frame(start, end)
#   start end
# 1     1   5
# 2     6   9
# 3    10  12
# 4    13  14
# 5    15  15

Tidyverse/dplyr way (data frame-centric):

library(dplyr)

rle(x) %>%
  unclass() %>%
  as.data.frame() %>%
  mutate(end = cumsum(lengths),
         start = c(1, dplyr::lag(end)[-1] + 1)) %>%
  magrittr::extract(c(1,2,4,3)) # To re-order start before end for display

Because the start and end vectors are the same length as the values component of the rle object, solving the related problem of identifying endpoints for runs meeting some condition is straightforward: filter or subset the start and end vectors using the condition on the run values.

Leave a Comment