calculate the time interval in different year

First off, two comments:

  1. Your dput throws errors. Firstly, you are missing single ticks for the date column names; secondly there seems to be an error for OUT_TIME_date of CASEID = 101571(5). For future posts, please double-check that sample data (1) doesn’t contain R syntax errors, and (2) is correct & representative.
  2. There are also mistakes in your expected output, where you don’t include the last day for stays that extend past one year.

Here is one option making use of lubridate functions to deal with Dates

library(tidyverse)
library(lubridate)
df %>%
    mutate_at(vars(ends_with("date")), ~as.Date(.x, format = "%Y-%m-%d")) %>%
    mutate(
        LOS = difftime(OUT_TIME_date, IN_TIME_date, units = "days"),
        grp = map2(
            IN_TIME_date,
            OUT_TIME_date,
            ~year(seq.Date(.x, .y - 1, by = "day")))) %>%
    unnest() %>%
    group_by_all() %>%
    tally() %>%
    spread(grp, n, fill = 0) %>%
    ungroup() %>%
    as.data.frame()
#        CASEID IN_TIME_date OUT_TIME_date      LOS 2012 2013 2014 2015 2016
#1    023252(1)   2017-02-25    2017-03-02   5 days    0    0    0    0    0
#2  07597558(2)   2015-10-23    2015-12-05  43 days    0    0    0   43    0
#3  07597558(3)   2016-07-06    2016-07-15   9 days    0    0    0    0    9
#4   100520(31)   2013-01-23    2013-01-28   5 days    0    5    0    0    0
#5   100520(32)   2013-03-12    2013-03-18   6 days    0    6    0    0    0
#6   100520(33)   2013-06-13    2013-06-18   5 days    0    5    0    0    0
#7   100520(34)   2013-10-08    2013-10-15   7 days    0    7    0    0    0
#8     10056(1)   2016-02-20    2016-02-29   9 days    0    0    0    0    9
#9    101171(4)   2015-09-24    2015-10-19  25 days    0    0    0   25    0
#10   101171(5)   2015-10-19    2015-11-02  14 days    0    0    0   14    0
#11   101455(2)   2014-05-01    2014-05-28  27 days    0    0   27    0    0
#12   101557(2)   2015-12-11    2016-01-15  35 days    0    0    0   21   14
#13   101571(3)   2014-08-26    2015-07-21 329 days    0    0  128  201    0
#14   101571(4)   2015-07-21    2016-01-06 169 days    0    0    0  164    5
#15   101571(5)   2016-01-06    2018-03-20 804 days    0    0    0    0  361
#16   101571(6)   2017-03-20    2017-12-14 269 days    0    0    0    0    0
#17    10160(5)   2014-04-14    2014-06-14  61 days    0    0   61    0    0
#18   101637(2)   2017-04-25    2017-05-09  14 days    0    0    0    0    0
#19  101893(13)   2014-08-10    2014-08-21  11 days    0    0   11    0    0
#20  101893(15)   2017-02-06    2017-02-11   5 days    0    0    0    0    0
#21  101893(16)   2017-04-12    2017-04-20   8 days    0    0    0    0    0
#22   102807(4)   2016-01-19    2016-01-24   5 days    0    0    0    0    5
#23   102807(5)   2016-06-08    2016-06-15   7 days    0    0    0    0    7
#24  102862(12)   2012-10-19    2013-01-25  98 days   74   24    0    0    0
#   2017 2018
#1     5    0
#2     0    0
#3     0    0
#4     0    0
#5     0    0
#6     0    0
#7     0    0
#8     0    0
#9     0    0
#10    0    0
#11    0    0
#12    0    0
#13    0    0
#14    0    0
#15  365   78
#16  269    0
#17    0    0
#18   14    0
#19    0    0
#20    5    0
#21    8    0
#22    0    0
#23    0    0
#24    0    0

The idea is to generate a day-by-day sequence of dates based on IN_TIME_date and OUT_TIME_date; we then extract only the years from those sequences and tally the number of years per CASEID. The rest is basic tidying/reshaping of your data.


df <- structure(list(CASEID = c("023252(1)", "07597558(2)", "07597558(3)",
"100520(31)", "100520(32)", "100520(33)", "100520(34)", "10056(1)",
"101171(4)", "101171(5)", "101455(2)", "101557(2)", "101571(3)",
"101571(4)", "101571(5)", "101571(6)", "10160(5)", "101637(2)",
"101893(13)", "101893(15)", "101893(16)", "102807(4)", "102807(5)",
"102862(12)"), IN_TIME_date = c("2017-02-25", "2015-10-23", "2016-07-06",
"2013-01-23", "2013-03-12", "2013-06-13", "2013-10-08", "2016-02-20",
"2015-09-24", "2015-10-19", "2014-05-01", "2015-12-11", "2014-08-26",
"2015-07-21", "2016-01-06", "2017-03-20", "2014-04-14", "2017-04-25",
"2014-08-10", "2017-02-06", "2017-04-12", "2016-01-19", "2016-06-08",
"2012-10-19"), OUT_TIME_date = c("2017-03-02", "2015-12-05",
"2016-07-15", "2013-01-28", "2013-03-18", "2013-06-18", "2013-10-15",
"2016-02-29", "2015-10-19", "2015-11-02", "2014-05-28", "2016-01-15",
"2015-07-21", "2016-01-06", "2018-03-20", "2017-12-14", "2014-06-14",
"2017-05-09", "2014-08-21", "2017-02-11", "2017-04-20", "2016-01-24",
"2016-06-15", "2013-01-25")), row.names = c(NA, -24L), class = "data.frame")

Leave a Comment