Efficiently generate a random sample of times and dates between two dates

Ahh, another date/time problem we can reduce to working in floats 🙂

Try this function:

latemail <- function(N, st = "2012/01/01", et = "2012/12/31") {
    st <- as.POSIXct(as.Date(st))                       # start: midnight UTC
    et <- as.POSIXct(as.Date(et))                       # end: midnight UTC
    dt <- as.numeric(difftime(et, st, units = "secs"))  # window length in seconds
    ev <- sort(runif(N, 0, dt))                         # N sorted uniform offsets
    st + ev                                             # return the sampled times
}

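We compute the difftime in seconds and then “merely” draw N uniforms over [0, dt], sorting the result. Since both endpoints are converted to midnight, the draws stop at the start of the end date. Add the offsets to the start and you’re done: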
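To make the "reduce it to floats" idea concrete, here is a small worked check of the numbers involved for the default 2012 window. It assumes, as the function does, that both endpoints become midnight UTC via `as.POSIXct(as.Date(...))`; the variable names simply mirror those inside `latemail()`.

```r
# POSIXct values are plain doubles: seconds since 1970-01-01 00:00 UTC.
st <- as.POSIXct(as.Date("2012/01/01"))
et <- as.POSIXct(as.Date("2012/12/31"))
print(as.numeric(st))   # 1325376000

# The window spans 365 days (2012 is a leap year, but the window ends at
# the start of Dec 31), so dt = 365 * 86400 seconds.
dt <- as.numeric(difftime(et, st, units = "secs"))
print(dt)               # 31536000

# Each uniform draw is an offset in seconds; adding it to st gives a time.
set.seed(1)
ev <- sort(runif(3, 0, dt))
x <- st + ev
print(x)
```

Because the offsets are sorted before being added, the resulting times come out in chronological order at no extra cost.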

R> set.seed(42); print(latemail(5))     ## round to date, or hour, or ...
[1] "2012-04-14 05:34:56.369022 CDT" "2012-08-22 00:41:26.683809 CDT" 
[3] "2012-10-29 21:43:16.335659 CDT" "2012-11-29 15:42:03.387701 CST"
[5] "2012-12-07 18:46:50.233761 CST"
R> system.time(latemail(100000))
   user  system elapsed 
  0.024   0.000   0.021 
R> system.time(latemail(200000))
   user  system elapsed 
  0.044   0.000   0.045 
R> system.time(latemail(10000000))   ## a few more than in your example :)
   user  system elapsed 
  3.240   0.172   3.428 
R> 
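The comment on the first call hints at rounding the results. A minimal sketch of a few ways to coarsen the output follows; the choice of UTC for the date conversion is an assumption, so pick the time zone that fits your data. The last lines show a different approach (sampling whole days directly with `sample()`), which only applies if you want dates rather than times; unlike `latemail()`, it can also return the end date itself.

```r
# The function is repeated here so the snippet runs on its own.
latemail <- function(N, st = "2012/01/01", et = "2012/12/31") {
    st <- as.POSIXct(as.Date(st))
    et <- as.POSIXct(as.Date(et))
    dt <- as.numeric(difftime(et, st, units = "secs"))
    st + sort(runif(N, 0, dt))
}

set.seed(42)
x <- latemail(5)

# Calendar date of each draw, taken in UTC (an assumption).
d <- as.Date(x, tz = "UTC")

# Truncate to the hour; trunc() on date-times returns POSIXlt, so convert back.
h <- as.POSIXct(trunc(x, units = "hours"))

# Drop the fractional seconds the same way.
s <- as.POSIXct(trunc(x, units = "secs"))

# Alternative when only dates are needed: sample whole days directly.
# This includes the end date, which latemail() never reaches.
days <- sort(sample(seq(as.Date("2012/01/01"), as.Date("2012/12/31"), by = "day"),
                    5, replace = TRUE))
```

The uniform-offset approach stays the better fit when sub-day resolution matters, since it works at the level of (fractional) seconds and can be coarsened afterwards.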
