How to efficiently use Rprof in R?

Alert readers of yesterdays breaking news (R 3.0.0 is finally out) may have noticed something interesting that is directly relevant to this question:

  • Profiling via Rprof() now optionally records information at the statement level, not just the function level.

And indeed, this new feature answers my question and I will show how.


Let’s say, we want to compare whether vectorizing and pre-allocating are really better than good old for-loops and incremental building of data in calculating a summary statistic such as the mean. The, relatively stupid, code is the following:

# create big data frame:
n <- 1000
x <- data.frame(group = sample(letters[1:4], n, replace=TRUE), condition = sample(LETTERS[1:10], n, replace = TRUE), data = rnorm(n))

# reasonable operations:
marginal.means.1 <- aggregate(data ~ group + condition, data = x, FUN=mean)

# unreasonable operations:
marginal.means.2 <- marginal.means.1[NULL,]

row.counter <- 1
for (condition in levels(x$condition)) {
  for (group in levels(x$group)) {  
    tmp.value <- 0
    tmp.length <- 0
    for (c in 1:nrow(x)) {
      if ((x[c,"group"] == group) & (x[c,"condition"] == condition)) {
        tmp.value <- tmp.value + x[c,"data"]
        tmp.length <- tmp.length + 1
      }
    }
    marginal.means.2[row.counter,"group"] <- group 
    marginal.means.2[row.counter,"condition"] <- condition
    marginal.means.2[row.counter,"data"] <- tmp.value / tmp.length
    row.counter <- row.counter + 1
  }
}

# does it produce the same results?
all.equal(marginal.means.1, marginal.means.2)

To use this code with Rprof, we need to parse it. That is, it needs to be saved in a file and then called from there. Hence, I uploaded it to pastebin, but it works exactly the same with local files.

Now, we

  • simply create a profile file and indicate that we want to save the line number,
  • source the code with the incredible combination eval(parse(..., keep.source = TRUE)) (seemingly the infamous fortune(106) does not apply here, as I haven’t found another way)
  • stop the profiling and indicate that we want the output based on the line numbers.

The code is:

Rprof("profile1.out", line.profiling=TRUE)
eval(parse(file = "http://pastebin.com/download.php?i=KjdkSVZq", keep.source=TRUE))
Rprof(NULL)

summaryRprof("profile1.out", lines = "show")

Which gives:

$by.self
                           self.time self.pct total.time total.pct
download.php?i=KjdkSVZq#17      8.04    64.11       8.04     64.11
<no location>                   4.38    34.93       4.38     34.93
download.php?i=KjdkSVZq#16      0.06     0.48       0.06      0.48
download.php?i=KjdkSVZq#18      0.02     0.16       0.02      0.16
download.php?i=KjdkSVZq#23      0.02     0.16       0.02      0.16
download.php?i=KjdkSVZq#6       0.02     0.16       0.02      0.16

$by.total
                           total.time total.pct self.time self.pct
download.php?i=KjdkSVZq#17       8.04     64.11      8.04    64.11
<no location>                    4.38     34.93      4.38    34.93
download.php?i=KjdkSVZq#16       0.06      0.48      0.06     0.48
download.php?i=KjdkSVZq#18       0.02      0.16      0.02     0.16
download.php?i=KjdkSVZq#23       0.02      0.16      0.02     0.16
download.php?i=KjdkSVZq#6        0.02      0.16      0.02     0.16

$by.line
                           self.time self.pct total.time total.pct
<no location>                   4.38    34.93       4.38     34.93
download.php?i=KjdkSVZq#6       0.02     0.16       0.02      0.16
download.php?i=KjdkSVZq#16      0.06     0.48       0.06      0.48
download.php?i=KjdkSVZq#17      8.04    64.11       8.04     64.11
download.php?i=KjdkSVZq#18      0.02     0.16       0.02      0.16
download.php?i=KjdkSVZq#23      0.02     0.16       0.02      0.16

$sample.interval
[1] 0.02

$sampling.time
[1] 12.54

Checking the source code tells us that the problematic line (#17) is indeed the stupid if-statement in the for-loop. Compared with basically no time for calculating the same using vectorized code (line #6).

I haven’t tried it with any graphical output, but I am already very impressed by what I got so far.

Leave a Comment