First of all, it is a long-debunked myth that for loops are any slower than lapply. The for loops in R have been made a lot more performant and are currently at least as fast as lapply.
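If you want to check this on your own machine, a minimal sketch could look like the one below. The workload (squaring a vector of random numbers) and the names square_for and square_lapply are assumptions made up for illustration; they are not the code from the question.

```r
# Toy workload (an assumption, not the code from the question).
n <- 1e5
x <- runif(n)

square_for <- function(x) {
  out <- numeric(length(x))   # preallocate; growing a vector inside a loop is the real slowdown
  for (i in seq_along(x)) out[i] <- x[i]^2
  out
}

square_lapply <- function(x) {
  unlist(lapply(x, function(v) v^2))
}

# Both approaches give the same result.
stopifnot(isTRUE(all.equal(square_for(x), square_lapply(x))))

# Timings depend on the machine and the R version, so no expected numbers are shown.
system.time(for (k in 1:10) square_for(x))
system.time(for (k in 1:10) square_lapply(x))
```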
That said, you have to rethink your use of lapply here. Your implementation demands assigning to the global environment, because your code requires you to update the weight during the loop. And that is a valid reason not to consider lapply.
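To make the problem concrete, here is a small sketch of a sequential update written both ways. The data, the update rule and the function names are hypothetical stand-ins, not your perceptron. In this sketch <<- writes to the enclosing function's environment; if w were defined at top level, the same operator would write to the global environment.

```r
# Hypothetical toy data and update rule, for illustration only.
x <- c(1, 2, 3, 4)
y <- c(1, -1, 1, -1)

# for loop: w lives in the function's own environment and is updated in place.
update_for <- function(x, y, eta = 0.1) {
  w <- 0
  for (i in seq_along(x)) {
    w <- w + eta * y[i] * x[i]
  }
  w
}

# lapply: the anonymous function has its own scope, so updating w needs <<-.
update_lapply <- function(x, y, eta = 0.1) {
  w <- 0
  lapply(seq_along(x), function(i) {
    w <<- w + eta * y[i] * x[i]   # reaches out into the enclosing environment
    NULL                          # the list that lapply builds is thrown away
  })
  w
}

# Same answer, but the lapply version is used purely for its side effect.
stopifnot(isTRUE(all.equal(update_for(x, y), update_lapply(x, y))))
```

The lapply version works, but it builds a list nobody needs and hides the state change inside a closure, which is exactly what lapply is meant to avoid.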
lapply is a function you should use for its lack of side effects. It combines the results in a list automatically and doesn’t mess with the environment you work in, contrary to a for loop. The same goes for replicate. See also this question:
Is R’s apply family more than syntactic sugar?
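A quick way to see the difference is to run both constructs in a fresh environment and list what is left behind afterwards. The environments env_for and env_lapply and the variable names below are made up for this sketch.

```r
# Run the for loop in its own environment so the leftovers are easy to inspect.
env_for <- new.env()
evalq({
  res <- vector("list", 3)
  for (i in 1:3) {
    tmp <- i^2
    res[[i]] <- tmp
  }
}, env_for)
ls(env_for)      # the loop index and the temporary are still around

# Same computation with lapply.
env_lapply <- new.env()
evalq({
  res <- lapply(1:3, function(i) {
    tmp <- i^2
    tmp
  })
}, env_lapply)
ls(env_lapply)   # only the result is left

# Both give the same list.
stopifnot(identical(env_for$res, env_lapply$res))
```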
The reason your lapply solution is far slower is that your way of using it creates a lot more overhead.
- replicate is nothing else but sapply internally, so you actually combine sapply and lapply to implement your double loop. sapply creates extra overhead because it has to test whether or not the result can be simplified. So a for loop will actually be faster than using replicate.
- Inside your lapply anonymous function, you have to access the dataframe for both x and y for every observation. This means that, contrary to your for loop, e.g. the function $ has to be called every time.
- Because you use these high-end functions, your lapply solution calls 49 functions, compared to your for solution that only calls 26. These extra functions for the lapply solution include calls to functions like match, structure, [[, names, %in%, sys.call, duplicated, … All functions not needed by your for loop, as that one doesn’t do any of these checks.
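The first two points can be illustrated with a small sketch. The data frame df below is a hypothetical stand-in for the one in the question, and the computations are placeholders chosen only to show where the extra calls come from.

```r
# Hypothetical data frame standing in for the one in the question.
df <- data.frame(x = c(1, 2, 3), y = c(1, -1, 1))

# 1. replicate() is a thin wrapper around sapply(); these two calls do the same work.
r1 <- replicate(3, sum(df$x))
r2 <- sapply(integer(3), function(...) sum(df$x))
stopifnot(identical(r1, r2))

# 2. simplify = FALSE returns a plain list and skips the simplification step.
r3 <- replicate(3, sum(df$x), simplify = FALSE)
stopifnot(is.list(r3))

# 3. $ is an ordinary function, so df$x in the body means one call per iteration.
slow <- sapply(seq_len(nrow(df)), function(i) df$x[i] * df$y[i])

# Extracting the columns once, before iterating, avoids those repeated calls.
x <- df$x
y <- df$y
fast <- sapply(seq_along(x), function(i) x[i] * y[i])
stopifnot(identical(slow, fast))
```

Hoisting the column access and turning off simplification removes some of the overhead, but it doesn't change the fact that the weight update itself is sequential.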
If you want to see where this extra overhead comes from, look at the internal code of replicate, unlist, sapply and simplify2array.
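In case it isn't obvious how to do that: typing a function's name without parentheses prints its source. A short sketch follows; the grepl checks are just a convenience I added to confirm the wrappers programmatically.

```r
# Printing a function without calling it shows its R source.
replicate
sapply
simplify2array

# unlist is mostly internal code, so its R-level source is short.
unlist

# The body of replicate() contains a call to sapply() ...
calls_sapply <- any(grepl("sapply", deparse(body(replicate)), fixed = TRUE))

# ... and the body of sapply() in turn refers to lapply() and simplify2array().
sapply_src <- deparse(body(sapply))
calls_lapply <- any(grepl("lapply", sapply_src, fixed = TRUE))
calls_simplify <- any(grepl("simplify2array", sapply_src, fixed = TRUE))
```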
You can use the following code to get a better idea of where you lose your performance with the lapply solution. Run this line by line!
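
    # Profile the lapply-based solution (f() from the question).
    Rprof(interval = 0.0001)
    f()
    Rprof(NULL)
    fprof <- summaryRprof()$by.self

    # Profile the for-loop solution on the same data.
    Rprof(interval = 0.0001)
    perceptron(as.matrix(irissubdf[1:2]), irissubdf$y, 1, 10)
    Rprof(NULL)
    perprof <- summaryRprof()$by.self

    # Merge both profiles by function name.
    fprof$Fun <- rownames(fprof)
    perprof$Fun <- rownames(perprof)
    Selftime <- merge(fprof, perprof,
                      all = TRUE,
                      by = 'Fun',
                      suffixes = c(".lapply", ".for"))

    # Number of distinct functions that show up in each profile.
    sum(!is.na(Selftime$self.time.lapply))
    sum(!is.na(Selftime$self.time.for))

    # Self time per function, most expensive lapply calls first.
    Selftime[order(Selftime$self.time.lapply, decreasing = TRUE),
             c("Fun", "self.time.lapply", "self.time.for")]

    # Functions that only appear in the lapply profile.
    Selftime[is.na(Selftime$self.time.for), ]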