Not my own technique; I saw it on the boards a while back:
dat <- read.table(text = "id taxa length width
101 collembola 2.1 0.9
102 mite 0.9 0.7
103 mite 1.1 0.8
104 collembola NA NA
105 collembola 1.5 0.5
106 mite NA NA", header=TRUE)
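library(plyr)
# Replace the NAs in x with the mean of its non-missing values
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
# Apply the imputation to each measurement column within each taxa group
dat2 <- ddply(dat, ~ taxa, transform, length = impute.mean(length),
              width = impute.mean(width))
dat2[order(dat2$id), ]  # plyr orders by group, so we have to reorder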
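A quick look at what impute.mean does on its own may help. This is only a sketch on made-up vectors (the names filled and all_missing are just for illustration). The thing worth noticing is that a group with no observed values has no mean to borrow, so it comes back as NaN rather than being filled:

```r
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# One missing value: it is replaced by the mean of the observed values
filled <- impute.mean(c(2.1, NA, 1.5))
filled

# Every value missing: the mean of nothing is NaN,
# so the "imputed" values are NaN as well
all_missing <- impute.mean(c(NA_real_, NA_real_))
all_missing
```

If a taxon can be entirely unmeasured in your data, you will probably want to check for NaN after imputing.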
Edit: A non-plyr approach with a for loop:
for (i in which(sapply(dat, is.numeric))) {
  for (j in which(is.na(dat[, i]))) {
    dat[j, i] <- mean(dat[dat[, "taxa"] == dat[j, "taxa"], i], na.rm = TRUE)
  }
}
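If you would rather avoid both plyr and an explicit loop, base R's ave() can apply the same function within groups and returns the result in the original row order, so no reordering is needed. A minimal sketch, assuming the same data and impute.mean as above (num_cols and dat_ave are names made up for this example):

```r
dat <- read.table(text = "id taxa length width
101 collembola 2.1 0.9
102 mite 0.9 0.7
103 mite 1.1 0.8
104 collembola NA NA
105 collembola 1.5 0.5
106 mite NA NA", header = TRUE)

impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# Columns to impute (hypothetical helper name)
num_cols <- c("length", "width")

# ave() splits each column by taxa, applies impute.mean to each piece,
# and puts the pieces back in their original positions
dat_ave <- dat
dat_ave[num_cols] <- lapply(dat[num_cols], function(col) {
  ave(col, dat$taxa, FUN = impute.mean)
})
dat_ave
```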
Edit: Many moons later, here are a data.table and a dplyr approach:
data.table
library(data.table)
setDT(dat)
dat[, length := impute.mean(length), by = taxa][,
    width := impute.mean(width), by = taxa]
dplyr
library(dplyr)
dat %>%
  group_by(taxa) %>%
  mutate(
    length = impute.mean(length),
    width = impute.mean(width)
  )