A small, reproducible example:
library(zoo)
set.seed(1)
m <- matrix(runif(16, 0, 100), nrow = 4)
missing_values <- sample(16, 7)
m[missing_values] <- NA
m
         [,1]     [,2]      [,3]     [,4]
[1,] 26.55087 20.16819 62.911404 68.70228
[2,] 37.21239       NA  6.178627 38.41037
[3,]       NA       NA        NA       NA
[4,] 90.82078 66.07978        NA       NA
na.approx(m)
         [,1]     [,2]      [,3]     [,4]
[1,] 26.55087 20.16819 62.911404 68.70228
[2,] 37.21239 35.47206  6.178627 38.41037
[3,] 64.01658 50.77592        NA       NA
[4,] 90.82078 66.07978        NA       NA
m[4, 4] <- 50
na.approx(m)
         [,1]     [,2]      [,3]     [,4]
[1,] 26.55087 20.16819 62.911404 68.70228
[2,] 37.21239 35.47206  6.178627 38.41037
[3,] 64.01658 50.77592        NA 44.20519
[4,] 90.82078 66.07978        NA 50.00000
Yup, looks like you do need the start and end values of each column to be known: na.approx only interpolates between known points, so leading and trailing NAs stay NA. Can you guess values for your boundaries?
ANOTHER EDIT: So by default, you need the start and end values of columns to be known. However, you can get na.approx to fill leading and trailing NAs too by passing rule = 2, which extends the nearest known value outwards. See Felix’s answer. You can also use na.fill to provide a default value, as per Gabor’s comment. Finally, you can interpolate boundary conditions in two directions (see below) or guess boundary conditions.
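To make the two options concrete, here is a minimal sketch on a toy vector. The vector x and the fill value 50 are made up for illustration; rule = 2 is handed on to approx() by na.approx, and na.fill is called with a single fill value. Treat it as a quick check of the behaviour rather than a recipe for your data.

```r
library(zoo)

# Toy vector with a leading, an interior and a trailing NA (made-up data).
x <- c(NA, 10, NA, 30, NA)

# rule = 2 is passed through to approx(): points outside the known range
# take the nearest known value instead of staying NA.
extended <- na.approx(x, rule = 2)

# na.fill() with a single value replaces every NA with that value
# (50 is an arbitrary default chosen for this example).
filled <- na.fill(x, 50)

print(extended)
print(filled)
```

Note the difference: rule = 2 still interpolates the interior gap and only repeats edge values, whereas a single-value na.fill overwrites interior gaps with the default as well.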
EDIT: A further thought. Since na.approx is only interpolating in columns, and your data is spatial, perhaps interpolating in rows would be useful too. Then you could take the average.
na.approx fails when whole columns are NA, so we create a bigger dataset.
set.seed(1)
m <- matrix(runif(64, 0, 100), nrow = 8)
missing_values <- sample(64, 15)
m[missing_values] <- NA
Run na.approx both ways.
by_col <- na.approx(m)
by_row <- t(na.approx(t(m)))
Find out the best guess.
default <- 50
best_guess <- ifelse(is.na(by_row),
  ifelse(
    is.na(by_col),
    default,  # neither known
    by_col    # only by_col known
  ),
  ifelse(
    is.na(by_col),
    by_row,                # only by_row known
    (by_row + by_col) / 2  # both known
  )
)
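If the nested ifelse is hard to read, the same four cases can be sketched another way: stack the two results into an array, average each cell while ignoring NAs, and fall back to the default where both are missing. The tiny matrices below are made up so that every case appears once (they are not the output of the code above), and stacked and avg are just illustrative names.

```r
# Made-up 2 x 2 inputs covering all four cases.
by_col <- matrix(c(1, NA, 3, NA), nrow = 2)
by_row <- matrix(c(3, 5, NA, NA), nrow = 2)
default <- 50

# Stack the two matrices along a third dimension.
stacked <- array(c(by_col, by_row), dim = c(dim(by_col), 2))

# Mean over the third dimension, skipping NAs. A cell with both values
# known gets their average; a cell with one value known gets that value.
avg <- apply(stacked, c(1, 2), mean, na.rm = TRUE)

# The mean of zero values is NaN, which marks the "neither known" cells.
avg[is.nan(avg)] <- default

# Same result as the nested ifelse, for comparison.
nested <- ifelse(is.na(by_row),
  ifelse(is.na(by_col), default, by_col),
  ifelse(is.na(by_col), by_row, (by_row + by_col) / 2)
)

print(avg)
print(isTRUE(all.equal(avg, nested)))
```

The array version also extends to more than two estimates (say, diagonal interpolation as well) by stacking a third matrix, which the nested ifelse does not do gracefully.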