how to use loop to subset 1000 csv files and do co-integration test in R?

I’ve got a two step solution for you:

x <- list.files(path = "your directory", pattern = ".csv")
y <- lapply(x,fread)
data <- rbindlist(y)

Reads in all the csvs in your directory as data tables then binds then together.

After that i’d just subset as follows:

data <- data[Date >= "your dates",]

EDIT*

I get the feeling you want to bring in your data and bind it column wise/merge. As it stands that would be incorrect as you have the same “C” value in each column. After looking at your samples, if you were to stack those row wise you’d just be getting a very long C column. I wonder whether each C column represents the same or a different variable. If it is a different variable, I’ve written up some code which would truncate your data appropriately.

I’ve used the first 6 rows of your sample data 1 and sample data 2

    files <- list.files(path = dir, pattern = ".csv")

data_mock <- lapply(files,fread)

data_mock[[1]][, Date := data_mock[[2]][,Date]] 
#I change the dates here because your sample dates are too far apart to test for date truncation to work

for (i in 1:length(data_mock)){
  data_mock[[i]]$Date <- as.Date(data_mock[[i]]$Date, format = "%Y-%m-%d")
}

for (i in 1:length(data_mock)){
  setnames(data_mock[[i]], old = names(data_mock[[i]]), new = c("Date", paste0("C",i)))
}
#I change the variable names here because I'm not sure whether you want to stack Cs ontop of one another or whether each C is a different variable.
#I've assumed each C is different.

start_finish <- function(data, start, finish){
  data[Date >= start & Date <= finish,]
}

results <- list()
for (i in 1:length(data_mock)){
results[[i]] <- start_finish(data_mock[[i]], "1987-01-15", "1987-01-17")
}

This is what the original data looked like:

    [[1]]
         Date      C
1: 1998-04-20 12.667
2: 1998-04-21 12.587
3: 1998-04-22 12.625
4: 1998-04-23 12.601
5: 1998-04-24 12.584
6: 1998-04-25 12.624

[[2]]
         Date     C
1: 1987-01-14 95.89
2: 1987-01-15 97.72
3: 1987-01-16 98.10
4: 1987-01-17 97.07
5: 1987-01-18 98.86
6: 1987-01-19 99.95

This is what it looks like once you run a loop over the “start_finish” function I wrote:

[[1]]
         Date      C
1: 1987-01-15 12.587
2: 1987-01-16 12.625
3: 1987-01-17 12.601

[[2]]
         Date    C2
1: 1987-01-15 97.72
2: 1987-01-16 98.10
3: 1987-01-17 97.07

I believe you wanted your data to start and end at the same time? You merely need to alter the “start” and “finish” dates in the pretty simple function I wrote.

Is that what you’re after?

Leave a Comment