Converting a character string into a date in R

Updated: Improved with @Richard Scriven’s colClasses and simpler as.Date() suggestions.

Here are two similar methods that worked for me for taking a CSV file containing dates in mmddyyyy format and getting R to recognize them as Date objects.

Starting first with a simple file tv.csv:

Series,FirstAir
Quantico,09272015
Muppets,09222015

Method 1: All as string

Once within R,

> t = read.csv('tv.csv', colClasses="character")
  • imports tv.csv as a data frame named t
  • the colClasses="character" option causes every column to be read as the character data type (instead of Factor or int)

Examine its initial structure:

> str(t)
'data.frame':   2 obs. of  2 variables:
 $ Series  : chr  "Quantico" "Muppets"
 $ FirstAir: chr  "09272015" "09222015"
  • R has imported every column as character strings, shown here as type chr

A chr value (a string of characters) is then easily converted into a date:

> t$FirstAir = as.Date(t$FirstAir, "%m%d%Y")
  • as.Date() performs string to date conversion
  • %m%d%Y specifies how to interpret the input in t$FirstAir. These format codes are documented within R under ?strptime. On Linux, running $ man date also brings up the manual for the date program, which lists similar formatting codes. For example, it says %m month (01..12)
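
As a self-contained sketch of method 1, the snippet below recreates the sample file in a temporary location so it can be run anywhere. The names csv_path and tv are illustrative choices of mine, not part of the original steps.

```r
# Recreate the sample tv.csv in a temporary file so the sketch is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Series,FirstAir", "Quantico,09272015", "Muppets,09222015"), csv_path)

# Read every column as character so the leading zero in "09" survives
tv <- read.csv(csv_path, colClasses = "character")

# Parse the mmddyyyy strings; values that do not match the format become NA
tv$FirstAir <- as.Date(tv$FirstAir, format = "%m%d%Y")

str(tv$FirstAir)

# Once the column is a Date, date arithmetic works as expected
days_apart <- as.numeric(tv$FirstAir[1] - tv$FirstAir[2])
print(days_apart)  # 5
```

Checking for NA values after the conversion (for example with anyNA(tv$FirstAir)) is a cheap way to spot rows whose dates did not match the expected format.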

Method 2: Import then fix only the date

If for some reason you don’t want a blanket conversion of every column to character on import (for example, the file has many variables and you wish to keep R’s automatic type recognition and merely “fix” the one date variable), follow this method.

Once within R,

> t = read.csv('tv.csv')
  • imports tv.csv as a data frame named t

Examine its initial structure:

> str(t)
'data.frame':   2 obs. of  2 variables:
 $ Series  : Factor w/ 2 levels "Muppets","Quantico": 2 1
 $ FirstAir: int  9272015 9222015
>
  • R tries its best to guess the type of each variable
  • As you can see, an immediate problem is that R has imported 09272015 in the FirstAir variable as int (integer) and dropped the leading zero. The 0 in 09 matters later for the date conversion, so we need to restore it.
  • Since R 4.0.0, read.csv no longer converts strings to factors by default, so on a recent R the Series column shows as chr rather than Factor. The FirstAir problem is the same either way.

This can be done in a single command but for clarity I have broken this into two steps. First,

> t$FirstAir = sprintf("%08d", t$FirstAir)
  • sprintf is a formatting function
  • 0 means pad with zeroes
  • 8 sets a minimum width of 8 characters, because mmddyyyy is 8 characters in total
  • d is used when the input is an integer, which it currently is: recall that the str() output reported t$FirstAir as int
  • t$FirstAir is the variable we are both setting and using as input
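
To see the padding in isolation, here is a small sketch using a stand-alone integer vector. The vector first_air and its third value are made up for illustration, to show that a value that already has 8 digits is left unchanged.

```r
# Integers as read.csv would return them, with the leading zero already lost.
# The third value (hypothetical) already has 8 digits.
first_air <- c(9272015L, 9222015L, 12252015L)

# "%08d": d = integer input, 8 = minimum width, 0 = pad with zeros on the left
padded <- sprintf("%08d", first_air)

print(padded)  # "09272015" "09222015" "12252015"
```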

Check the result:

> str(t$FirstAir)
 chr [1:2] "09272015" "09222015"
  • it successfully converted from an int to a chr type, for example 9272015 became "09272015"

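Now that it is a string (chr type), we can convert it to a date. Here the conversion goes through strptime(); the direct as.Date(t$FirstAir, "%m%d%Y") call from method 1 works as well.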

> t$FirstAir = as.Date(strptime(t$FirstAir, "%m%d%Y"))
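
The two steps of method 2 can also be combined into a single command. A minimal sketch, assuming the column is still an int as read.csv returned it; the data frame tv is built by hand here only to keep the example self-contained.

```r
# Stand-in for the data frame that read.csv('tv.csv') would produce
tv <- data.frame(
  Series   = c("Quantico", "Muppets"),
  FirstAir = c(9272015L, 9222015L)
)

# Pad back to 8 characters and parse as mmddyyyy in one expression
tv$FirstAir <- as.Date(sprintf("%08d", tv$FirstAir), format = "%m%d%Y")

str(tv$FirstAir)
```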

Result

We do a final check:

> str(t$FirstAir)
 Date[1:2], format: "2015-09-27" "2015-09-22"

In both cases, what were originally plain values in a text file have now been successfully converted into R Date objects.
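
As a possible middle ground between the two methods, colClasses also accepts a named vector, so only the date column is forced to character while the other columns keep R’s automatic type detection. This is an alternative sketch rather than one of the two methods above; csv_path and tv are illustrative names.

```r
# Recreate the sample tv.csv in a temporary file so the sketch is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Series,FirstAir", "Quantico,09272015", "Muppets,09222015"), csv_path)

# Force only FirstAir to character; unnamed columns are still auto-detected
tv <- read.csv(csv_path, colClasses = c(FirstAir = "character"))

# The leading zero is intact, so no sprintf() padding step is needed
tv$FirstAir <- as.Date(tv$FirstAir, format = "%m%d%Y")

str(tv)
```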
