Scrape password-protected website in R

You can use RSelenium. I used the development version because it can drive phantomjs directly, without a Selenium Server.

# Install RSelenium if required. phantomjs must be on your PATH; otherwise
# follow the instructions in the package vignettes.
# devtools::install_github("ropensci/RSelenium")
# Log in first
appURL <- 'http://subscribers.footballguys.com/amember/login.php'
library(RSelenium)
pJS <- phantom() # start phantomjs
remDr <- remoteDriver(browserName = "phantomjs")
remDr$open()
remDr$navigate(appURL)
remDr$findElement("id", "login")$sendKeysToElement(list("myusername"))
remDr$findElement("id", "pass")$sendKeysToElement(list("mypass"))
remDr$findElement("css", '.am-login-form input[type="submit"]')$clickElement()

appURL <- 'http://subscribers.footballguys.com/myfbg/myviewprojections.php?projector=2'
remDr$navigate(appURL)
library(XML) # provides readHTMLTable
tableElem <- remDr$findElement("css", "table.datamedium")
res <- readHTMLTable(tableElem$getElementAttribute("outerHTML")[[1]], header = TRUE)
> res[[1]][1:5, ]
  Rank             Name Tm/Bye Age Exp Cmp Att  Cm%  PYd Y/Att PTD Int Rsh  Yd TD FantPt
1    1   Peyton Manning  DEN/4  38  17 415 620 66.9 4929  7.95  43  12  24   7  0 407.15
2    2       Drew Brees   NO/6  35  14 404 615 65.7 4859  7.90  37  16  22  44  1 385.35
3    3    Aaron Rodgers   GB/9  31  10 364 560 65.0 4446  7.94  33  13  52 224  3 381.70
4    4      Andrew Luck IND/10  25   3 366 610 60.0 4423  7.25  27  13  62 338  2 361.95
5    5 Matthew Stafford  DET/9  26   6 377 643 58.6 4668  7.26  32  19  34 102  1 358.60
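
A table parsed from HTML usually comes back with its columns stored as text (factors or character strings, depending on your R and XML versions) rather than as numbers, so you may want to convert them before doing any arithmetic. The following is a minimal sketch, not part of the original answer: `proj` is a hypothetical stand-in for `res[[1]]`, built from two rows of the output above so that it runs without a browser session.

```r
# Hypothetical stand-in for res[[1]]: two rows copied from the output above,
# stored as text the way a parsed HTML table typically arrives.
proj <- data.frame(
  Name   = c("Peyton Manning", "Drew Brees"),
  PYd    = c("4929", "4859"),
  FantPt = c("407.15", "385.35"),
  stringsAsFactors = TRUE
)

# Convert via as.character() first so that factor columns are converted by
# their labels, not their internal level codes. type.convert() then chooses
# integer or numeric where every value parses, and leaves the rest as text.
proj[] <- lapply(proj, function(col) type.convert(as.character(col), as.is = TRUE))

str(proj)
```

With a live session you would apply the same `lapply` call to `res[[1]]` instead of `proj`.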

Finally, when you are finished, close phantomjs:

pJS$stop()

If you would rather use a traditional browser such as Firefox (for example, because you want to stick with the version on CRAN), use:

RSelenium::startServer()
remDr <- remoteDriver()
........
........
remDr$closeServer()

in place of the related phantomjs calls.
