How to scrape tables inside a comment tag in HTML with R?

You can use the XPath comment() function to select comment nodes, then reparse their contents as HTML:

    library(rvest)

    # scrape page
    h <- read_html('http://www.basketball-reference.com/teams/CHI/2015.html')

    df <- h %>%
      html_nodes(xpath = '//comment()') %>%  # select comment nodes
      html_text() %>%                        # extract comment text
      paste(collapse = '') %>%               # collapse to a single string
      read_html() %>%                        # reparse to HTML
      html_node('table#advanced')
    …
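The core trick in this answer is to pull the text out of the comment nodes and treat that text as markup in its own right. As a rough illustration of that step only, here is a standard-library Java sketch. It assumes a regular expression is an acceptable stand-in for the XPath comment() selector, which holds for a small, well-formed snippet but not for arbitrary pages. The class and method names (CommentTables, commentBodies, uncommented) and the sample markup are hypothetical and come from no library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of rvest, JSoup, or any library.
final class CommentTables {

    // Matches one HTML comment and captures its body.
    // DOTALL lets '.' span newlines; the reluctant '.*?' stops at the first "-->".
    private static final Pattern COMMENT =
            Pattern.compile("<!--(.*?)-->", Pattern.DOTALL);

    // Returns the body of every HTML comment, in document order.
    static List<String> commentBodies(String html) {
        List<String> bodies = new ArrayList<>();
        Matcher m = COMMENT.matcher(html);
        while (m.find()) {
            bodies.add(m.group(1));
        }
        return bodies;
    }

    // Joins all comment bodies into a single string,
    // mirroring the paste(collapse = '') step of the R pipeline.
    static String uncommented(String html) {
        return String.join("", commentBodies(html));
    }

    public static void main(String[] args) {
        // Made-up sample page: the table is only present inside a comment.
        String page = "<html><body><p>visible</p>"
                + "<!-- <table id=\"advanced\"><tr><td>42</td></tr></table> -->"
                + "</body></html>";

        // Prints: <table id="advanced"><tr><td>42</td></tr></table>
        System.out.println(uncommented(page).trim());
    }
}
```

The string returned by uncommented would then be handed to a real HTML parser, which is the role the second read_html() call plays in the R code. The regular expression only covers the "select comment nodes and extract their text" part.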

Parse Web Site HTML with Java [duplicate]

There is a much easier way to do this. I suggest using JSoup. With JSoup you can do things like:

    Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
    Elements newsHeadlines = doc.select("#mp-itn b a");

Or if you want the body:

    Elements body = doc.select("body");

Or if you want all links:

    Elements links = doc.select("body a");

You no longer need …