Pulling data from a webpage, parsing it for specific pieces, and displaying it

This small example uses HtmlAgilityPack, and using XPath selectors to get to the desired elements.

protected void Page_Load(object sender, EventArgs e)
{
    string url = "http://www.metacritic.com/game/pc/halo-spartan-assault";
    var web = new HtmlAgilityPack.HtmlWeb();
    HtmlDocument doc = web.Load(url);

    string metascore = doc.DocumentNode.SelectNodes("//*[@id=\"main\"]/div[3]/div/div[2]/div[1]/div[1]/div/div/div[2]/a/span[1]")[0].InnerText;
    string userscore = doc.DocumentNode.SelectNodes("//*[@id=\"main\"]/div[3]/div/div[2]/div[1]/div[2]/div[1]/div/div[2]/a/span[1]")[0].InnerText;
    string summary = doc.DocumentNode.SelectNodes("//*[@id=\"main\"]/div[3]/div/div[2]/div[2]/div[1]/ul/li/span[2]/span/span[1]")[0].InnerText;
}

An easy way to obtain the XPath for a given element is by using your web browser (I use Chrome) Developer Tools:

Open the Developer Tools (F12 or Ctrl + Shift + C on Windows or Command + Shift + C for Mac).
Select the element in the page that you want the XPath for.
Right click the element in the “Elements” tab.
Click on “Copy as XPath”.

You can paste it exactly like that in c# (as shown in my code), but make sure to escape the quotes.

You have to make sure you use some error handling techniques because Web scraping can cause errors if they change the HTML formatting of the page.

Edit

Per @knocte’s suggestion, here is the link to the Nuget package for HTMLAgilityPack:

https://www.nuget.org/packages/HtmlAgilityPack/

More Related Contents:

Leave a Comment Cancel reply