HTML Agility Pack – parsing tables

How about something like this, using HTML Agility Pack:

    using System;
    using HtmlAgilityPack;

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(@"<html><body><p><table id=""foo""><tr><th>hello</th></tr><tr><td>world</td></tr></table></body></html>");

    // Visit every table in the document, then its rows, then each header or data cell
    foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table"))
    {
        Console.WriteLine("Found: " + table.Id);
        foreach (HtmlNode row in table.SelectNodes("tr"))
        {
            Console.WriteLine("row");
            foreach (HtmlNode cell in row.SelectNodes("th|td"))
            {
                Console.WriteLine("cell: " + cell.InnerText);
            }
        }
    }

Note that you can make it prettier with LINQ-to-Objects if …

How to use HTML Agility Pack

First, install the HtmlAgilityPack NuGet package into your project. Then, as an example:

    HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();

    // There are various options, set as needed
    htmlDoc.OptionFixNestedTags = true;

    // filePath is a path to a file containing the html
    htmlDoc.Load(filePath);

    // Use: htmlDoc.LoadHtml(xmlString); to load from a string (was htmlDoc.LoadXML(xmlString))

    // ParseErrors is an ArrayList containing …