Extracting div's InnerHtml?

Have you tried the HtmlAgilityPack? It lets you parse and query (with XPath) most of the malformed HTML you find in the wild.

If I’m understanding your problem correctly, you might use:

HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("http://abc.com/xyz.html");

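// SelectSingleNode returns null when nothing matches, so check before reading InnerHtml.
HtmlAgilityPack.HtmlNode div = doc.DocumentNode
    .SelectSingleNode("/html/body/div[@class=\"os-box unround\"]");
string contentYouWantedToDisplayOnYourOwnPage = div != null ? div.InnerHtml : null;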
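
If the div is not a direct child of body, the absolute path above will not find it. Here is a rough sketch of a more forgiving variant, assuming the class attribute is exactly "os-box unround". The DivExtractor class, the ExtractInnerHtml helper, and the sample markup are made up for illustration; only HtmlDocument, LoadHtml, SelectSingleNode, and InnerHtml come from HtmlAgilityPack.

```csharp
using System;
using HtmlAgilityPack;

static class DivExtractor
{
    // Hypothetical helper: returns the inner HTML of the first node
    // matching the XPath expression, or null when nothing matches.
    public static string ExtractInnerHtml(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses from a string; does not throw on malformed markup
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node != null ? node.InnerHtml : null;
    }

    static void Main()
    {
        // Made-up sample markup; the real page structure is not known.
        string html =
            "<html><body><div id='wrap'>" +
            "<div class='os-box unround'><p>Hello</p></div>" +
            "</div></body></html>";

        // "//div" searches at any depth instead of only directly under body.
        string inner = ExtractInnerHtml(html, "//div[@class='os-box unround']");
        Console.WriteLine(inner ?? "(no matching div)");
    }
}
```

The "//" prefix trades a little speed for not having to know the exact nesting, which is usually the better deal when scraping a page you don't control.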
