Parsing HTML to get script variable value

A very simple example of how this can be done using HtmlAgilityPack and the Jurassic library to evaluate the result:

    var html = @"<html>
                 // Some HTML
                 <script>
                   var spect = [['temper', 'init', []],
                   ['fw\/lib', 'init', [{staticRoot: '//site.com/js/'}]],
                   [""cap"",""dm"",[{""tackmod"":""profile"",""xMod"":""timed""}]]];
                 </script>
                 // More HTML
                 </html>";

    // Grab the content of the first script element
    HtmlAgilityPack.HtmlDocument doc = …

iTextSharp error when trying to parse HTML for PDF conversion

HTMLWorker has been deprecated in favor of XMLWorker. Here is a working example, tested with a snippet of HTML like the one you used above:

    // Inside a verbatim string, each double quote must be doubled.
    StringReader html = new StringReader(@"
    <div style=""font-size: 18pt; font-weight: bold;"">
    Mouser Electronics <br />Authorized Distributor</div><br /> <br />
    <div style=""font-size: 14pt;"">Click to View Pricing, Inventory, Delivery & Lifecycle Information: </div>
    <br /> …

Parsing HTML Table in C#

Using Html Agility Pack:

    WebClient webClient = new WebClient();
    string page = webClient.DownloadString("http://www.mufap.com.pk/payout-report.php?tab=01");

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(page);

    // Needs System.Linq; the attribute value inside the XPath uses single quotes.
    List<List<string>> table = doc.DocumentNode.SelectSingleNode("//table[@class='mydata']")
        .Descendants("tr")
        .Skip(1) // skip the first row
        .Where(tr => tr.Elements("td").Count() > 1)
        .Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToList())
        .ToList();

How to get img/src or a/hrefs using Html Agility Pack?

The first example on the home page does something very similar, but consider:

    HtmlDocument doc = new HtmlDocument();
    doc.Load("file.htm"); // would need doc.LoadHtml(htmlSource) if it is not a file
    foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
    {
        string href = link.Attributes["href"].Value;
        // store href somewhere
    }

So you can imagine that for img@src, just replace each a with …
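One detail worth guarding against in a loop like this: SelectNodes returns null rather than an empty collection when the XPath matches nothing. The following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced. The LinkExtractor class and its GetAttributeValues method are hypothetical helpers written for illustration and are not part of the library.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LinkExtractor
{
    // Hypothetical helper: collects one attribute from every element that has it,
    // for example ("a", "href") or ("img", "src").
    public static List<string> GetAttributeValues(string html, string tag, string attribute)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var values = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//" + tag + "[@" + attribute + "]");
        if (nodes == null)
        {
            return values;
        }

        foreach (HtmlNode node in nodes)
        {
            values.Add(node.GetAttributeValue(attribute, string.Empty));
        }
        return values;
    }
}
```

Calling GetAttributeValues(html, "a", "href") or GetAttributeValues(html, "img", "src") then yields a list that is simply empty when the page has no matching elements.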

How to get all input elements in a form with HtmlAgilityPack without getting a null reference error

You can do the following:

    HtmlNode.ElementsFlags.Remove("form");
    HtmlDocument doc = new HtmlDocument();
    doc.Load(@"D:\test.html");
    HtmlNode secondForm = doc.GetElementbyId("form2");
    foreach (HtmlNode node in secondForm.Elements("input"))
    {
        HtmlAttribute valueAttribute = node.Attributes["value"];
        if (valueAttribute != null)
        {
            Console.WriteLine(valueAttribute.Value);
        }
    }

By default, HTML Agility Pack parses forms as empty nodes because they are allowed to overlap other HTML elements. The first …
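To try the effect of removing the form flag without a file on disk, here is a small sketch that parses an inline string instead. It assumes the HtmlAgilityPack NuGet package is referenced. The FormInputs class and its ReadInputValues helper are made up for illustration. It uses Descendants rather than Elements so that inputs nested inside other tags within the form are also found, which is a deliberate departure from the snippet above, and it returns an empty list instead of throwing when the form id is not present.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class FormInputs
{
    // Hypothetical helper: returns the value attribute of every input inside the given form.
    public static List<string> ReadInputValues(string html, string formId)
    {
        // Drop the special-case entry for <form> so its content is kept as child nodes.
        // ElementsFlags is static, so this affects all later parsing in the process.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var values = new List<string>();

        // GetElementbyId returns null when no element has that id.
        HtmlNode form = doc.GetElementbyId(formId);
        if (form == null)
        {
            return values;
        }

        foreach (HtmlNode input in form.Descendants("input"))
        {
            // The attribute indexer returns null when the attribute is missing.
            HtmlAttribute valueAttribute = input.Attributes["value"];
            if (valueAttribute != null)
            {
                values.Add(valueAttribute.Value);
            }
        }
        return values;
    }
}
```

Because the flag removal is process-wide, it may be cleaner in a real application to do it once at startup rather than on every call.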