iTextSharp error when trying to parse HTML for PDF conversion

`HTMLWorker` has been deprecated in favor of XMLWorker. Here is a working example, tested with a snippet of HTML like the one you used above:

StringReader html = new StringReader(@"
<div style=""font-size: 18pt; font-weight: bold;"">
Mouser Electronics <br />Authorized Distributor</div><br /> <br />
<div style=""font-size: 14pt;"">Click to View Pricing, Inventory, Delivery & Lifecycle Information:
</div>
<br />
<div>
<table>
<tr><td></td><td>
<a href=""http://www.mouser.com/access/?pn=78211-009""
style=""color: Blue; font-size: 10pt; text-decoration: underline;"">78211-009</a></td></tr>
</table></div>
");
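// STREAM is a placeholder for any writable Stream, such as a FileStream or MemoryStream.
using (Document document = new Document()) {
  PdfWriter writer = PdfWriter.GetInstance(document, STREAM);
  document.Open();
  // XMLWorker parses the XHTML and adds the resulting elements to the document.
  XMLWorkerHelper.GetInstance().ParseXHtml(
    writer, document, html
  );
}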

When using XMLWorker you need well-formed HTML; it’s an XML parser, after all. The sample HTML from your question above doesn’t have closing <a> or <br> tags. An HTML parser like HtmlAgilityPack will fix those problems, and turn this:

<div><img src="https://stackoverflow.com/questions/12113425/a.gif"><br><hr></div>

into this:

<div><img src="https://stackoverflow.com/questions/12113425/a.gif" /><br /><hr /></div>

with only a few lines of code:

var hDocument = new HtmlDocument()
{
    OptionWriteEmptyNodes = true,
    OptionAutoCloseOnEnd = true
};
hDocument.LoadHtml("<div><img src=\"https://stackoverflow.com/questions/12113425/a.gif\"><br><hr></div>");
var closedTags = hDocument.DocumentNode.WriteTo();
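
Since XMLWorker expects well-formed markup, it can help to check a fragment before handing it over. Here is a minimal sketch of such a check using only `System.Xml.Linq` from the base class library. The names `XmlCheck` and `IsWellFormed` are hypothetical, not part of iTextSharp or HtmlAgilityPack. A strict XML parser is only a rough proxy for what XMLWorker accepts: it also rejects things like `&nbsp;` and bare ampersands.

```csharp
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of iTextSharp or HtmlAgilityPack.
public static class XmlCheck
{
    // Returns true when the fragment parses as XML with a single root element.
    public static bool IsWellFormed(string fragment)
    {
        try
        {
            XDocument.Parse(fragment);
            return true;
        }
        catch (XmlException)
        {
            // Unclosed tags, mismatched end tags, undeclared entities, and so on.
            return false;
        }
    }
}
```

With this helper, a fragment with unclosed <img>, <br> and <hr> tags is reported as not well-formed, while the self-closed version written out by HtmlAgilityPack passes.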

XMLWorker is available as a NuGet package, or as a separate download at SourceForge.
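
Putting the two steps together, here is a rough end-to-end sketch. It assumes the iTextSharp, XMLWorker and HtmlAgilityPack packages are all referenced. The class name `HtmlToPdf` and method name `Convert` are hypothetical, and a `MemoryStream` stands in for the output stream. The markup is first tidied with HtmlAgilityPack, then passed to XMLWorker.

```csharp
using System.IO;
using HtmlAgilityPack;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

// Hypothetical wrapper; assumes iTextSharp 5.x, XMLWorker and HtmlAgilityPack are referenced.
public static class HtmlToPdf
{
    public static byte[] Convert(string rawHtml)
    {
        // Step 1: let HtmlAgilityPack close empty and dangling tags.
        var hDocument = new HtmlDocument
        {
            OptionWriteEmptyNodes = true,
            OptionAutoCloseOnEnd = true
        };
        hDocument.LoadHtml(rawHtml);
        string xhtml = hDocument.DocumentNode.WriteTo();

        // Step 2: hand the tidied markup to XMLWorker.
        using (var output = new MemoryStream())
        {
            using (var document = new Document())
            {
                PdfWriter writer = PdfWriter.GetInstance(document, output);
                document.Open();
                using (var reader = new StringReader(xhtml))
                {
                    XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, reader);
                }
            }
            // The PDF is complete once the document has been disposed;
            // ToArray still works after the writer has closed the MemoryStream.
            return output.ToArray();
        }
    }
}
```

The returned bytes can be written to a file or to an HTTP response. HtmlAgilityPack only repairs tag structure, so markup that XMLWorker cannot handle for other reasons may still fail.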

See here for more advanced usage of XMLWorker.
