Add HTML String to OpenXML (*.docx) Document

I can reproduce the error “… there is a problem with the content” by using
an incomplete HTML document as the content of the alternative format import part.
For example if you use the following HTML snippet <h1>HELLO</h1>
MS Word is unable to open the document.

The code below shows how to add an AlternativeFormatImportPart to a word document.
(I’ve tested the code with MS Word 2013).

using (WordprocessingDocument doc = WordprocessingDocument.Open(@"test.docx", true))
{
  string altChunkId = "myId";
  MainDocumentPart mainDocPart = doc.MainDocumentPart;

  var run = new Run(new Text("test"));
  var p = new Paragraph(new ParagraphProperties(
       new Justification() { Val = JustificationValues.Center }),
                     run);

  var body = mainDocPart.Document.Body;
  body.Append(p);        

  MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes("<html><head></head><body><h1>HELLO</h1></body></html>"));

  // Uncomment the following line to create an invalid word document.
  // MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes("<h1>HELLO</h1>"));

  // Create alternative format import part.
  AlternativeFormatImportPart formatImportPart =
     mainDocPart.AddAlternativeFormatImportPart(
        AlternativeFormatImportPartType.Html, altChunkId);
  //ms.Seek(0, SeekOrigin.Begin);

  // Feed HTML data into format import part (chunk).
  formatImportPart.FeedData(ms);
  AltChunk altChunk = new AltChunk();
  altChunk.Id = altChunkId;

  mainDocPart.Document.Body.Append(altChunk);
}

According to the Office OpenXML specification valid parent elements for the
w:altChunk element are body, comment, docPartBody, endnote, footnote, ftr, hdr and tc.
So, I’ve added the w:altChunk to the body element.

For more information on the w:altChunk element see this MSDN link.

EDIT

As pointed out by @user2945722, to make sure that the OpenXml library correctlty interprets the byte array as UTF-8, you should add the UTF-8 preamble. This can be done this way:

MemoryStream ms = new MemoryStream(new UTF8Encoding(true).GetPreamble().Concat(Encoding.UTF8.GetBytes(htmlEncodedString)).ToArray()

This will prevent your é’s from being rendered as é’s, your ä’s as ä’s, etc.

Leave a Comment