How to merge multiple pdf files (generated in run time)?

If you want to merge source documents using iText(Sharp), there are two basic situations:

  1. You really want to merge the documents, acquiring the pages in their original format, transfering as much of their content and their interactive annotations as possible. In this case you should use a solution based on a member of the Pdf*Copy* family of classes.

  2. You actually want to integrate pages from the source documents into a new document but want the new document to govern the general format and don’t care for the interactive features (annotations…) in the original documents (or even want to get rid of them). In this case you should use a solution based on the PdfWriter class.

You can find details in chapter 6 (especially section 6.4) of iText in Action — 2nd Edition. The Java sample code can be accessed here and the C#’ified versions here.

A simple sample using PdfCopy is Concatenate.java / Concatenate.cs. The central piece of code is:

byte[] mergedPdf = null;
using (MemoryStream ms = new MemoryStream())
{
    using (Document document = new Document())
    {
        using (PdfCopy copy = new PdfCopy(document, ms))
        {
            document.Open();

            for (int i = 0; i < pdf.Count; ++i)
            {
                PdfReader reader = new PdfReader(pdf[i]);
                // loop over the pages in that document
                int n = reader.NumberOfPages;
                for (int page = 0; page < n; )
                {
                    copy.AddPage(copy.GetImportedPage(reader, ++page));
                }
            }
        }
    }
    mergedPdf = ms.ToArray();
}

Here pdf can either be defined as a List<byte[]> immediately containing the source documents (appropriate for your use case of merging intermediate in-memory documents) or as a List<String> containing the names of source document files (appropriate if you merge documents from disk).

An overview at the end of the referenced chapter summarizes the usage of the classes mentioned:

  • PdfCopy: Copies pages from one or more existing PDF documents. Major downsides: PdfCopy doesn’t detect redundant content, and it fails when concatenating forms.

  • PdfCopyFields: Puts the fields of the different forms into one form. Can be used to avoid the problems encountered with form fields when concatenating forms using PdfCopy. Memory use can be an issue.

  • PdfSmartCopy: Copies pages from one or more existing PDF documents. PdfSmartCopy is able to detect redundant content, but it needs more memory and CPU than PdfCopy.

  • PdfWriter: Generates PDF documents from scratch. Can import pages from other PDF documents. The major downside is that all interactive features of the imported page (annotations, bookmarks, fields, and so forth) are lost in the process.

Leave a Comment