PDF Compression with iTextSharp [closed]

iText and iTextSharp have some methods for replacing indirect objects. Specifically there’s PdfReader.KillIndirect() which does what it says and PdfWriter.AddDirectImageSimple(iTextSharp.text.Image, PRIndirectReference) which you can then use to replace what you killed off.

In pseudo C# code you’d do:

//imageRef is the indirect reference to the image, taken from the page's XObject dictionary
var oldImage = PdfReader.GetPdfObject(imageRef);
var newImage = YourImageCompressionFunction(oldImage);
PdfReader.KillIndirect(imageRef);
yourPdfWriter.AddDirectImageSimple(newImage, (PRIndirectReference)imageRef);

Converting the raw bytes to a .NET image can be tricky; I’ll leave that up to you, or you can search here. Mark has a good description here. Also, technically PDFs don’t have a concept of DPI; that’s mostly a concern for printers. See the answer here for more on that.

Using the method above, your compression algorithm can actually do two things: physically shrink the image and apply JPEG compression. When you physically shrink the image and add it back, it occupies the same amount of space on the page as the original image but with fewer pixels to work with. This gets you what you consider to be DPI reduction. The JPEG compression speaks for itself.
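To make the "DPI reduction" idea concrete, here is a minimal sketch of the arithmetic. It assumes the default PDF user space of 72 points per inch; the class name, method name, and the sample numbers are made up for illustration and are not part of iTextSharp.

```csharp
using System;

static class DpiSketch {
    //Default PDF user space is measured in points, 72 to the inch (assumption: no UserUnit override)
    const double PointsPerInch = 72.0;

    //Effective resolution of an image that is pixelWidth pixels wide and drawn across placedWidthPoints points
    public static double EffectiveDpi(int pixelWidth, double placedWidthPoints) {
        return pixelWidth / (placedWidthPoints / PointsPerInch);
    }

    static void Main() {
        //Hypothetical image: 3000 pixels wide, drawn across 720 points (10 inches)
        int originalPixels = 3000;
        double placedWidth = 720;
        //Shrinking to 90% removes pixels, but the image still covers the same 720 points
        int shrunkPixels = Convert.ToInt32(originalPixels * 0.9f);

        Console.WriteLine("Before: {0} dpi", EffectiveDpi(originalPixels, placedWidth));
        Console.WriteLine("After:  {0} dpi", EffectiveDpi(shrunkPixels, placedWidth));
    }
}
```

The placed size never changes, so the only thing that moves the effective DPI is the pixel count of the image you write back.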

Below is a full working C# 2010 WinForms app targeting iTextSharp 5.1.1.0. It takes an existing JPEG on your desktop called “LargeImage.jpg” and creates a new PDF from it. Then it opens the PDF, extracts the image, physically shrinks it to 90% of the original size, applies JPEG compression at a quality setting of 85 and writes it back to the PDF. See the comments in the code for more of an explanation. The code needs lots more null/error checking. Also look for NOTE comments where you’ll need to expand to handle other situations.

using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Drawing.Drawing2D;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

namespace WindowsFormsApplication1 {
    public partial class Form1 : Form {
        public Form1() {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e) {
            //Our working folder
            string workingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
            //Large image to add to sample PDF
            string largeImage = Path.Combine(workingFolder, "LargeImage.jpg");
            //Name of large PDF to create
            string largePDF = Path.Combine(workingFolder, "Large.pdf");
            //Name of compressed PDF to create
            string smallPDF = Path.Combine(workingFolder, "Small.pdf");

            //Create a sample PDF containing our large image, for demo purposes only, nothing special here
            using (FileStream fs = new FileStream(largePDF, FileMode.Create, FileAccess.Write, FileShare.None)) {
                using (Document doc = new Document()) {
                    using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
                        doc.Open();

                        iTextSharp.text.Image importImage = iTextSharp.text.Image.GetInstance(largeImage);
                        doc.SetPageSize(new iTextSharp.text.Rectangle(0, 0, importImage.Width, importImage.Height));
                        doc.SetMargins(0, 0, 0, 0);
                        doc.NewPage();
                        doc.Add(importImage);

                        doc.Close();
                    }
                }
            }

            //Now we're going to open the above PDF and compress things

            //Bind a reader to our large PDF
            PdfReader reader = new PdfReader(largePDF);
            //Create our output PDF
            using (FileStream fs = new FileStream(smallPDF, FileMode.Create, FileAccess.Write, FileShare.None)) {
                //Bind a stamper to the file and our reader
                using (PdfStamper stamper = new PdfStamper(reader, fs)) {
                    //NOTE: This code only deals with page 1, you'd want to loop more for your code
                    //Get page 1
                    PdfDictionary page = reader.GetPageN(1);
                    //Get the xobject structure
                    PdfDictionary resources = (PdfDictionary)PdfReader.GetPdfObject(page.Get(PdfName.RESOURCES));
                    PdfDictionary xobject = (PdfDictionary)PdfReader.GetPdfObject(resources.Get(PdfName.XOBJECT));
                    if (xobject != null) {
                        PdfObject obj;
                        //Loop through each key
                        foreach (PdfName name in xobject.Keys) {
                            obj = xobject.Get(name);
                            if (obj.IsIndirect()) {
                                //Get the current key as a PDF object
                                PdfDictionary imgObject = (PdfDictionary)PdfReader.GetPdfObject(obj);
                                //See if it's an image
                                if (PdfName.IMAGE.Equals(imgObject.Get(PdfName.SUBTYPE))) {
                                    //NOTE: There's a bunch of different types of filters, I'm only handling the simplest one here which is basically raw JPG, you'll have to research others
                                    if (PdfName.DCTDECODE.Equals(imgObject.Get(PdfName.FILTER))) {
                                        //Get the raw bytes of the current image
                                        byte[] oldBytes = PdfReader.GetStreamBytesRaw((PRStream)imgObject);
                                        //Will hold bytes of the compressed image later
                                        byte[] newBytes;
                                        //Wrap a stream around our original image
                                        using (MemoryStream sourceMS = new MemoryStream(oldBytes)) {
                                            //Convert the bytes into a .Net image
                                            using (System.Drawing.Image oldImage = Bitmap.FromStream(sourceMS)) {
                                                //Shrink the image to 90% of the original
                                                using (System.Drawing.Image newImage = ShrinkImage(oldImage, 0.9f)) {
                                                    //Convert the image to bytes using JPG at 85%
                                                    newBytes = ConvertImageToBytes(newImage, 85);
                                                }
                                            }
                                        }
                                        //Create a new iTextSharp image from our bytes
                                        iTextSharp.text.Image compressedImage = iTextSharp.text.Image.GetInstance(newBytes);
                                        //Kill off the old image
                                        PdfReader.KillIndirect(obj);
                                        //Add our image in its place
                                        stamper.Writer.AddDirectImageSimple(compressedImage, (PRIndirectReference)obj);
                                    }
                                }
                            }
                        }
                    }
                }
            }

            this.Close();
        }

        //Standard image save code from MSDN, returns a byte array
        private static byte[] ConvertImageToBytes(System.Drawing.Image image, long compressionLevel) {
            if (compressionLevel < 0) {
                compressionLevel = 0;
            } else if (compressionLevel > 100) {
                compressionLevel = 100;
            }
            ImageCodecInfo jpgEncoder = GetEncoder(ImageFormat.Jpeg);

            System.Drawing.Imaging.Encoder myEncoder = System.Drawing.Imaging.Encoder.Quality;
            EncoderParameters myEncoderParameters = new EncoderParameters(1);
            EncoderParameter myEncoderParameter = new EncoderParameter(myEncoder, compressionLevel);
            myEncoderParameters.Param[0] = myEncoderParameter;
            using (MemoryStream ms = new MemoryStream()) {
                image.Save(ms, jpgEncoder, myEncoderParameters);
                return ms.ToArray();
            }

        }
        //standard code from MSDN
        private static ImageCodecInfo GetEncoder(ImageFormat format) {
            ImageCodecInfo[] codecs = ImageCodecInfo.GetImageEncoders();
            foreach (ImageCodecInfo codec in codecs) {
                if (codec.FormatID == format.Guid) {
                    return codec;
                }
            }
            return null;
        }
        //Standard high quality thumbnail generation from http://weblogs.asp.net/gunnarpeipman/archive/2009/04/02/resizing-images-without-loss-of-quality.aspx
        private static System.Drawing.Image ShrinkImage(System.Drawing.Image sourceImage, float scaleFactor) {
            int newWidth = Convert.ToInt32(sourceImage.Width * scaleFactor);
            int newHeight = Convert.ToInt32(sourceImage.Height * scaleFactor);

            var thumbnailBitmap = new Bitmap(newWidth, newHeight);
            using (Graphics g = Graphics.FromImage(thumbnailBitmap)) {
                g.CompositingQuality = CompositingQuality.HighQuality;
                g.SmoothingMode = SmoothingMode.HighQuality;
                g.InterpolationMode = InterpolationMode.HighQualityBicubic;
                System.Drawing.Rectangle imageRectangle = new System.Drawing.Rectangle(0, 0, newWidth, newHeight);
                g.DrawImage(sourceImage, imageRectangle);
            }
            return thumbnailBitmap;
        }
    }
}
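
The NOTE in the listing about only handling page 1 can be addressed with a loop over every page. What follows is a rough sketch rather than tested production code: PageImageFinder and FindJpegImages are hypothetical names, it assumes the same iTextSharp 5.x API used above, and it still skips images nested inside form XObjects and any filter other than DCTDecode.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;

static class PageImageFinder {
    //Collects the indirect references of JPEG (DCTDecode) image XObjects found on any page
    public static List<PRIndirectReference> FindJpegImages(PdfReader reader) {
        var found = new List<PRIndirectReference>();
        //Pages can share one image, so remember object numbers to avoid replacing it twice
        var seen = new HashSet<int>();

        for (int pageNumber = 1; pageNumber <= reader.NumberOfPages; pageNumber++) {
            PdfDictionary page = reader.GetPageN(pageNumber);
            PdfDictionary resources = PdfReader.GetPdfObject(page.Get(PdfName.RESOURCES)) as PdfDictionary;
            if (resources == null) continue;
            PdfDictionary xobject = PdfReader.GetPdfObject(resources.Get(PdfName.XOBJECT)) as PdfDictionary;
            if (xobject == null) continue;

            foreach (PdfName name in xobject.Keys) {
                PRIndirectReference imageRef = xobject.Get(name) as PRIndirectReference;
                if (imageRef == null) continue;
                PdfDictionary imgObject = PdfReader.GetPdfObject(imageRef) as PdfDictionary;
                if (imgObject == null) continue;
                //Only plain JPEG images, same restriction as the listing above
                if (!PdfName.IMAGE.Equals(imgObject.Get(PdfName.SUBTYPE))) continue;
                if (!PdfName.DCTDECODE.Equals(imgObject.Get(PdfName.FILTER))) continue;
                if (seen.Add(imageRef.Number)) {
                    found.Add(imageRef);
                }
            }
        }
        return found;
    }
}
```

Each reference returned this way could then go through the same shrink, KillIndirect and AddDirectImageSimple steps that the listing applies to the single image on page 1.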
