Merging multiple PDFs using iTextSharp in C#.NET

I found the answer: instead of the second method, add more files to the first array of input files.

    public static void CombineMultiplePDFs(string[] fileNames, string outFile)
    {
        // Step 1: create the Document object
        Document document = new Document();
        // Create the output FileStream, which is disposed at the end of the using block
        using (FileStream newFileStream = new FileStream(outFile, …

Extract images using iTextSharp

I found that my problem was that I was not recursively searching inside forms and groups for images. The original code would only find images embedded at the root of the PDF document. Here is the revised method plus a new method (FindImageInPDFDictionary) that recursively searches for images in the page. …

How To Remove Whitespace on Merge

The following sample tool has been implemented along the ideas of the tool PdfDenseMergeTool from this answer which the OP has commented to be SO close to what [he] NEEDs. Just like PdfDenseMergeTool this tool here is implemented in Java/iText which I’m more at home with than C#/iTextSharp. As the OP has already translated PdfDenseMergeTool …

Reading PDF content with the iTextSharp DLL in VB.NET or C#

    using System.IO;
    using System.Text; // needed for StringBuilder and Encoding
    using iTextSharp.text.pdf;
    using iTextSharp.text.pdf.parser;

    public string ReadPdfFile(string fileName)
    {
        StringBuilder text = new StringBuilder();
        if (File.Exists(fileName))
        {
            PdfReader pdfReader = new PdfReader(fileName);
            // iTextSharp page numbers are 1-based
            for (int page = 1; page <= pdfReader.NumberOfPages; page++)
            {
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
                // Caution: the extracted string is already Unicode, so this round trip is
                // at best a no-op and can lose characters outside the default code page.
                currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
                text.Append(currentText);
            }
            pdfReader.Close();
        }
        return …

Reading PDF content using iTextSharp in C#

In .NET, once you have a string, you have a string, and it is Unicode, always. The actual in-memory implementation is UTF-16 but that doesn’t matter. Never, ever, ever decompose the string into bytes and try to reinterpret it as a different encoding and slap it back as a string because that doesn’t make sense …
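To make the point concrete, here is a minimal sketch that uses only the .NET base class library. It is not code from the original answer. The class and method names (StringEncodingSketch, ReinterpretUtf8AsLatin1, RoundTripUtf8) are hypothetical and exist only for illustration. It encodes a string as UTF-8 and decodes those bytes as ISO-8859-1, which mangles the text, and compares that with a round trip through a single encoding.

```csharp
using System;
using System.Text;

public static class StringEncodingSketch
{
    // Hypothetical helper reproducing the anti-pattern:
    // encode as UTF-8, then decode the same bytes as ISO-8859-1.
    public static string ReinterpretUtf8AsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);
    }

    // Encoding and decoding with the same encoding preserves the text.
    public static string RoundTripUtf8(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(utf8Bytes);
    }

    public static void Main()
    {
        string original = "caf\u00E9"; // "café": four UTF-16 code units
        string mangled = ReinterpretUtf8AsLatin1(original);
        string intact = RoundTripUtf8(original);

        Console.WriteLine(original.Length);     // 4
        Console.WriteLine(mangled.Length);      // 5: the two UTF-8 bytes of 'é' became two characters
        Console.WriteLine(mangled == original); // False
        Console.WriteLine(intact == original);  // True
    }
}
```

An encoding only matters at the boundary where text becomes bytes or bytes become text, for example when reading or writing a file. A string returned by a text extraction call is already past that boundary.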

How can I get text formatting with iTextSharp

Let me try pointing you in a different direction. iTextSharp has a really beautiful and simple text extraction system that handles some of the basic tokens. Unfortunately it doesn’t handle color information but according to @Mark Storer it might not be too hard to implement yourself. BEGIN EDIT I started work on implementing color information. …