itextsharp - w3toppers.com

How to merge multiple pdf files (generated in run time)?

If you want to merge source documents using iText(Sharp), there are two basic situations: You really want to merge the documents, acquiring the pages in their original format, transfering as much of their content and their interactive annotations as possible. In this case you should use a solution based on a member of the Pdf*Copy* … Read more

Merging multiple PDFs using iTextSharp in c#.net

I found the answer: Instead of the 2nd Method, add more files to the first array of input files. public static void CombineMultiplePDFs(string[] fileNames, string outFile) { // step 1: creation of a document-object Document document = new Document(); //create newFileStream object which will be disposed at the end using (FileStream newFileStream = new FileStream(outFile, … Read more

How to update a PDF without creating a new PDF?

As documented in my book iText in Action, you can’t read a file and write to it simultaneously. Think of how Word works: you can’t open a Word document and write directly to it. Word always creates a temporary file, writes the changes to it, then replaces the original file with it and then throws … Read more

Extract images using iTextSharp

I found that my problem was that I was not recursively searching inside of forms and groups for images. Basically, the original code would only find images that were embedded at the root of the pdf document. Here is the revised method plus a new method (FindImageInPDFDictionary) that recursively searches for images in the page. … Read more

How To Remove Whitespace on Merge

The following sample tool has been implemented along the ideas of the tool PdfDenseMergeTool from this answer which the OP has commented to be SO close to what [he] NEEDs. Just like PdfDenseMergeTool this tool here is implemented in Java/iText which I’m more at home with than C#/iTextSharp. As the OP has already translated PdfDenseMergeTool … Read more

Reading PDF content with itextsharp dll in VB.NET or C#

using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser; using System.IO; public string ReadPdfFile(string fileName) { StringBuilder text = new StringBuilder(); if (File.Exists(fileName)) { PdfReader pdfReader = new PdfReader(fileName); for (int page = 1; page <= pdfReader.NumberOfPages; page++) { ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy(); string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy); currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText))); text.Append(currentText); } pdfReader.Close(); } return … Read more

Reading pdf content using iTextSharp in C#

In .Net, once you have a string, you have a string, and it is Unicode, always. The actual in-memory implementation is UTF-16 but that doesn’t matter. Never, ever, ever decompose the string into bytes and try to reinterpret it as a different encoding and slap it back as a string because that doesn’t make sense … Read more

how can i get text formatting with iTextSharp

Let me try pointing you in a different direction. iTextSharp has a really beautiful and simple text extraction system that handle some of the basic tokens. Unfortunately it doesn’t handle color information but according to @Mark Storer it might not be too hard to implement yourself. BEGIN EDIT I started work on implementing color information. … Read more

Can’t get Czech characters while generating a PDF

THE PROBLEM: First of all, you don’t seem to be talking about Cyrillic characters, but about central and eastern European languages that use Latin script. Take a look at the difference between code page 1250 and code page 1251 to understand what I mean. [NOTE: I have updated the question so that it talks about … Read more

How to convert HTML to PDF using iTextSharp

First, HTML and PDF are not related although they were created around the same time. HTML is intended to convey higher level information such as paragraphs and tables. Although there are methods to control it, it is ultimately up to the browser to draw these higher level concepts. PDF is intended to convey documents and … Read more