Apache PDFBox: problems with encoding

This answer is actually an explanation of why a generic solution for your task is very complicated at best, if not impossible. Under benign circumstances, i.e. for PDFs subject to specific restrictions, code like yours can be used successfully, but your example PDF shows that the PDFs you apparently want to manipulate are not restricted like …
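In a PDF content stream, the operands of text-showing operators such as Tj are byte strings. They are interpreted through the current font's encoding, not as Unicode text. The toy model below is plain Java, not PDFBox. Its class, its methods and the way it assigns codes are hypothetical, and it uses single-byte codes for simplicity, whereas composite fonts use multi-byte codes. It sketches two reasons why a generic search-and-replace is hard:

- With a subset font whose codes do not match ASCII, a byte search for "Hello" finds nothing.
- Replacement text may need glyphs the subset font does not contain.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy model, not PDFBox API: a font's encoding is reduced to a map from character to a one-byte code.
final class EncodingDemo {

    private EncodingDemo() {}

    /** Hypothetical subset font: codes 1, 2, 3, ... assigned in order of first use in the original text. */
    static Map<Character, Integer> subsetCodeMap(String usedText) {
        Map<Character, Integer> map = new HashMap<>();
        int next = 1;
        for (char c : usedText.toCharArray()) {
            if (!map.containsKey(c)) {
                map.put(c, next++);
            }
        }
        return map;
    }

    /** Encodes text with the font's code map; returns null if the font has no code for some character. */
    static byte[] encode(String text, Map<Character, Integer> codeMap) {
        byte[] out = new byte[text.length()];
        for (int i = 0; i < text.length(); i++) {
            Integer code = codeMap.get(text.charAt(i));
            if (code == null) {
                return null; // glyph missing from the (subset) font
            }
            out[i] = (byte) code.intValue();
        }
        return out;
    }

    /** Naive search that treats the string operand as ASCII text, as a raw find-and-replace would. */
    static boolean containsAscii(byte[] stringOperand, String text) {
        byte[] needle = text.getBytes(StandardCharsets.US_ASCII);
        outer:
        for (int i = 0; i + needle.length <= stringOperand.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (stringOperand[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }
}
```

With `subsetCodeMap("Hello")`, the visible text "Hello" is stored as the bytes 1, 2, 3, 3, 4. An ASCII search for "Hello" therefore fails, and `encode("Help", ...)` returns null because the toy subset has no code for 'p'.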

pdfbox 2.0.2 > Calling of PageDrawer.processPage method caught exceptions

Extending PageDrawer didn’t really work, so I extended PDFGraphicsStreamEngine instead, and here’s the result. It does some of what PageDrawer does. To collect lines, either evaluate the shape in strokePath(), or collect points and lines in the other methods, where I have included a println.

```java
public class LineCatcher extends PDFGraphicsStreamEngine
{
    private …
```
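Suppose the engine subclass accumulates the current path in a java.awt.geom.GeneralPath from moveTo(), lineTo(), curveTo() and closePath(), much as PageDrawer keeps its current path in a GeneralPath. Then "evaluating the shape in strokePath()" could look roughly like the helper below. This is a sketch only:

- The class and method names are hypothetical and not part of PDFBox.
- It uses only java.awt.geom.
- It keeps straight segments, including the implicit closing segment, and skips curves.

```java
import java.awt.Shape;
import java.awt.geom.GeneralPath;
import java.awt.geom.Line2D;
import java.awt.geom.PathIterator;
import java.awt.geom.Point2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: extracts straight segments from an accumulated path.
final class LineSegmentExtractor {

    private LineSegmentExtractor() {}

    /** Returns the straight segments of the shape; curve segments only advance the current point. */
    static List<Line2D> extractLineSegments(Shape shape) {
        List<Line2D> lines = new ArrayList<>();
        double[] coords = new double[6];
        Point2D current = null;      // current point of the path
        Point2D subpathStart = null; // target of the implicit line drawn by closePath
        for (PathIterator it = shape.getPathIterator(null); !it.isDone(); it.next()) {
            switch (it.currentSegment(coords)) {
                case PathIterator.SEG_MOVETO: {
                    current = new Point2D.Double(coords[0], coords[1]);
                    subpathStart = current;
                    break;
                }
                case PathIterator.SEG_LINETO: {
                    Point2D end = new Point2D.Double(coords[0], coords[1]);
                    lines.add(new Line2D.Double(current, end));
                    current = end;
                    break;
                }
                case PathIterator.SEG_QUADTO: {
                    current = new Point2D.Double(coords[2], coords[3]);
                    break;
                }
                case PathIterator.SEG_CUBICTO: {
                    current = new Point2D.Double(coords[4], coords[5]);
                    break;
                }
                case PathIterator.SEG_CLOSE: {
                    // closing adds a segment back to the subpath start unless we are already there
                    if (current != null && !current.equals(subpathStart)) {
                        lines.add(new Line2D.Double(current, subpathStart));
                    }
                    current = subpathStart;
                    break;
                }
                default:
                    break;
            }
        }
        return lines;
    }
}
```

In such a subclass, strokePath() would pass the accumulated path to this helper and then reset it for the next path. How the path is stored and reset is left to your own implementation.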

Convert PDF files to images with PDFBox

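Solution for 1.8.* versions:

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.util.ImageIOUtil;

// pdfFilename is the path of the PDF to convert
PDDocument document = PDDocument.loadNonSeq(new File(pdfFilename), null);
List<PDPage> pdPages = document.getDocumentCatalog().getAllPages();
int page = 0;
for (PDPage pdPage : pdPages)
{
    ++page;
    BufferedImage bim = pdPage.convertToImage(BufferedImage.TYPE_INT_RGB, 300); // render at 300 dpi
    // writes <pdfFilename>-<page>.png with 300 dpi recorded in the image metadata
    ImageIOUtil.writeImage(bim, pdfFilename + "-" + page + ".png", 300);
}
document.close();
```

Don’t forget to read the 1.8 dependencies page before doing your build. Solution for …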
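PDF page sizes are given in points, which are 1/72 inch by default. Rendering at 300 dpi therefore scales each page by 300/72. The helper below is a rough estimate, not PDFBox code. Its names are hypothetical, and PDFBox's own rounding may differ by a pixel. It computes the resulting image size and the heap needed for the pixel data of a TYPE_INT_RGB image, which stores one 32-bit int per pixel.

```java
import java.awt.Dimension;

// Hypothetical estimate, not PDFBox API: output size of a page rendered at a given resolution.
final class RenderSizeEstimate {

    private RenderSizeEstimate() {}

    static final float POINTS_PER_INCH = 72f; // default PDF user space unit is 1/72 inch

    /** Approximate pixel size of a page of widthPt x heightPt points rendered at dpi. */
    static Dimension pixelSize(float widthPt, float heightPt, int dpi) {
        float scale = dpi / POINTS_PER_INCH;
        return new Dimension(Math.round(widthPt * scale), Math.round(heightPt * scale));
    }

    /** Bytes of pixel data for a TYPE_INT_RGB image of this size (4 bytes per pixel). */
    static long intRgbBytes(Dimension d) {
        return 4L * d.width * d.height;
    }
}
```

For example, an A4 page with a 595 × 842 pt media box comes out at about 2479 × 3508 pixels at 300 dpi. That is 34,785,328 bytes (roughly 33 MiB) of pixel data for each rendered page.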