PythonMagick can’t find my pdf files

I had exactly the same problem a couple of days ago. Converting from .gif (or something else) to .jpg worked fine, but converting from .pdf to .jpg produced exactly the same error. That happens because ImageMagick uses Ghostscript for reading/converting PDFs. You can solve the problem by installing Ghostscript (only the 32-bit version works). Don’t forget …
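Since the error comes from the Ghostscript dependency rather than from the PDF itself, it can help to check whether a Ghostscript executable is reachable before attempting the conversion. The following is only a rough sketch: the function name find_ghostscript is made up for illustration, and the executable names it looks for (gs, gswin32c, gswin64c) are assumptions about typical installs, not something stated in the answer.

```python
import shutil

# Assumed executable names: "gs" on Linux/macOS, "gswin32c" / "gswin64c"
# on Windows. Adjust the tuple if your install uses something else.
GHOSTSCRIPT_NAMES = ("gs", "gswin32c", "gswin64c")


def find_ghostscript(names=GHOSTSCRIPT_NAMES):
    """Return the full path of the first name found on PATH, else None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    location = find_ghostscript()
    if location is None:
        print("No Ghostscript executable found on PATH")
    else:
        print("Ghostscript found at", location)
```

If this reports nothing, the PDF conversion will most likely keep failing until Ghostscript is installed and its directory is on PATH.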

Annotate PDF within iPhone SDK

You can do annotation by reading in a PDF page, drawing it onto a new PDF graphics context, then drawing extra content onto that graphics context. Here is some code that adds the words ‘Example annotation’ at position (100.0,100.0) to an existing PDF. The method getPDFFileName returns the path of the original PDF. getTempPDFFileName returns …

Extract Image from PDF using Java

You can use PDFBox:

    List pages = document.getDocumentCatalog().getAllPages();
    Iterator iter = pages.iterator();
    while( iter.hasNext() )
    {
        PDPage page = (PDPage)iter.next();
        PDResources resources = page.getResources();
        Map images = resources.getImages();
        if( images != null )
        {
            Iterator imageIter = images.keySet().iterator();
            while( imageIter.hasNext() )
            {
                String key = (String)imageIter.next();
                PDXObjectImage image = (PDXObjectImage)images.get( key );
                String name = …

Opening pdf urls with pyPdf

I think urllib2 will get you what you want.

    from urllib2 import Request, urlopen
    from pyPdf import PdfFileWriter, PdfFileReader
    from StringIO import StringIO

    url = "http://www.silicontao.com/ProgrammingGuide/other/beejnet.pdf"
    writer = PdfFileWriter()

    # Download the whole PDF and wrap it in a seekable in-memory file
    remoteFile = urlopen(Request(url)).read()
    memoryFile = StringIO(remoteFile)
    pdfFile = PdfFileReader(memoryFile)

    # Copy every page into the writer
    for pageNum in xrange(pdfFile.getNumPages()):
        currentPage = pdfFile.getPage(pageNum)
        #currentPage.mergePage(watermark.getPage(0))
        writer.addPage(currentPage)

    outputStream = open("output.pdf", "wb")
    writer.write(outputStream)
    outputStream.close()
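The key step above is reading the remote bytes into memory and wrapping them in a seekable file-like object, because the PDF reader needs to seek around in the file. Here is a rough Python 3 sketch of just that step, using only the standard library (urllib.request and io.BytesIO in place of urllib2 and StringIO). The function name fetch_pdf_to_memory, the timeout default, and the header check are my own additions for illustration, not part of the answer.

```python
import io
from urllib.request import urlopen


def fetch_pdf_to_memory(url, timeout=30):
    """Download url and return its bytes as a seekable in-memory file."""
    with urlopen(url, timeout=timeout) as response:
        data = response.read()
    # Cheap sanity check (an added assumption): PDF files start with "%PDF-".
    if not data.startswith(b"%PDF-"):
        raise ValueError("response does not look like a PDF")
    return io.BytesIO(data)
```

The returned buffer plays the same role as memoryFile in the snippet above, so it can be handed to whatever PDF reader class you use, as long as that class accepts a file-like object.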

Reading PDF file using javascript

I know that the question is old, but if you find PDF.js too complex for the job, npm install pdfreader. (I wrote that module.) It would take 5 lines of code to extract text from your PDF file:

    var PdfReader = require("pdfreader").PdfReader;
    new PdfReader().parseFileItems("sample.pdf", function(err, item){
      if (item && item.text)
        console.log(item.text);
    });