Opening PDF URLs with pyPdf

I think urllib2 will get you what you want:

    # Python 2 code: urllib2, StringIO and xrange are not available in Python 3.
    from urllib2 import Request, urlopen
    from pyPdf import PdfFileWriter, PdfFileReader
    from StringIO import StringIO

    url = "http://www.silicontao.com/ProgrammingGuide/other/beejnet.pdf"
    writer = PdfFileWriter()

    # Download the whole file and wrap it in a seekable in-memory buffer,
    # because PdfFileReader needs to seek within the PDF.
    remoteFile = urlopen(Request(url)).read()
    memoryFile = StringIO(remoteFile)
    pdfFile = PdfFileReader(memoryFile)

    for pageNum in xrange(pdfFile.getNumPages()):
        currentPage = pdfFile.getPage(pageNum)
        # Optional: stamp a watermark (needs a separate `watermark` PdfFileReader).
        #currentPage.mergePage(watermark.getPage(0))
        writer.addPage(currentPage)

    outputStream = open("output.pdf", "wb")
    writer.write(outputStream)
    outputStream.close()
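On Python 3, urllib2 and StringIO no longer exist, but the same idea carries over with urllib.request and io.BytesIO. A PDF reader needs a seekable binary stream (it looks for the cross-reference table near the end of the file), and an HTTP response cannot seek, so the downloaded bytes are buffered in memory first. This is a minimal sketch: the helper name open_remote_pdf and the "%PDF-" header check are illustrative additions, and the returned buffer is what you would pass to a reader such as pypdf's PdfReader.

```python
import io
import urllib.request


def open_remote_pdf(url, timeout=30):
    """Download a PDF and return it as a seekable in-memory binary stream.

    Hypothetical helper: PDF readers seek around the file, and an HTTP
    response cannot seek, so the bytes are buffered in io.BytesIO first.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    # Every PDF starts with a "%PDF-" header; fail early on e.g. HTML error pages.
    if not data.startswith(b"%PDF-"):
        raise ValueError("response does not look like a PDF")
    return io.BytesIO(data)


if __name__ == "__main__":
    # A data: URL keeps this demo offline; an http(s) URL is handled the same way.
    buffer = open_remote_pdf("data:application/pdf;base64,JVBERi0xLjQK")
    print(buffer.read())  # b'%PDF-1.4\n'
```

With pypdf installed, the buffer would then be used roughly as PdfReader(open_remote_pdf(url)).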

Cropping pages of a .pdf file

pyPdf does what I expect in this area. I used the following script:

    #!/usr/bin/python
    #
    from pyPdf import PdfFileWriter, PdfFileReader

    with open("in.pdf", "rb") as in_f:
        input1 = PdfFileReader(in_f)
        output = PdfFileWriter()

        numPages = input1.getNumPages()
        print "document has %s pages." % numPages

        for i in range(numPages):
            page = input1.getPage(i)
            print page.mediaBox.getUpperRight_x(), page.mediaBox.getUpperRight_y()
            page.trimBox.lowerLeft = (25, 25)
            page.trimBox.upperRight …
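The script above crops by moving the corners of one page box inward relative to the media box. A minimal Python 3 sketch of that arithmetic, assuming boxes are (lower-left x, lower-left y, upper-right x, upper-right y) tuples in PDF points; the helper name inset_box and the uniform 25-point margin are illustrative only. In recent pypdf releases the result could be assigned to page.trimbox.lower_left and page.trimbox.upper_right (or to page.cropbox, which is the box viewers clip to when displaying the page); with the old pyPdf API used above, the attributes are trimBox.lowerLeft and trimBox.upperRight.

```python
from typing import Tuple

# (llx, lly, urx, ury) in PDF points (1/72 inch)
Box = Tuple[float, float, float, float]


def inset_box(media_box: Box, margin: float) -> Box:
    """Return `media_box` shrunk by `margin` points on every side.

    Hypothetical helper: the result is meant for a page's trim box or
    crop box, whose corners are set as lower-left and upper-right.
    """
    llx, lly, urx, ury = media_box
    # Normalise in case the corners were stored in another order.
    llx, urx = min(llx, urx), max(llx, urx)
    lly, ury = min(lly, ury), max(lly, ury)
    if 2 * margin >= urx - llx or 2 * margin >= ury - lly:
        raise ValueError("margin leaves no visible area")
    return (llx + margin, lly + margin, urx - margin, ury - margin)


if __name__ == "__main__":
    # A US Letter media box is 612 x 792 points.
    trim = inset_box((0, 0, 612, 792), 25)
    print("lower left:", trim[:2], "upper right:", trim[2:])
    # lower left: (25, 25) upper right: (587, 767)
```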

PDF bleed detection

Quoting from the PDF specification ISO 32000-1:2008 as published by Adobe:

14.11.2 Page Boundaries

14.11.2.1 General

A PDF page may be prepared either for a finished medium, such as a sheet of paper, or as part of a prepress process in which the content of the page is placed on an intermediate medium, such as …
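The quoted section goes on to define five page boundaries: the media box, crop box, bleed box, trim box and art box. The crop box defaults to the media box; the bleed, trim and art boxes default to the crop box; and boxes that extend past the media box are effectively reduced to their intersection with it. One simple heuristic for bleed detection is to resolve the effective bleed and trim boxes under those rules and measure how far the bleed box extends past the trim box on each side. This Python 3 sketch assumes the box values have already been read from a page's /MediaBox, /CropBox, /BleedBox and /TrimBox entries into a plain dictionary; the dictionary layout and all function names are hypothetical.

```python
from typing import Dict, Tuple

# (llx, lly, urx, ury) in PDF points
Box = Tuple[float, float, float, float]


def normalize(box: Box) -> Box:
    """Order the corners so the first pair is lower-left, the second upper-right."""
    x1, y1, x2, y2 = box
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def clip(box: Box, bounds: Box) -> Box:
    """Intersect `box` with `bounds` (boxes beyond the media box are reduced to it)."""
    return (max(box[0], bounds[0]), max(box[1], bounds[1]),
            min(box[2], bounds[2]), min(box[3], bounds[3]))


def effective_boxes(page: Dict[str, Box]) -> Dict[str, Box]:
    """Resolve the five page boundaries, applying the specification's defaults."""
    media = normalize(page["MediaBox"])  # the media box is required
    crop = clip(normalize(page.get("CropBox", media)), media)
    boxes = {"MediaBox": media, "CropBox": crop}
    for name in ("BleedBox", "TrimBox", "ArtBox"):
        # Missing bleed, trim and art boxes fall back to the crop box.
        boxes[name] = clip(normalize(page.get(name, crop)), media)
    return boxes


def bleed_margins(page: Dict[str, Box]) -> Tuple[float, float, float, float]:
    """How far the bleed box extends past the trim box: (left, bottom, right, top)."""
    boxes = effective_boxes(page)
    bleed, trim = boxes["BleedBox"], boxes["TrimBox"]
    return (trim[0] - bleed[0], trim[1] - bleed[1],
            bleed[2] - trim[2], bleed[3] - trim[3])


def has_bleed(page: Dict[str, Box]) -> bool:
    """True if the bleed box extends past the trim box on any side."""
    return any(m > 0 for m in bleed_margins(page))


if __name__ == "__main__":
    # Roughly A4 trim (595 x 842 pt) with 9 pt (about 3 mm) bleed on every side.
    page = {
        "MediaBox": (0, 0, 613, 860),
        "BleedBox": (0, 0, 613, 860),
        "TrimBox": (9, 9, 604, 851),
    }
    print(bleed_margins(page), has_bleed(page))  # (9, 9, 9, 9) True
    print(has_bleed({"MediaBox": (0, 0, 595, 842)}))  # False
```

Because of the defaults, a page that sets only a TrimBox smaller than its crop box will also report bleed, since its bleed box falls back to the crop box.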

Merge PDF files

You can use PyPDF2's PdfMerger class.

File Concatenation

You can simply concatenate files by using the append method:

    from PyPDF2 import PdfMerger

    pdfs = ['file1.pdf', 'file2.pdf', 'file3.pdf', 'file4.pdf']

    merger = PdfMerger()

    for pdf in pdfs:
        merger.append(pdf)

    merger.write("result.pdf")
    merger.close()

You can pass file handles instead of file paths if you want.

File Merging

If you want more …
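If you pass file handles, keep them open until write() has finished, since the merger may read page data from its inputs lazily. One way to guarantee that is contextlib.ExitStack, as in this Python 3 sketch. merge_pdfs is a hypothetical helper; by default it uses PyPDF2's PdfMerger as in the answer, and the merger_factory parameter exists only so the helper can be exercised without PyPDF2 installed.

```python
from contextlib import ExitStack


def merge_pdfs(paths, out_path, merger_factory=None):
    """Concatenate the PDFs in `paths` into `out_path`, passing open file handles.

    Hypothetical helper. By default it uses PyPDF2's PdfMerger; any object
    with append(), write() and close() methods can be supplied instead.
    """
    if merger_factory is None:
        from PyPDF2 import PdfMerger  # imported lazily so the helper loads without PyPDF2
        merger_factory = PdfMerger

    merger = merger_factory()
    try:
        with ExitStack() as stack:
            for path in paths:
                # Each handle is registered with the stack, so it stays open
                # until the with-block, including write(), has finished.
                handle = stack.enter_context(open(path, "rb"))
                merger.append(handle)
            merger.write(out_path)
    finally:
        merger.close()
```

Usage then mirrors the answer: merge_pdfs(pdfs, "result.pdf").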