pdf - w3toppers.com

How to install wkhtmltopdf on a linux based (shared hosting) web server

I’ve managed to install wkhtmltopdf-amd64 on my shared hosting account without root access. Here’s what I did: I downloaded the relevant static binary (v0.10.0) from here: http://code.google.com/p/wkhtmltopdf/downloads/list (EDIT: the above has since moved.) Then, via ssh on my shared host, I typed the following: $ wget {relevant url to binary from link above} $ tar -xvf … Read more
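
Once the static binary is unpacked, it can be called from a script without root access. The following is a minimal sketch, not part of the original answer: the path `~/bin/wkhtmltopdf-amd64` and the helper names `build_command` and `render` are assumptions made for illustration, so adjust them to wherever you extracted the binary.

```python
import subprocess
from pathlib import Path
from typing import List

# Hypothetical location of the extracted static binary (an assumption);
# change it to the directory you unpacked the archive into.
WKHTMLTOPDF = Path.home() / "bin" / "wkhtmltopdf-amd64"


def build_command(source: str, output: str, binary: Path = WKHTMLTOPDF) -> List[str]:
    """Return the argv list that converts `source` (URL or HTML file) to `output`."""
    return [str(binary), source, output]


def render(source: str, output: str, binary: Path = WKHTMLTOPDF) -> None:
    """Run the binary; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_command(source, output, binary), check=True)
```

Calling `render("http://example.com", "example.pdf")` would then run the user-local binary directly, with no system-wide install involved.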

Is it possible in Ghostscript to add watermark to every page in PDF

Bit too big for a comment, so I’ve added a new answer. The EndPage procedure (see page 441 of the PostScript Language Reference Manual) takes two additional parameters on the stack, a count of pages emitted so far, and a reason code. You can use the count of pages to do interesting things like duplexing, … Read more
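
As a rough illustration of the EndPage idea, the sketch below builds a Ghostscript command line whose PostScript snippet installs an EndPage procedure that stamps text on each page before it is emitted. It is not the code from the truncated answer: the font, size, position, gray level, and the helper names `watermark_postscript` and `watermark_command` are all assumptions, and whether the stamp ends up above or below the page content depends on the input and device.

```python
from typing import List


def watermark_postscript(text: str = "DRAFT") -> str:
    """PostScript that installs an EndPage procedure drawing `text` on each page."""
    # Escape characters that are special inside a PostScript string literal.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return (
        "<< /EndPage { "
        # Stack on entry: page-count reason-code.
        # Reason code 2 means the device is being deactivated: emit nothing.
        "2 eq { pop false } "
        # Otherwise drop the page count, draw the text, and return true
        # so the page is emitted.
        "{ pop gsave /Helvetica findfont 48 scalefont setfont "
        "0.5 setgray 100 100 moveto 45 rotate "
        f"({escaped}) show grestore true }} ifelse "
        "} >> setpagedevice"
    )


def watermark_command(src: str, dst: str, text: str = "DRAFT") -> List[str]:
    """argv for Ghostscript: run the snippet via -c, then process `src` via -f."""
    return ["gs", "-o", dst, "-sDEVICE=pdfwrite",
            "-c", watermark_postscript(text), "-f", src]
```

The page count is simply discarded here; a procedure that wanted to treat odd and even pages differently would inspect it instead of popping it.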

How can I shift page images in PDF files more to the left or to the right?

If you don’t want to write your own program code (as Nikolaus suggested), but use a Ghostscript command line instead, you need to know 3 things: PostScript has a setpagedevice operator that takes a PageOffset parameter; Ghostscript will process snippets of PostScript code if you pass them with -c … on the command line; Ghostscript can evaluate … Read more
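
The first two points can be combined into a single command. The sketch below assembles it in Python; the helper name `shift_command`, the `pdfwrite` output device, and the file names are assumptions for illustration. Offsets are in PostScript points (72 per inch), and a negative x value moves the page content to the left.

```python
from typing import List


def shift_command(src: str, dst: str, dx_points: float, dy_points: float = 0.0) -> List[str]:
    """Build a Ghostscript argv that shifts every page by (dx, dy) points."""
    # PostScript snippet passed with -c; -f then switches back to reading files.
    snippet = f"<< /PageOffset [{dx_points:g} {dy_points:g}] >> setpagedevice"
    return ["gs", "-o", dst, "-sDEVICE=pdfwrite", "-c", snippet, "-f", src]
```

For example, `shift_command("in.pdf", "out.pdf", -72)` yields a command that moves the page images one inch to the left; pass the list to `subprocess.run` to execute it.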

What does “Not LTV-enabled” mean?

LTV (Long Term Validation) and PDF signatures. The term LTV-enabled. 4 Profile for PAdES-LTV, 4.1 Overview: Validation of an electronic signature requires data to validate the signature such as CA certificates, Certificate Revocation List (CRLs) or Certificate status information (OCSP) commonly provided by an online service (referred to in the present document as validation data). … Read more

PDF Spec vs Acrobat creation (QuadPoints)

I’ve written a PDF annotation lib for iOS and found the same against-the-spec Acrobat behavior. As a bit of further info, the Text Markup annotation also contains a Rect entry as well as the QuadPoints entry. The Rect entry is per the spec, [llx, lly, urx, ury]. So in Acrobat-generated Text Markup annotations, the … Read more
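
Because the vertex order inside QuadPoints is exactly what is in dispute, code that only needs the enclosing rectangle can avoid depending on it. The sketch below is an illustration, not taken from the answer, and the function name `rect_from_quadpoints` is hypothetical: it derives a [llx, lly, urx, ury] rectangle from a QuadPoints array by taking minima and maxima, which gives the same result for any vertex ordering.

```python
from typing import List, Sequence


def rect_from_quadpoints(quad_points: Sequence[float]) -> List[float]:
    """Bounding box [llx, lly, urx, ury] of a QuadPoints array (8 numbers per quad)."""
    if len(quad_points) == 0 or len(quad_points) % 8 != 0:
        raise ValueError("QuadPoints must contain a non-empty multiple of 8 numbers")
    xs = quad_points[0::2]  # x1, x2, x3, x4, ...
    ys = quad_points[1::2]  # y1, y2, y3, y4, ...
    return [min(xs), min(ys), max(xs), max(ys)]
```

The result can be compared against the annotation's own Rect entry as a sanity check when reading files from different producers.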

How to index a pdf file in Elasticsearch 5.0.0 with ingest-attachment plugin?

You need to make sure you have created your ingest pipeline with:

    PUT _ingest/pipeline/attachment
    {
      "description" : "Extract attachment information",
      "processors" : [
        {
          "attachment" : {
            "field" : "data",
            "indexed_chars" : -1
          }
        }
      ]
    }

Then you can make a PUT, not a POST, to your index using the pipeline you’ve created.

    PUT my_index/my_type/my_id?pipeline=attachment … Read more
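
The attachment processor expects the source field (`data` in the pipeline above) to hold the file contents as a base64 string. The sketch below shows one way to prepare the two JSON bodies in Python using only the standard library; it does not send any request, and the helper names `pipeline_body` and `document_body` are assumptions made for illustration.

```python
import base64
import json
from typing import Any, Dict


def pipeline_body() -> Dict[str, Any]:
    """Body for PUT _ingest/pipeline/attachment, mirroring the pipeline above."""
    return {
        "description": "Extract attachment information",
        "processors": [
            {"attachment": {"field": "data", "indexed_chars": -1}}
        ],
    }


def document_body(pdf_bytes: bytes) -> Dict[str, str]:
    """Body for the indexing request: the PDF content, base64-encoded, in `data`."""
    return {"data": base64.b64encode(pdf_bytes).decode("ascii")}


if __name__ == "__main__":
    # Print both bodies so they can be pasted into a console or sent with any HTTP client.
    print(json.dumps(pipeline_body(), indent=2))
    print(json.dumps(document_body(b"%PDF-1.4 example bytes"), indent=2))
```

In practice `pdf_bytes` would come from reading the file in binary mode, for example `open(path, "rb").read()`.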

How to extract Highlighted Parts from PDF files

To extract highlighted parts, you can use PyMuPDF. Here is an example which works with this PDF file (direct download):

    # Based on https://stackoverflow.com/a/62859169/562769
    from typing import List, Tuple

    import fitz  # install with 'pip install pymupdf'


    def _parse_highlight(annot: fitz.Annot, wordlist: List[Tuple[float, float, float, float, str, int, int, int]]) -> str:
        points = annot.vertices
        quad_count … Read more
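
Since the excerpt stops before the matching logic, here is a library-free sketch of the general idea: group the annotation vertices into quads of four points, then keep the words whose boxes overlap a quad. This is an assumption about the approach, not the truncated code itself. The function name `words_in_highlight` is hypothetical, and the word tuples follow the (x0, y0, x1, y1, text, block, line, word) layout used in the type hint above.

```python
from typing import List, Sequence, Tuple

Word = Tuple[float, float, float, float, str, int, int, int]
Point = Tuple[float, float]


def words_in_highlight(vertices: Sequence[Point], words: Sequence[Word]) -> str:
    """Join the words whose boxes overlap any highlight quad (4 vertices per quad)."""
    picked: List[str] = []
    for i in range(0, len(vertices) - 3, 4):
        quad = vertices[i:i + 4]
        # Axis-aligned bounding box of the quad, independent of vertex order.
        qx0 = min(p[0] for p in quad)
        qy0 = min(p[1] for p in quad)
        qx1 = max(p[0] for p in quad)
        qy1 = max(p[1] for p in quad)
        for x0, y0, x1, y1, text, *_ in words:
            # Standard rectangle-overlap test.
            if x0 < qx1 and x1 > qx0 and y0 < qy1 and y1 > qy0:
                picked.append(text)
    return " ".join(picked)
```

With PyMuPDF, `vertices` would correspond to `annot.vertices` and `words` to the page's word list; a stricter overlap threshold may be needed if neighbouring lines get picked up.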