How to install wkhtmltopdf on a Linux-based (shared hosting) web server

I’ve managed to install wkhtmltopdf-amd64 on my shared hosting account without root access. Here’s what I did:

Downloaded the relevant static binary (v0.10.0) from here: http://code.google.com/p/wkhtmltopdf/downloads/list (EDIT: the above has moved to here.)

Via SSH on my shared host, typed the following:

$ wget {relevant url to binary from link above}
$ tar -xvf …
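Once the binary is extracted, it can be called from a script like any other executable. The following is a minimal Python wrapper under stated assumptions: the path `~/wkhtmltopdf-amd64` is hypothetical (use wherever `tar` put the file), and it relies only on wkhtmltopdf's basic `wkhtmltopdf <input URL or file> <output PDF>` invocation.

```python
import os
import subprocess
from typing import List

# Hypothetical location of the extracted static binary in your home directory.
DEFAULT_BINARY = os.path.expanduser("~/wkhtmltopdf-amd64")


def build_wkhtmltopdf_args(binary: str, source: str, output_pdf: str) -> List[str]:
    """Build the argument list: wkhtmltopdf takes an input URL/file, then an output path."""
    return [binary, source, output_pdf]


def render_pdf(source: str, output_pdf: str, binary: str = DEFAULT_BINARY) -> None:
    """Run the locally installed binary; no root access is needed since it lives in $HOME."""
    if not os.access(binary, os.X_OK):
        raise FileNotFoundError(f"{binary} is missing or not executable (try chmod +x)")
    # check=True raises CalledProcessError if wkhtmltopdf exits with a non-zero status.
    subprocess.run(build_wkhtmltopdf_args(binary, source, output_pdf), check=True)
```

For example, `render_pdf("http://example.com", "example.pdf")` would write `example.pdf` to the current directory, provided the binary exists at the assumed path and is executable.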

What does “Not LTV-enabled” mean?

LTV (Long Term Validation) and PDF signatures

The term LTV-enabled

4 Profile for PAdES-LTV
4.1 Overview
Validation of an electronic signature requires data to validate the signature such as CA certificates, Certificate Revocation List (CRLs) or Certificate status information (OCSP) commonly provided by an online service (referred to in the present document as validation data). …
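Put simply, a signature is reported as LTV-enabled when the validation data described above is embedded in the PDF itself (PAdES-LTV stores it in a Document Security Store), so the signature can still be checked after the CA's online services are no longer reachable. The Python sketch below is a deliberately simplified model of that idea, not how Acrobat or any real validator computes the label: the `Cert` and `EmbeddedValidationData` types are hypothetical, and real validation also checks signatures, validity periods and timestamps.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str  # a self-signed root has issuer == subject


@dataclass
class EmbeddedValidationData:
    """Hypothetical model of validation data stored inside the PDF."""
    certs: Set[str] = field(default_factory=set)          # subjects of embedded certificates
    crl_issuers: Set[str] = field(default_factory=set)    # CAs whose CRL is embedded
    ocsp_subjects: Set[str] = field(default_factory=set)  # certs with an embedded OCSP response


def is_ltv_enabled(chain: List[Cert], data: EmbeddedValidationData) -> bool:
    """Simplified check: every certificate in the chain is embedded, and every
    non-root certificate has embedded revocation info (its own OCSP response,
    or a CRL from its issuer)."""
    for cert in chain:
        if cert.subject not in data.certs:
            return False
        if cert.issuer == cert.subject:
            continue  # trust anchors (roots) are not revocation-checked here
        has_revocation_info = (
            cert.subject in data.ocsp_subjects or cert.issuer in data.crl_issuers
        )
        if not has_revocation_info:
            return False  # validating this cert would need an online CRL/OCSP lookup
    return True
```

The point the model makes is that "Not LTV-enabled" does not mean the signature is invalid; it means at least one piece of validation data would have to be fetched online at verification time.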

How to index a PDF file in Elasticsearch 5.0.0 with ingest-attachment plugin?

You need to make sure you have created your ingest pipeline with:

PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "indexed_chars" : -1
      }
    }
  ]
}

Then make a PUT (not a POST) request to your index, using the pipeline you’ve created:

PUT my_index/my_type/my_id?pipeline=attachment …
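The two requests above can be built from Python with only the standard library. This is a sketch assuming an unsecured cluster at `http://localhost:9200` (a hypothetical address) and the `my_index`/`my_type`/`my_id` names from the example; the attachment processor expects the `data` field to hold the file's contents as a base64 string.

```python
import base64
import json
import urllib.request

# Assumed local cluster address; adjust for your setup.
ES_HOST = "http://localhost:9200"

# Same pipeline definition as above: extract attachment info from the "data" field.
PIPELINE = {
    "description": "Extract attachment information",
    "processors": [{"attachment": {"field": "data", "indexed_chars": -1}}],
}


def _json_put(url: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request with a JSON body."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def pipeline_request(host: str = ES_HOST) -> urllib.request.Request:
    return _json_put(f"{host}/_ingest/pipeline/attachment", PIPELINE)


def index_pdf_request(pdf_bytes: bytes, index: str, doc_type: str, doc_id: str,
                      host: str = ES_HOST) -> urllib.request.Request:
    # The attachment processor decodes the base64 string before extracting text.
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    url = f"{host}/{index}/{doc_type}/{doc_id}?pipeline=attachment"
    return _json_put(url, {"data": encoded})
```

Each request can then be sent with `urllib.request.urlopen(req)`: create the pipeline once, then index PDFs whose bytes were read in binary mode (`"rb"`). Keeping request construction separate from sending makes the bodies easy to inspect without a running cluster.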

How to extract highlighted parts from PDF files

To extract highlighted parts, you can use PyMuPDF. Here is an example which works with this PDF file: Direct download

# Based on https://stackoverflow.com/a/62859169/562769
from typing import List, Tuple

import fitz  # install with 'pip install pymupdf'


def _parse_highlight(annot: fitz.Annot, wordlist: List[Tuple[float, float, float, float, str, int, int, int]]) -> str:
    points = annot.vertices
    quad_count …
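The core of the approach is geometric: a highlight annotation's vertices come in groups of four points (typically one quad per highlighted line, which is what the snippet's `quad_count` counts), and a word belongs to the highlight if its bounding box overlaps one of those quads. The sketch below restates that idea in plain Python so it can be tried without PyMuPDF; it is not the truncated code from the answer, and the helper names are hypothetical. Word tuples follow the same `(x0, y0, x1, y1, text, block_no, line_no, word_no)` layout as the type hint above.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Word = Tuple[float, float, float, float, str, int, int, int]


def quads_to_rects(vertices: Sequence[Point]) -> List[Rect]:
    """Group highlight vertices into quads of four points; return each quad's bounding box."""
    rects = []
    for i in range(len(vertices) // 4):
        quad = vertices[i * 4 : i * 4 + 4]
        xs = [p[0] for p in quad]
        ys = [p[1] for p in quad]
        rects.append((min(xs), min(ys), max(xs), max(ys)))
    return rects


def _intersects(a: Rect, b: Rect) -> bool:
    # Strict inequalities: boxes that merely touch along an edge do not count.
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def highlighted_text(vertices: Sequence[Point], words: Sequence[Word]) -> str:
    """Join the words whose boxes overlap each highlighted quad, quad by quad."""
    parts = []
    for rect in quads_to_rects(vertices):
        hits = [w[4] for w in words if _intersects(w[:4], rect)]
        parts.append(" ".join(hits))
    return " ".join(parts)
```

With PyMuPDF installed, you would pass `annot.vertices` for each highlight annotation and the page's word list (`page.get_text("words")` in recent versions). Because the test is plain overlap, a word that only slightly overlaps a quad is included, so very tight or sloppy highlights may pick up or drop edge words.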