How to extract text from a PDF? [closed]

I was given a 400-page PDF file containing a table of data that I had to import, luckily with no images. Ghostscript worked for me:

gswin64c -sDEVICE=txtwrite -o output.txt input.pdf
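
If the extraction has to be repeated or scripted, the same call can be wrapped in Python. This is only a sketch: the function names are made up for illustration, and the choice of executable name (gswin64c on Windows, gs elsewhere) is an assumption about how Ghostscript is installed and on the PATH.

```python
import subprocess
import sys


def build_gs_command(pdf_path, txt_path, executable=None):
    """Return the argument list for Ghostscript's txtwrite device."""
    if executable is None:
        # Assumption: 64-bit console build on Windows, plain "gs" elsewhere.
        executable = "gswin64c" if sys.platform.startswith("win") else "gs"
    # Same switches as the command line above: txtwrite device, -o for output.
    return [executable, "-sDEVICE=txtwrite", "-o", txt_path, pdf_path]


def extract_text(pdf_path, txt_path, executable=None):
    """Run Ghostscript; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_gs_command(pdf_path, txt_path, executable), check=True)
```

Calling extract_text("input.pdf", "output.txt") should then be equivalent to running the command by hand.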

The output file was split into pages with headers and so on, but it was then easy to write a small app to strip out the blank lines and other clutter and read in all 30,000 records. The -dSIMPLE and -dCOMPLEX switches made no difference in this case.
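
The clean-up step described above might look something like the following. It is not the app I actually wrote, just a minimal sketch: the header pattern and the idea that columns are separated by two or more spaces are assumptions that will need adjusting to the real layout of your output.

```python
import re


def clean_lines(lines, header_pattern=None):
    """Yield non-blank lines, skipping any that match header_pattern."""
    header = re.compile(header_pattern) if header_pattern else None
    for line in lines:
        text = line.rstrip()
        if not text.strip():
            continue  # blank or whitespace-only line
        if header and header.search(text):
            continue  # repeated page header (pattern is a guess)
        yield text


def split_record(line):
    """Split one table row into fields."""
    # Assumption: columns are separated by runs of two or more spaces.
    return re.split(r"\s{2,}", line.strip())
```

For example, with a hypothetical header such as "Page 1 of 400":

```python
with open("output.txt", encoding="utf-8") as f:
    records = [split_record(l) for l in clean_lines(f, r"^Page \d+ of \d+")]
```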

