How to extract text from a PDF? [closed]

I was given a 400-page PDF containing a table of data that I had to import (luckily, no images). Ghostscript's txtwrite device worked for me:

gswin64c -sDEVICE=txtwrite -o output.txt input.pdf
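If the conversion has to be scripted, the same command can be run from Python. This is a minimal sketch, not part of the original answer: the helper names `build_gs_command` and `pdf_to_text` are hypothetical, and it assumes Ghostscript is on the PATH as `gswin64c` (the 64-bit Windows console build) or `gs` (the usual name on Linux/macOS). The `-o` switch already implies `-dBATCH -dNOPAUSE`, so the arguments are identical to the command above.

```python
import shutil
import subprocess
from pathlib import Path


def build_gs_command(pdf_path, txt_path, gs_exe=None):
    """Return the argument list for Ghostscript's txtwrite device (hypothetical helper)."""
    if gs_exe is None:
        # Assumption: prefer the Windows console build, fall back to the Unix name.
        gs_exe = shutil.which("gswin64c") or shutil.which("gs") or "gs"
    # Same arguments as: gswin64c -sDEVICE=txtwrite -o output.txt input.pdf
    return [gs_exe, "-sDEVICE=txtwrite", "-o", str(txt_path), str(pdf_path)]


def pdf_to_text(pdf_path, txt_path, gs_exe=None):
    """Run Ghostscript, then return the extracted text (hypothetical helper)."""
    cmd = build_gs_command(pdf_path, txt_path, gs_exe)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if gs fails
    # errors="replace" guards against any bytes that are not valid UTF-8
    return Path(txt_path).read_text(encoding="utf-8", errors="replace")
```

For example, `text = pdf_to_text("input.pdf", "output.txt")` writes `output.txt` and returns its contents as a string.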

The output file was still split into pages, with headers and so on, but it was then easy to write a small app to strip out the blank lines and page headers and read in all 30,000 records. The -dSIMPLE and -dCOMPLEX options made no difference in this case.
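The original answer doesn't show the cleanup app, so the following is only a sketch of that kind of post-processing. It assumes two things not stated in the answer: that page headers can be recognised by regular expressions you supply, and that table columns in the txtwrite output are separated by at least two spaces (txtwrite pads text with spaces to approximate the page layout). The function name `extract_records` is hypothetical.

```python
import re
from typing import Iterable, List


def extract_records(lines: Iterable[str], header_patterns: List[str]) -> List[List[str]]:
    """Drop blank lines and page-header lines, then split each data row into fields."""
    compiled = [re.compile(p) for p in header_patterns]
    records = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank line between pages or rows
        if any(rx.search(stripped) for rx in compiled):
            continue  # repeated page header / column header
        # Assumption: columns are separated by runs of 2+ spaces, so a
        # single space inside a value (e.g. "Bob Smith") is kept intact.
        records.append(re.split(r"\s{2,}", stripped))
    return records
```

Applied to the Ghostscript output it might look like `with open("output.txt", encoding="utf-8") as f: rows = extract_records(f, [r"Page \d+$", r"^ID\s"])`, where both patterns are placeholders to be adapted to whatever headers the actual PDF repeats on each page.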
