Using PDFBox to write UTF-8 encoded strings to a PDF [duplicate]

You are using one of the built-in "Base 14" fonts that PDF viewers such as Adobe Reader supply themselves. These fonts are not Unicode; they cover a standard Latin character set with a few extra characters. The character you mention, a lowercase s with a caron (š), is not in the standard Latin encoding these fonts use by default. Like the uppercase Š, it is listed only in the Windows (WinAnsi) and PDFDoc columns of the character table, not the Standard or Mac ones. See Appendix D of the PDF specification at http://www.adobe.com/devnet/pdf/pdf_reference.html for details.
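As a rough illustration of why a single-byte Latin encoding can fail on this character, the sketch below asks a few Java charsets whether they can encode š. This is only an analogy: the Java charsets stand in for the PDF encodings (windows-1252 is close to, but not identical to, WinAnsiEncoding), and the class and method names are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodableCheck {
    // Returns true if every character of text can be encoded in the charset.
    static boolean canEncode(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String sCaron = "\u0161"; // lowercase s with caron

        // ISO-8859-1 only covers U+0000..U+00FF, so U+0161 does not fit.
        System.out.println("ISO-8859-1: " + canEncode(StandardCharsets.ISO_8859_1, sCaron));

        // windows-1252 is an optional charset, so check that it is present first.
        if (Charset.isSupported("windows-1252")) {
            Charset cp1252 = Charset.forName("windows-1252");
            System.out.println("windows-1252: " + canEncode(cp1252, sCaron));
        }

        // UTF-8 can encode any Unicode character.
        System.out.println("UTF-8: " + canEncode(StandardCharsets.UTF_8, sCaron));
    }
}
```

Whether the character survives in the PDF depends on the encoding the font is used with, not on how the Java string itself is encoded.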

To get to the point: you need to embed a Unicode font if you want to use characters outside that Latin set. Make sure you are licensed to embed whichever font you choose. I can recommend the open-source Gentium or Doulos fonts: they are free, high quality, and have comprehensive Unicode support.
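A minimal sketch of embedding a font, assuming PDFBox 2.x or later (older 1.x releases do not have this API) and a TrueType font file on disk. The font file name and output path are placeholders, not something PDFBox provides.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

public class UnicodeTextPdf {
    public static void main(String[] args) throws IOException {
        // Placeholder path: point this at a font you are licensed to embed.
        File fontFile = new File("GentiumPlus-Regular.ttf");

        try (PDDocument doc = new PDDocument()) {
            PDPage page = new PDPage();
            doc.addPage(page);

            // Load the TrueType file as a composite (Type 0) font embedded in the document.
            PDFont font = PDType0Font.load(doc, fontFile);

            try (PDPageContentStream content = new PDPageContentStream(doc, page)) {
                content.beginText();
                content.setFont(font, 12);
                content.newLineAtOffset(72, 700); // one inch from the left edge, near the top
                content.showText("s with caron: \u0161 \u0160");
                content.endText();
            }

            doc.save("unicode-text.pdf"); // placeholder output path
        }
    }
}
```

If the chosen font has no glyph for a character, `showText` throws an `IllegalArgumentException`, so it is worth testing with the actual text you intend to write.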
