Open source PDF viewer that includes a PDF information extractor and a font analyzer

Xpdf is an open source PDF viewer that includes command-line information extractor and font analyzer utilities. The following utilities are particularly relevant to digital preservation:

  • pdfinfo: prints the contents of the ‘Info’ dictionary (plus some other useful information) from a Portable Document Format (PDF) file. In addition, the following information is printed:

      • tagged (yes/no)
      • form (AcroForm / XFA / none)
      • page count
      • encrypted flag (yes/no)
      • print and copy permissions (if encrypted)
      • page size
      • file size
      • linearized (yes/no)
      • PDF version
      • metadata (only if requested)

  • pdffonts: lists the fonts used in a Portable Document Format (PDF) file, along with various information for each font. The following information is listed for each font:

      name   the font name, exactly as given in the PDF file (potentially including a subset prefix)
      type   the font type
      emb    “yes” if the font is embedded in the PDF file
      sub    “yes” if the font is a subset
      uni    “yes” if there is an explicit “ToUnicode” map in the PDF file (the absence of a ToUnicode map doesn’t necessarily mean that the text can’t be converted to Unicode)
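
In a preservation workflow, both utilities are usually called from a script and their text output turned into structured records. The following Python sketch shows one way this might be done. It assumes that `pdfinfo` and `pdffonts` are installed and on the PATH, that pdfinfo prints one "Key: value" pair per line, and that pdffonts prints a header row followed by a dashed rule and one row per font. The function names are hypothetical, and the exact field labels and columns may differ between Xpdf versions, so treat this as a starting point rather than a definitive parser.

```python
import re
import subprocess


def parse_pdfinfo(text):
    """Parse pdfinfo-style 'Key: value' lines into a dict (assumed layout)."""
    info = {}
    for line in text.splitlines():
        # Split at the first colon only, because values such as dates contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info


def parse_pdffonts(text):
    """Parse a pdffonts-style table using the dashed rule to find column starts.

    Assumption: columns are aligned under the dashed rule. A font name longer
    than its column would shift the row and be parsed incorrectly.
    """
    lines = text.splitlines()
    rule_index = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped and set(stripped) <= {"-", " "}:
            rule_index = i
            break
    if rule_index is None or rule_index == 0:
        return []

    # Each run of dashes marks where a column begins.
    starts = [m.start() for m in re.finditer(r"-+", lines[rule_index])]
    bounds = list(zip(starts, starts[1:] + [None]))
    header = lines[rule_index - 1]
    names = [header[s:e].strip() for s, e in bounds]

    fonts = []
    for row in lines[rule_index + 1:]:
        if row.strip():
            cells = [row[s:e].strip() for s, e in bounds]
            fonts.append(dict(zip(names, cells)))
    return fonts


def run_tool(tool, path):
    """Run an Xpdf command-line tool on one file and return its standard output."""
    result = subprocess.run(
        [tool, path], capture_output=True, text=True, check=True
    )
    return result.stdout
```

With the tools installed, `parse_pdfinfo(run_tool("pdfinfo", "example.pdf"))` would return a dictionary of the reported fields, and `parse_pdffonts(run_tool("pdffonts", "example.pdf"))` a list with one dictionary per font; `example.pdf` is a placeholder file name. Keeping the parsing separate from the subprocess call makes it possible to test the parsers on saved output without the tools being present.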

