| .\" Copyright 1997-2011 Glyph & Cog, LLC |
| .TH pdftotext 1 "15 August 2011" |
| .SH NAME |
| pdftotext \- Portable Document Format (PDF) to text converter |
| (version 3.03) |
| .SH SYNOPSIS |
| .B pdftotext |
| [options] |
| .RI PDF-file |
| .RI [ text-file ] |
| .SH DESCRIPTION |
| .B Pdftotext |
| converts Portable Document Format (PDF) files to plain text. |
| .PP |
| Pdftotext reads the PDF file, |
| .IR PDF-file , |
| and writes a text file, |
| .IR text-file . |
| If |
| .I text-file |
| is not specified, pdftotext converts |
| .I file.pdf |
| to |
| .IR file.txt . |
| If |
| .I text-file |
| is \'-', the text is sent to stdout. If |
| .I PDF-file |
| is \'-', it reads the PDF file from stdin. |
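.PP
The stdin and stdout conventions above allow pdftotext to be used as a
filter. The following Python sketch is illustrative only and is not part
of pdftotext. The helper names stdout_argv and extract_text are
hypothetical, and the sketch assumes that pdftotext is on the PATH and
that the output encoding is left at its UTF-8 default.
.PP
.nf
```python
import subprocess


def stdout_argv(pdf_file="-"):
    """Build an argument list that sends the extracted text to stdout.

    A text-file of "-" means stdout; a PDF-file of "-" means stdin.
    """
    return ["pdftotext", pdf_file, "-"]


def extract_text(pdf_bytes):
    """Pipe PDF bytes through pdftotext and return the decoded text.

    Assumes the default UTF-8 output encoding (no -enc option given).
    """
    result = subprocess.run(
        stdout_argv(),          # read from stdin, write to stdout
        input=pdf_bytes,
        capture_output=True,
        check=True,             # raise CalledProcessError on a nonzero exit
    )
    return result.stdout.decode("utf-8")
```
.fi
.PP
Calling stdout_argv("report.pdf") yields the arguments for converting a
named file to stdout instead of reading from stdin.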
| .SH OPTIONS |
| .TP |
| .BI \-f " number" |
| Specifies the first page to convert. |
| .TP |
| .BI \-l " number" |
| Specifies the last page to convert. |
| .TP |
| .BI \-r " number" |
| Specifies the resolution, in DPI. The default is 72 DPI. |
| .TP |
| .BI \-x " number" |
| Specifies the x-coordinate of the top left corner of the crop area. |
| .TP |
| .BI \-y " number" |
| Specifies the y-coordinate of the top left corner of the crop area. |
| .TP |
| .BI \-W " number" |
| Specifies the width of the crop area in pixels. The default is 0. |
| .TP |
| .BI \-H " number" |
| Specifies the height of the crop area in pixels. The default is 0. |
| .TP |
| .B \-layout |
| Maintain (as well as possible) the original physical layout of the |
| text. The default is to \'undo' physical layout (columns, |
| hyphenation, etc.) and output the text in reading order. |
| .TP |
| .BI \-fixed " number" |
| Assume fixed-pitch (or tabular) text, with the specified character |
| width (in points). This forces physical layout mode. |
| .TP |
| .B \-raw |
| Keep the text in content stream order. This is a hack which often |
| "undoes" column formatting, etc. Use of raw mode is no longer |
| recommended. |
| .TP |
| .B \-nodiag |
| Discard diagonal text (i.e., text that is not close to one of the |
| 0, 90, 180, or 270 degree axes). This is useful for skipping |
| watermarks drawn on body text. |
| .TP |
| .B \-htmlmeta |
| Generate a simple HTML file, including the meta information. This |
| simply wraps the text in <pre> and </pre> and prepends the meta |
| headers. |
| .TP |
| .B \-bbox |
| Generate an XHTML file containing bounding box information for each |
| word in the file. |
| .TP |
| .B \-bbox-layout |
| Generate an XHTML file containing bounding box information for each |
| block, line, and word in the file. |
| .TP |
| .B \-tsv |
| Generate a TSV file containing the bounding box information for each |
| block, line, and word in the file. |
| .TP |
| .B \-cropbox |
| Use the crop box rather than the media box with \-bbox and \-bbox-layout. |
| .TP |
| .BI \-colspacing " number" |
| Specifies how much spacing is allowed after a word before adjacent |
| text is considered to be a new column, measured as a fraction of the |
| font size. The current default is 0.7; older releases used a default |
| of 0.3. |
| .TP |
| .BI \-enc " encoding-name" |
| Sets the encoding to use for text output. This defaults to "UTF-8". |
| .TP |
| .B \-listenc |
| Lists the available encodings. |
| .TP |
| .BI \-eol " unix | dos | mac" |
| Sets the end-of-line convention to use for text output. |
| .TP |
| .B \-nopgbrk |
| Don't insert page breaks (form feed characters) between pages. |
| .TP |
| .BI \-opw " password" |
| Specify the owner password for the PDF file. Providing this will |
| bypass all security restrictions. |
| .TP |
| .BI \-upw " password" |
| Specify the user password for the PDF file. |
| .TP |
| .B \-q |
| Don't print any messages or errors. |
| .TP |
| .B \-v |
| Print copyright and version information. |
| .TP |
| .B \-h |
| Print usage information. |
| .RB ( \-help |
| and |
| .B \-\-help |
| are equivalent.) |
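.PP
The page range, resolution, and crop options are often used together.
The following Python sketch builds such a command line without running
it. It is illustrative only: the helper names points_to_pixels and
crop_argv are hypothetical, and it assumes that the crop values are
pixels at the resolution given with \-r, so that a length measured in
PDF points (1/72 inch) is scaled by DPI/72.
.PP
.nf
```python
def points_to_pixels(points, dpi=72):
    """Convert a length in PDF points (1/72 inch) to pixels at the given DPI."""
    return round(points * dpi / 72)


def crop_argv(pdf_file, x_pt, y_pt, width_pt, height_pt,
              dpi=72, first=None, last=None):
    """Build a pdftotext argument list for a cropped region, output to stdout.

    Assumption: -x, -y, -W and -H are all expressed in pixels at the -r
    resolution; the inputs here are given in points and converted.
    """
    argv = ["pdftotext", "-r", str(dpi)]
    if first is not None:
        argv += ["-f", str(first)]   # first page to convert
    if last is not None:
        argv += ["-l", str(last)]    # last page to convert
    argv += [
        "-x", str(points_to_pixels(x_pt, dpi)),
        "-y", str(points_to_pixels(y_pt, dpi)),
        "-W", str(points_to_pixels(width_pt, dpi)),
        "-H", str(points_to_pixels(height_pt, dpi)),
    ]
    return argv + [pdf_file, "-"]
```
.fi
.PP
At the default 72 DPI one pixel equals one point, so the conversion only
matters when \-r is changed.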
| .SH BUGS |
| Some PDF files contain fonts whose encodings have been mangled beyond |
| recognition. There is no way (short of OCR) to extract text from |
| these files. |
| .SH EXIT CODES |
| The Xpdf tools use the following exit codes: |
| .TP |
| 0 |
| No error. |
| .TP |
| 1 |
| Error opening a PDF file. |
| .TP |
| 2 |
| Error opening an output file. |
| .TP |
| 3 |
| Error related to PDF permissions. |
| .TP |
| 99 |
| Other error. |
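.PP
A calling script can translate these exit codes into messages. The
following Python sketch is illustrative only; the names EXIT_CODES and
describe_exit are hypothetical, and the fallback message for codes not
listed above is an assumption of this example.
.PP
.nf
```python
# Exit codes as documented in the list above.
EXIT_CODES = {
    0: "No error.",
    1: "Error opening a PDF file.",
    2: "Error opening an output file.",
    3: "Error related to PDF permissions.",
    99: "Other error.",
}


def describe_exit(code):
    """Return the documented meaning of an exit code, or a fallback message."""
    return EXIT_CODES.get(code, "Undocumented exit code %d." % code)
```
.fi
.PP
For example, describe_exit can be applied to the returncode attribute of
a completed subprocess.run call.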
| .SH AUTHOR |
| The pdftotext software and documentation are copyright 1996-2011 Glyph |
| & Cog, LLC. |
| .SH "SEE ALSO" |
| .BR pdfdetach (1), |
| .BR pdffonts (1), |
| .BR pdfimages (1), |
| .BR pdfinfo (1), |
| .BR pdftocairo (1), |
| .BR pdftohtml (1), |
| .BR pdftoppm (1), |
| .BR pdftops (1), |
| .BR pdfseparate (1), |
| .BR pdfsig (1), |
| .BR pdfunite (1) |