PDF Parser

The PDF parser tool extracts text and tables from PDF files accessible via URL. It returns clean, structured content that agents can analyse, summarise, or extract data from.

Tool: `parse_pdf`

Downloads a PDF from a URL and extracts its content as plain text and Markdown tables.

Arguments

| Argument | Type | Description |
|---|---|---|
| `url` | string | URL of the PDF to parse |
| `pages` | string | Optional. Page range to extract (e.g. `"1-5"`, `"3"`, `"10-20"`) |
| `extract_tables` | boolean | Convert tables to Markdown format (default: true) |
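
As an illustration, the arguments above could be assembled and sanity-checked before a call. This is a minimal sketch, not part of the tool: `build_parse_pdf_args` is a hypothetical helper, and the `pages` syntax is assumed to be limited to the single-page and `start-end` forms shown in the examples.

```python
import re

# Assumed grammar for `pages`: a single page ("3") or an inclusive range ("1-5").
_PAGES_RE = re.compile(r"(\d+)(?:-(\d+))?")


def build_parse_pdf_args(url, pages=None, extract_tables=True):
    """Hypothetical helper: build the argument dict for a parse_pdf call."""
    args = {"url": url, "extract_tables": extract_tables}
    if pages is not None:
        match = _PAGES_RE.fullmatch(pages)
        if match is None:
            raise ValueError(f"unsupported page range: {pages!r}")
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        # Pages are assumed to be 1-indexed, with start <= end.
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {pages!r}")
        args["pages"] = pages
    return args


print(build_parse_pdf_args("https://example.com/reports/q3-2024.pdf", pages="1-5"))
```

Rejecting a malformed range before the call avoids spending a download on a request that cannot succeed.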

Output

Returns the extracted text with:

Paragraphs: plain text, with page breaks noted
Tables: converted to Markdown table format
Page markers: `--- Page N ---` separators between pages
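
Downstream code may want the output split back into pages. The sketch below is one way to do that, assuming each marker sits on its own line in exactly the form `--- Page N ---`. `split_pages` is a hypothetical helper and the sample string is invented for illustration.

```python
import re

# Assumption: each marker occupies its own line, exactly "--- Page N ---".
PAGE_MARKER = re.compile(r"^--- Page (\d+) ---[ \t]*$", re.MULTILINE)


def split_pages(text):
    """Return {page_number: page_text}; text before the first marker is dropped."""
    pages = {}
    matches = list(PAGE_MARKER.finditer(text))
    for i, match in enumerate(matches):
        start = match.end()
        # A page runs until the next marker, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        pages[int(match.group(1))] = text[start:end].strip()
    return pages


# Invented sample in the assumed output shape.
sample = (
    "--- Page 1 ---\nIntro paragraph.\n\n"
    "--- Page 2 ---\n| A | B |\n|---|---|\n| 1 | 2 |\n"
)
pages = split_pages(sample)
print(sorted(pages))
```

Because the pattern requires the literal word `Page`, Markdown table separator rows such as `|---|---|` are not mistaken for markers.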

Use cases

Parse the Q3 earnings report at https://example.com/reports/q3-2024.pdf and extract all financial figures.

Read pages 5-15 of the technical specification at [URL] and summarise the API changes.

Extract all tables from the data sheet at [URL] and identify the column headers.
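
For the table-extraction prompt, the Markdown tables in the output can also be processed programmatically. This is a rough sketch under simple assumptions: `parse_markdown_table` is a hypothetical helper, the table has a header row followed by a separator row, cells contain no escaped pipes, and the sample values are made up.

```python
def parse_markdown_table(lines):
    """Parse the lines of one Markdown table into (headers, rows)."""

    def cells(line):
        # Drop the outer pipes, then split on the inner ones.
        return [cell.strip() for cell in line.strip().strip("|").split("|")]

    headers = cells(lines[0])
    # lines[1] is the |---|---| separator row and carries no data.
    rows = [dict(zip(headers, cells(line))) for line in lines[2:]]
    return headers, rows


# Made-up sample table, for illustration only.
table = [
    "| Metric | Q2 | Q3 |",
    "|--------|----|----|",
    "| Revenue | 10 | 12 |",
]
headers, rows = parse_markdown_table(table)
print(headers)
print(rows)
```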

Local files

To parse a local PDF, first save it to the vault, then reference the vault path:

Read the PDF at vault/documents/contract.pdf and summarise the payment terms.

The agent uses `vault_read` to get the file, then calls `parse_pdf` on the local path.

Limitations

Very large PDFs (100+ pages) may exceed context limits; use the `pages` argument to extract specific sections
Scanned PDFs without an OCR text layer will return minimal content
Password-protected PDFs cannot be parsed
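
One way to work around the size limit is to request a large document in consecutive page ranges. The sketch below generates `pages` strings for that purpose. `page_ranges` is a hypothetical helper, the chunk size of 20 is an arbitrary choice, and the caller is assumed to already know the total page count.

```python
def page_ranges(total_pages, chunk_size=20):
    """Yield `pages` strings such as "1-20", "21-40" covering 1..total_pages."""
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    for start in range(1, total_pages + 1, chunk_size):
        end = min(start + chunk_size - 1, total_pages)
        # A final chunk of one page uses the single-page form, e.g. "41".
        yield str(start) if start == end else f"{start}-{end}"


print(list(page_ranges(45)))  # ['1-20', '21-40', '41-45']
```

Each string can then be passed as the `pages` argument of a separate `parse_pdf` call, so that no single response has to hold the whole document.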
