PDF Parser
The PDF parser tool extracts text and tables from PDF files accessible via URL. It returns clean, structured content that agents can analyse, summarise, or extract data from.
Tool: parse_pdf
Downloads a PDF from a URL and extracts its content as plain text and Markdown tables.
Arguments
| Argument | Type | Description |
|---|---|---|
| url | string | URL of the PDF to parse |
| pages | string | Optional. Page range to extract (e.g. "1-5", "3", "10-20") |
| extract_tables | boolean | Convert tables to Markdown format (default: true) |
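
As an illustration of how these arguments fit together, here is a minimal Python sketch that assembles and sanity-checks an argument object before a call. Only the three argument names and the example page formats come from the table; the helper name `build_parse_pdf_args`, the validation rules, and the idea that the arguments travel as a JSON object are assumptions, and the real tool may accept other `pages` formats.

```python
import json
import re

# Accepts a single page ("3") or a simple range ("1-5"), the only
# formats shown in the table. Anything else is rejected by this sketch.
PAGES_PATTERN = re.compile(r"\d+(-\d+)?")


def build_parse_pdf_args(url, pages=None, extract_tables=True):
    """Hypothetical helper: build the argument object for parse_pdf."""
    args = {"url": url, "extract_tables": extract_tables}
    if pages is not None:
        if not PAGES_PATTERN.fullmatch(pages):
            raise ValueError(f"pages must look like '3' or '1-5', got {pages!r}")
        start, _, end = pages.partition("-")
        if end and int(end) < int(start):
            raise ValueError(f"page range runs backwards: {pages!r}")
        args["pages"] = pages
    return args


# Arguments for the first five pages of a report, tables included.
print(json.dumps(
    build_parse_pdf_args("https://example.com/reports/q3-2024.pdf", pages="1-5"),
    indent=2,
))
```

Leaving `pages` out of the object when it is not supplied mirrors its optional status in the table.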
Output
Returns the extracted text with:
- Paragraphs: plain text, with page breaks noted
- Tables: converted to Markdown table format
- Page markers: `--- Page N ---` separators
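
If the output needs post-processing, the page markers make it straightforward to split the text back into pages. The sketch below is one way to do that in Python. It assumes each marker sits on its own line and precedes the content of its page, which this page does not state explicitly; `split_pages` and the sample text are hypothetical.

```python
import re

# Assumption: a marker occupies a whole line, e.g. "--- Page 2 ---".
PAGE_MARKER = re.compile(r"^--- Page (\d+) ---[ \t]*$", re.MULTILINE)


def split_pages(output: str) -> dict[int, str]:
    """Map each page number to the text that follows its marker."""
    pages = {}
    matches = list(PAGE_MARKER.finditer(output))
    for i, match in enumerate(matches):
        # A page ends where the next marker starts, or at the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(output)
        pages[int(match.group(1))] = output[match.end():end].strip()
    return pages


# Made-up sample in the documented output shape.
sample = """--- Page 1 ---
Quarterly overview.

--- Page 2 ---
| Region | Revenue |
|---|---|
| North | 120 |
"""

pages = split_pages(sample)
print(sorted(pages))
```

Text that appears before the first marker is ignored by this sketch.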
Use cases
Parse the Q3 earnings report at https://example.com/reports/q3-2024.pdf and extract all financial figures.
Read pages 5-15 of the technical specification at [URL] and summarise the API changes.
Extract all tables from the data sheet at [URL] and identify the column headers.
Local files
To parse a local PDF, first save it to the vault, then reference the vault path:
Read the PDF at vault/documents/contract.pdf and summarise the payment terms.
The agent calls vault_read to retrieve the file, then runs parse_pdf on the local path.
Limitations
- Very large PDFs (100+ pages) may exceed context limits; use the `pages` parameter to extract specific sections
- Scanned PDFs without an OCR text layer will return minimal content
- Password-protected PDFs cannot be parsed
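
For the large-PDF case, one possible approach is to walk the document in fixed-size page ranges and pass each one as `pages`. The Python sketch below only generates the range strings. The function name `page_ranges` and the default chunk size of 20 are assumptions, and the total page count has to come from elsewhere, since this page does not say whether the tool reports it.

```python
def page_ranges(total_pages: int, chunk_size: int = 20) -> list[str]:
    """Split pages 1..total_pages into strings usable as the pages argument."""
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    ranges = []
    for start in range(1, total_pages + 1, chunk_size):
        end = min(start + chunk_size - 1, total_pages)
        # A one-page chunk is written as "41", not "41-41".
        ranges.append(str(start) if start == end else f"{start}-{end}")
    return ranges


print(page_ranges(130, 50))  # ['1-50', '51-100', '101-130']
```

Smaller chunks keep each response further from the context limit at the cost of more calls.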