Parsing converts documents into clean, structured, LLM-ready content. It turns PDFs, images, spreadsheets, presentations, and scanned files into layout-aware markdown, split into chunks, alongside a block-level breakdown (text, tables, figures, and key-value pairs) with spatial metadata. Use it as the foundation for RAG pipelines, custom ingestion workflows, downstream extraction, and agents.
We’ll parse a sample bank statement. For this quick-start we’ve uploaded the file here.
Grab a key from the Developers page and store it as the EXTEND_API_KEY environment variable. If you’re using an SDK, see the installation instructions.
Want to parse your own document? Upload it first, then pass the returned file id instead of a url.
After you run the code snippet above, you’ll see a response like this. This example response is truncated for brevity. The response is organized into output.chunks, which in this case are page-level units. Each chunk includes a formatted content string for the full page and a blocks array for block-level elements (like text, tables, and figures) with metadata and spatial data.
For full request/response details, see the Create Parse Run API reference.
You can pass each chunk’s formatted content straight into an LLM, or walk individual blocks for more control over tables and layout.
For a deeper guide on how to use the output of this endpoint, see Response Format.
The example above calls the synchronous /parse endpoint. We also have an asynchronous /parse_runs endpoint that should be used for large files and high volume use cases.
See Async Processing for the full comparison, polling options, and webhook setup.
The quick start uses default settings. To control how a document is parsed, pass a config object alongside file. Here are the most commonly changed options; for the full reference, see Configuration.
Choose the parsing engine based on your accuracy and latency needs.
By default, Parse returns one chunk per page. For RAG, section chunking splits at semantic boundaries so each chunk is a complete unit you can embed and retrieve independently.
Controls how tables appear in each block’s content.
Figures are parsed by default. Disable them for the fastest parsing, or enable advanced chart extraction to convert charts into structured tables.
Agentic processing uses a vision model to review and correct parsing output. It’s off by default and adds latency, so enable it only where it helps:
text.agentic — corrects low-confidence OCR. Enable for handwriting, faded or skewed scans, or when you see garbled characters in the output.tables.agentic — reviews and fixes table structure. Enable for tables with misaligned columns, merged cells, or values landing in the wrong column.Don’t enable either for clean digital PDFs; they parse correctly without it and you’ll just add latency.
Process only specific pages. Page numbers are 1-based and inclusive, and you’re only billed for pages actually processed.
For every option, including the full block options, Excel settings, and OCR output, see the Configuration reference.