Quick Start

The Parse endpoint converts documents into structured, LLM-ready formats. Use it to extract clean document content for downstream processing such as RAG pipelines, custom ingestion workflows, document extraction tasks, and agents.

This quickstart will get you up and running with the Parse API in under 5 minutes to extract structured content that can be passed into your LLM/Agent as context or further processed.

Using Workflows? Parser settings can also be configured directly on the Parse Step in your workflow. This allows you to set explicit parsing behavior for all documents processed by that workflow.

What we’re going to parse

We’ll use a bank statement to demonstrate the Parse API.

Feel free to use this document to follow along with the quickstart!

Using the Parse API

Choose your preferred language to get started. If you’re using an SDK, see installation instructions. For raw REST calls, you can use the built-in fetch API or a library like requests for Python.

Replace <YOUR_API_KEY> with your actual key, available on the Developers page, then run one of the examples below to parse the document.

import { ExtendClient } from "extend-ai";

const client = new ExtendClient({
  token: "<YOUR_API_KEY>",
});

// client.parse() is intended for testing. In production, we recommend using
// webhooks or polling with client.parseRuns.createAndPoll().
const response = await client.parse({
  file: {
    name: "bank_statement.pdf",
    url: "https://extend-public-files.s3.us-east-2.amazonaws.com/bank_statement_example.pdf",
  }
});

console.log("Parsed content:", response);

Each example sends a document to the Parse API and returns structured content split into page-level chunks.

Note: this request doesn’t set any configuration options, so the parser uses its default settings. For configuration details, see Configuration Options.

For large documents with long processing times, consider using webhooks to avoid unnecessary polling requests. See Webhooks for setup instructions.
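If you do poll, spacing requests out with a growing delay keeps the request count low. The sketch below is illustrative only and is not SDK code: `fetchRun` is a hypothetical callback that you would implement with whatever call retrieves a parse run, and the `FAILED` status is an assumption (only `PROCESSED` appears in this guide). In practice, `client.parseRuns.createAndPoll()` may already cover this for you.

```typescript
// Minimal run shape needed for polling (assumed subset of the real response).
interface ParseRunLike {
  id: string;
  status: string; // "PROCESSED" appears in this guide; other values are assumptions
}

interface PollOptions {
  initialDelayMs?: number;
  maxDelayMs?: number;
  maxAttempts?: number;
  sleep?: (ms: number) => Promise<void>; // injectable so tests can skip real waiting
}

// Polls until the run finishes, doubling the delay after each attempt up to a cap.
async function pollUntilDone(
  fetchRun: () => Promise<ParseRunLike>, // hypothetical: your own "get parse run" call
  opts: PollOptions = {},
): Promise<ParseRunLike> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const maxDelayMs = opts.maxDelayMs ?? 10000;
  const maxAttempts = opts.maxAttempts ?? 20;
  let delay = opts.initialDelayMs ?? 500;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const run = await fetchRun();
    if (run.status === "PROCESSED") return run;
    if (run.status === "FAILED") throw new Error(`Parse run ${run.id} failed`);
    if (attempt < maxAttempts) {
      await sleep(delay);
      delay = Math.min(delay * 2, maxDelayMs);
    }
  }
  throw new Error(`Parse run did not finish after ${maxAttempts} attempts`);
}
```

Webhooks remain the better fit for long-running documents, since they remove the waiting loop entirely.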

Example response (truncated)

After you run the code snippet above, you’ll see a response like this. This example response is truncated for brevity. The response is organized into output.chunks, which in this case are page-level units. Each chunk includes a formatted content string for the full page and a blocks array for block-level elements (like text, tables, and figures) with metadata and spatial data.

{
  "object": "parse_run",
  "id": "pr_3f1j6I1gsw5k96xFiCnkM",
  "file": {
    "object": "file",
    "id": "file_GzKUy0VDhHscv7tweODYb",
    "name": "bank_statement.pdf"
  },
  "status": "PROCESSED",
  "failureReason": null,
  "failureMessage": null,
  "output": {
    "chunks": [
      {
        "id": "chunk_qncr8Txe-wYvmFjipXgMD",
        "type": "page",
        "content": "CHASE JPMorgan Chase Bank, N.A. P O Box 659754...",
        "metadata": { "pageRange": { "start": 1, "end": 1 } },
        "blocks": [
          {
            "object": "block",
            "id": "block_WNoJ0WbMj4pRW9MpMpUox",
            "type": "text",
            "content": "CHASE JPMorgan Chase Bank, N.A. P O Box 659754 San Antonio, TX 78265 - 9754",
            "details": {},
            "metadata": { "page": { "number": 1, "width": 612, "height": 792 } },
            "polygon": [
              { "x": 56.873, "y": 35.374 },
              { "x": 162.173, "y": 35.215 },
              { "x": 162.245, "y": 81.158 },
              { "x": 56.938, "y": 81.317 }
            ],
            "boundingBox": { "left": 56.873, "top": 35.215, "right": 162.245, "bottom": 81.317 }
          }
        ]
      }
    ]
  },
  "outputUrl": null,
  "metrics": { "pageCount": 7, "processingTimeMs": 8293 },
  "config": { "engine": "parse_performance", "engineVersion": "1.0.0" },
  "usage": { "credits": 14 }
}
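To make the spatial fields more concrete, here is a small sketch of how a block's `polygon` relates to its `boundingBox`. The relationship (the box is the axis-aligned rectangle enclosing the polygon) is inferred from the example values above rather than stated by the API, and the helper names `boundingBoxOf` and `normalize` are made up for illustration. It also assumes that coordinates use the same units as the page `width` and `height`.

```typescript
interface Point { x: number; y: number }
interface BoundingBox { left: number; top: number; right: number; bottom: number }

// Derives the axis-aligned rectangle that encloses a polygon.
function boundingBoxOf(polygon: Point[]): BoundingBox {
  if (polygon.length === 0) throw new Error("polygon must have at least one point");
  const xs = polygon.map((p) => p.x);
  const ys = polygon.map((p) => p.y);
  return {
    left: Math.min(...xs),
    top: Math.min(...ys),
    right: Math.max(...xs),
    bottom: Math.max(...ys),
  };
}

// Scales a box to fractions of the page size (assumes shared units),
// which is convenient for drawing overlays at any zoom level.
function normalize(box: BoundingBox, page: { width: number; height: number }): BoundingBox {
  return {
    left: box.left / page.width,
    top: box.top / page.height,
    right: box.right / page.width,
    bottom: box.bottom / page.height,
  };
}

// Polygon and page size copied from the example response.
const examplePolygon: Point[] = [
  { x: 56.873, y: 35.374 },
  { x: 162.173, y: 35.215 },
  { x: 162.245, y: 81.158 },
  { x: 56.938, y: 81.317 },
];

const exampleBox = boundingBoxOf(examplePolygon);
// left 56.873, top 35.215, right 162.245, bottom 81.317: same as the example's boundingBox
console.log(exampleBox);
console.log(normalize(exampleBox, { width: 612, height: 792 }));
```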

Key fields

Field	What it contains
`id`	Parse run ID you can use to fetch results later.
`file.id`	The Extend file identifier tied to this parse run.
`status`	Processing state for the parse run (e.g., `PROCESSED`).
`metrics.pageCount`	Total pages processed in the document.
`metrics.processingTimeMs`	End-to-end parsing time in milliseconds.
`usage.credits`	Credits consumed by this run.
`output.chunks`	Parsed content units (page, section, or document-level based on config).
`output.chunks[].content`	Formatted content string for the chunk.
`output.chunks[].blocks`	Block array with structured elements and their layout data.

For full request/response details, see the Create Parse Run API reference.
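As a quick illustration of reading these fields, the sketch below condenses a parse run into a few numbers for logging. The `summarizeRun` helper is hypothetical, and the interface lists only the fields it uses. The per-page figures describe this one run and should not be read as a statement about how credits are priced.

```typescript
// Only the fields used below; the shape follows the example response in this guide.
interface ParseRunSummaryInput {
  id: string;
  status: string;
  metrics: { pageCount: number; processingTimeMs: number };
  usage: { credits: number };
}

// Condenses a parse run into a small object suitable for logging.
function summarizeRun(run: ParseRunSummaryInput) {
  const { pageCount, processingTimeMs } = run.metrics;
  return {
    id: run.id,
    done: run.status === "PROCESSED",
    seconds: processingTimeMs / 1000,
    // Guard against division by zero for empty documents.
    msPerPage: pageCount > 0 ? Math.round(processingTimeMs / pageCount) : 0,
    creditsPerPage: pageCount > 0 ? run.usage.credits / pageCount : 0,
  };
}

// Values copied from the example response.
const runSummary = summarizeRun({
  id: "pr_3f1j6I1gsw5k96xFiCnkM",
  status: "PROCESSED",
  metrics: { pageCount: 7, processingTimeMs: 8293 },
  usage: { credits: 14 },
});

console.log(runSummary); // msPerPage is 1185 and creditsPerPage is 2 for this run
```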

Using Parsed Output

You can access the formatted content of each chunk or work with individual blocks for more control.

// Access the formatted content of each chunk
response.output.chunks.forEach((chunk, index) => {
  console.log(`Page ${index + 1}:`, chunk.content);
});

// Or work with individual blocks for more control
response.output.chunks.forEach(chunk => {
  chunk.blocks.forEach(block => {
    console.log(`${block.type}:`, block.content);
  });
});

For a deeper guide on how to use the output of this endpoint, see Response Format.
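When the goal is to pass parsed content to an LLM, one possible approach is to join chunk contents into a single labelled string. The sketch below is an assumption-laden illustration, not part of the SDK: `toContext`, `blocksOfType`, and the `maxChars` budget are hypothetical, and the sample chunks are invented stand-ins for `response.output.chunks`. It labels pages from `metadata.pageRange` instead of the array index, so the labels stay correct if chunks are not page-level. The `"table"` type string is a guess based on the block kinds this guide mentions; only `"text"` appears in the example response.

```typescript
interface Block { type: string; content: string }
interface Chunk {
  content: string;
  metadata: { pageRange: { start: number; end: number } };
  blocks: Block[];
}

// Joins chunk contents into one prompt-ready string, labelling each chunk with
// its page range and stopping before an optional character budget is exceeded.
function toContext(chunks: Chunk[], maxChars: number = Infinity): string {
  const parts: string[] = [];
  let used = 0;
  for (const chunk of chunks) {
    const { start, end } = chunk.metadata.pageRange;
    const label = start === end ? `Page ${start}` : `Pages ${start}-${end}`;
    const part = `[${label}]\n${chunk.content}`;
    if (used + part.length > maxChars) break;
    parts.push(part);
    used += part.length;
  }
  return parts.join("\n\n");
}

// Collects every block of a given type across all chunks.
function blocksOfType(chunks: Chunk[], type: string): Block[] {
  const found: Block[] = [];
  for (const chunk of chunks) {
    for (const block of chunk.blocks) {
      if (block.type === type) found.push(block);
    }
  }
  return found;
}

// Invented sample data standing in for response.output.chunks.
const sampleChunks: Chunk[] = [
  {
    content: "Statement summary",
    metadata: { pageRange: { start: 1, end: 1 } },
    blocks: [{ type: "text", content: "Statement summary" }],
  },
  {
    content: "Transactions",
    metadata: { pageRange: { start: 2, end: 3 } },
    blocks: [{ type: "table", content: "| Date | Amount |" }],
  },
];

console.log(toContext(sampleChunks));
console.log(blocksOfType(sampleChunks, "text").length); // 1
```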

Using Extend Studio

You can also use the Extend Studio UI to upload your document, configure the parser, and view code that you can copy.

In Studio, you can also open the Config tab to edit the parser configuration and copy the JSON config.

Next steps

Configuration

Customize chunking, output format, and block options

Recipes

Ready-to-use configs for RAG, legal docs, and more

Response Format

Extract tables, figures, and spatial data

Best Practices

Optimize for speed or accuracy

Error Codes

Handle errors and troubleshoot issues

API Reference

Full request and response schema