Quick Start

The Parse endpoint converts documents into structured, LLM-ready formats. Use it to extract clean document content for downstream processing such as RAG pipelines, custom ingestion workflows, document extraction tasks, and agents.

This quickstart gets you up and running with the Parse API in under 5 minutes, extracting structured content that you can pass to your LLM or agent as context or process further.

Using Workflows? Parser settings can also be configured directly on the Parse Step in your workflow. This allows you to set explicit parsing behavior for all documents processed by that workflow.


What we’re going to parse

We’ll use a bank statement to demonstrate the Parse API.

[Image: bank statement, page 1]

Feel free to use this document to follow along with the quickstart!

Using the Parse API

Choose your preferred language to get started. If you’re using an SDK, see installation instructions. For raw REST calls, you can use the built-in fetch API or a library like requests for Python.

Replace <YOUR_API_KEY> with your actual key, available on the Developers page, then run one of the examples below to parse the document.

import { ExtendClient } from "extend-ai";

const client = new ExtendClient({
  token: "<YOUR_API_KEY>",
});

// client.parse() is intended for testing. In production, we recommend using
// webhooks or polling with client.parseRuns.createAndPoll().
const response = await client.parse({
  file: {
    name: "bank_statement.pdf",
    url: "https://extend-public-files.s3.us-east-2.amazonaws.com/bank_statement_example.pdf",
  }
});

console.log("Parsed content:", response);

Each example sends a document to the Parse API and receives structured content split into page-level chunks.

Note: the request above sets no configuration options, so the parser uses its default settings. For configuration details, see Configuration Options.

For large documents with long processing times, consider using webhooks to avoid unnecessary polling requests. See Webhooks for setup instructions.
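To make the trade-off concrete, here is a minimal sketch of what a polling loop with exponential backoff can look like. It is not the SDK's implementation of client.parseRuns.createAndPoll(): the helper name `pollParseRun`, the `getRun` callback, and the "PROCESSING" status value are assumptions made for illustration (only "PROCESSED" appears in this quickstart), so check the API reference for the real status values.

```typescript
// Hypothetical minimal shape of a parse run; only `status` matters here.
interface RunLike {
  status: string;
}

interface PollOptions {
  initialDelayMs?: number;
  maxDelayMs?: number;
  maxAttempts?: number;
  // ASSUMPTION: placeholder for whatever pending statuses the API reports.
  pendingStatuses?: string[];
}

// Calls `getRun` until the run leaves a pending status, doubling the wait
// between attempts up to `maxDelayMs`. Throws if the run is still pending
// after `maxAttempts` fetches.
async function pollParseRun<T extends RunLike>(
  getRun: () => Promise<T>,
  opts: PollOptions = {},
): Promise<T> {
  const {
    initialDelayMs = 1000,
    maxDelayMs = 30000,
    maxAttempts = 20,
    pendingStatuses = ["PROCESSING"],
  } = opts;

  let delay = initialDelayMs;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const run = await getRun();
    if (pendingStatuses.indexOf(run.status) === -1) {
      return run; // finished, successfully or not; the caller inspects status
    }
    if (attempt === maxAttempts) break;
    await new Promise<void>((resolve) => setTimeout(resolve, delay));
    delay = Math.min(delay * 2, maxDelayMs);
  }
  throw new Error(`Parse run still pending after ${maxAttempts} attempts`);
}
```

Every iteration costs one request, which is why a webhook is usually the better fit when processing takes a long time.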

Example response (truncated)

After you run the code snippet above, you’ll see a response like the one below, truncated here for brevity. The response is organized into output.chunks, which in this case are page-level units. Each chunk includes a formatted content string for the full page and a blocks array of block-level elements (such as text, tables, and figures) with metadata and spatial data.

{
  "object": "parse_run",
  "id": "pr_3f1j6I1gsw5k96xFiCnkM",
  "file": {
    "object": "file",
    "id": "file_GzKUy0VDhHscv7tweODYb",
    "name": "bank_statement.pdf"
  },
  "status": "PROCESSED",
  "failureReason": null,
  "failureMessage": null,
  "output": {
    "chunks": [
      {
        "id": "chunk_qncr8Txe-wYvmFjipXgMD",
        "type": "page",
        "content": "CHASE JPMorgan Chase Bank, N.A. P O Box 659754...",
        "metadata": { "pageRange": { "start": 1, "end": 1 } },
        "blocks": [
          {
            "object": "block",
            "id": "block_WNoJ0WbMj4pRW9MpMpUox",
            "type": "text",
            "content": "CHASE JPMorgan Chase Bank, N.A. P O Box 659754 San Antonio, TX 78265 - 9754",
            "details": {},
            "metadata": { "page": { "number": 1, "width": 612, "height": 792 } },
            "polygon": [
              { "x": 56.873, "y": 35.374 },
              { "x": 162.173, "y": 35.215 },
              { "x": 162.245, "y": 81.158 },
              { "x": 56.938, "y": 81.317 }
            ],
            "boundingBox": { "left": 56.873, "top": 35.215, "right": 162.245, "bottom": 81.317 }
          }
        ]
      }
    ]
  },
  "outputUrl": null,
  "metrics": { "pageCount": 7, "processingTimeMs": 8293 },
  "config": { "engine": "parse_performance", "engineVersion": "1.0.0" },
  "usage": { "credits": 14 }
}
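In the sample response, the block's boundingBox looks like the axis-aligned envelope of its polygon. The sketch below recomputes the box from the polygon and scales it by the page size, which can be handy for drawing highlights on a rendered page. The helper names `boundingBoxOf` and `normalizeBox` are illustrative and not part of the SDK, and the code assumes that polygon coordinates share units with metadata.page.width and metadata.page.height, with the origin at the top-left of the page.

```typescript
interface Point { x: number; y: number }
interface BoundingBox { left: number; top: number; right: number; bottom: number }
interface PageSize { width: number; height: number }

// Smallest axis-aligned box that contains every polygon vertex.
function boundingBoxOf(polygon: Point[]): BoundingBox {
  if (polygon.length === 0) {
    throw new Error("polygon must contain at least one point");
  }
  const xs = polygon.map((p) => p.x);
  const ys = polygon.map((p) => p.y);
  return {
    left: Math.min(...xs),
    top: Math.min(...ys),
    right: Math.max(...xs),
    bottom: Math.max(...ys),
  };
}

// Scales a box to the 0..1 range relative to the page dimensions.
// ASSUMPTION: box and page use the same units and a top-left origin.
function normalizeBox(box: BoundingBox, page: PageSize): BoundingBox {
  return {
    left: box.left / page.width,
    top: box.top / page.height,
    right: box.right / page.width,
    bottom: box.bottom / page.height,
  };
}

// Polygon and page size copied from the sample response.
const polygon: Point[] = [
  { x: 56.873, y: 35.374 },
  { x: 162.173, y: 35.215 },
  { x: 162.245, y: 81.158 },
  { x: 56.938, y: 81.317 },
];
const box = boundingBoxOf(polygon);
const normalized = normalizeBox(box, { width: 612, height: 792 });

console.log(box); // same values as the sample's boundingBox field
console.log(normalized);
```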

Key fields

Field                       What it contains
id                          Parse run ID you can use to fetch results later.
file.id                     The Extend file identifier tied to this parse run.
status                      Processing state for the parse run (e.g., PROCESSED).
metrics.pageCount           Total pages processed in the document.
metrics.processingTimeMs    End-to-end parsing time in milliseconds.
usage.credits               Credits consumed by this run.
output.chunks               Parsed content units (page, section, or document-level based on config).
output.chunks[].content     Formatted content string for the chunk.
output.chunks[].blocks      Block array with structured elements and their layout data.

For full request/response details, see the Create Parse Run API reference.
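If you work in TypeScript, it can help to describe the fields you rely on with a small interface. The sketch below is hand-written from the sample response and the key fields listed above. It is not the SDK's exported type, and `ParseRunSummary` and `summarizeRun` are hypothetical names used only for illustration.

```typescript
// Hand-written subset of the response shape, based on the sample response.
// ASSUMPTION: the real SDK types may differ (for example, nullable fields).
interface ParseRunSummary {
  id: string;
  file: { id: string; name: string };
  status: string;
  metrics: { pageCount: number; processingTimeMs: number };
  usage: { credits: number };
  output: { chunks: { content: string }[] };
}

// Builds a one-line, log-friendly summary of a parse run.
function summarizeRun(run: ParseRunSummary): string {
  const seconds = (run.metrics.processingTimeMs / 1000).toFixed(1);
  return (
    `${run.id} (${run.file.name}): ${run.status}, ` +
    `${run.metrics.pageCount} pages in ${seconds}s, ` +
    `${run.usage.credits} credits, ${run.output.chunks.length} chunk(s)`
  );
}

// Values copied from the truncated sample response, which shows one chunk.
const sampleRun: ParseRunSummary = {
  id: "pr_3f1j6I1gsw5k96xFiCnkM",
  file: { id: "file_GzKUy0VDhHscv7tweODYb", name: "bank_statement.pdf" },
  status: "PROCESSED",
  metrics: { pageCount: 7, processingTimeMs: 8293 },
  usage: { credits: 14 },
  output: {
    chunks: [{ content: "CHASE JPMorgan Chase Bank, N.A. P O Box 659754..." }],
  },
};

const summary = summarizeRun(sampleRun);
console.log(summary);
```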

Using Parsed Output

You can access the formatted content of each chunk or work with individual blocks for more control.

// Access the formatted content of each chunk
response.output.chunks.forEach((chunk, index) => {
  console.log(`Page ${index + 1}:`, chunk.content);
});

// Or work with individual blocks for more control
response.output.chunks.forEach(chunk => {
  chunk.blocks.forEach(block => {
    console.log(`${block.type}:`, block.content);
  });
});
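A common next step is to join the chunk contents into a single context string for an LLM prompt. The sketch below labels each chunk with its page range (taken from metadata.pageRange, as in the sample response) and stops once a character budget is reached. `buildContext` and `maxChars` are hypothetical names chosen for illustration, and a character budget is only a rough stand-in for a token limit.

```typescript
// Only the chunk fields this helper reads, as they appear in the sample response.
interface ChunkLike {
  content: string;
  metadata: { pageRange: { start: number; end: number } };
}

// Joins chunks in order, prefixing each with a page label. Stops before the
// first chunk that would push the total past `maxChars`, so the result may be
// empty if the first chunk alone is too large.
function buildContext(chunks: ChunkLike[], maxChars = 20000): string {
  const parts: string[] = [];
  let used = 0;
  for (const chunk of chunks) {
    const { start, end } = chunk.metadata.pageRange;
    const label = start === end ? `Page ${start}` : `Pages ${start}-${end}`;
    const section = `[${label}]\n${chunk.content}`;
    if (used + section.length > maxChars) break;
    parts.push(section);
    used += section.length + 2; // +2 for the blank-line separator
  }
  return parts.join("\n\n");
}
```

With the response from the quickstart, you could call buildContext(response.output.chunks) and pass the result to your model as context, assuming the chunk objects match the shape above.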

For a deeper guide on how to use the output of this endpoint, see Response Format.


Using Extend Studio

You can also use the Extend Studio UI to upload your document, configure the parser, and view code that you can copy.

[Image: parser view code]

From here, you can also open the Config tab to edit the parser configuration and copy the JSON config.

[Image: parser config]


Next steps