Quick Start

The Parse endpoint converts documents into structured, LLM-ready formats. Use it to extract clean document content for downstream processing such as RAG pipelines, custom ingestion workflows, document extraction tasks, and agents.

This quickstart gets you up and running with the Parse API in under 5 minutes, extracting structured content that you can pass to your LLM or agent as context or process further.

What we’re going to parse

We’ll use a bank statement to demonstrate the Parse API.

Feel free to use this document to follow along with the quickstart!

Using the Parse API

Choose your preferred language to get started. If you’re using an SDK, see installation instructions. For raw REST calls, you can use the built-in fetch API (Node.js 18+) or requests in Python.

Replace <YOUR_API_KEY> with your actual key, available on the Developers page, then run one of the examples below to parse the document.

import { ExtendClient } from "extend-ai";

const client = new ExtendClient({
  environment: "https://api.extend.ai",
  token: "<YOUR_API_KEY>",
});

const response = await client.parse({
  file: {
    fileName: "bank_statement.pdf",
    fileUrl: "https://extend-public-files.s3.us-east-2.amazonaws.com/bank_statement_example.pdf",
  }
});

console.log("Parsed content:", response);
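
For readers not using the SDK, the same request can be sketched with the built-in fetch API. This is only a sketch under stated assumptions: the `/parse` path, the bearer-token `Authorization` header, and a JSON body mirroring the SDK call are not confirmed by this page, and the API may require additional headers. `buildParseRequest` and `parseViaRest` are hypothetical helper names. Check the Parse File API reference for the actual request format.

```typescript
// Input shape copied from the SDK call on this page.
interface ParseFileInput {
  fileName: string;
  fileUrl: string;
}

interface BuiltRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the URL and fetch options for a Parse request.
// ASSUMPTIONS: the "/parse" path, bearer-token auth, and the body shape
// are not confirmed by this page.
function buildParseRequest(apiKey: string, file: ParseFileInput): BuiltRequest {
  return {
    url: "https://api.extend.ai/parse", // assumed path
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`, // assumed auth scheme
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ file }),
    },
  };
}

// Hypothetical helper: sends the request with the built-in fetch API (Node.js 18+).
async function parseViaRest(apiKey: string, file: ParseFileInput): Promise<unknown> {
  const { url, init } = buildParseRequest(apiKey, file);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Parse request failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

Keeping request construction separate from the network call makes it easy to inspect or log the exact payload before anything is sent.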

The request sends the document to the Parse API, which returns structured content split into page-level chunks.

Note: this request doesn’t set any configuration options, so the parser uses its default settings. For configuration details, see Configuration Options.

For high-volume production workloads, use asynchronous parsing. For more details, see Async vs. sync processing.
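
This page does not show the asynchronous flow, so the following is only a sketch of the general pattern: start a run, then poll it until its status is terminal. `waitForRun` and the caller-supplied `getRun` callback are hypothetical; `PROCESSED` is taken from the example response on this page, while `FAILED` is an assumed status name. See Async vs. sync processing for the actual calls.

```typescript
// Minimal shape needed for polling; the real run object has more fields.
interface RunStatus {
  id: string;
  status: string;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical polling helper for an asynchronous parse run.
// ASSUMPTIONS: getRun fetches the run by ID (that call is not documented
// on this page); "FAILED" is an assumed status name.
async function waitForRun<T extends RunStatus>(
  runId: string,
  getRun: (id: string) => Promise<T>,
  { intervalMs = 2000, maxAttempts = 30 } = {},
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const run = await getRun(runId);
    if (run.status === "PROCESSED") return run; // terminal success
    if (run.status === "FAILED") {
      throw new Error(`Parse run ${run.id} failed`);
    }
    await sleep(intervalMs); // still in progress: wait before the next check
  }
  throw new Error(`Parse run ${runId} did not finish after ${maxAttempts} attempts`);
}
```

Capping the number of attempts keeps a stuck run from blocking a worker indefinitely.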

Example response (truncated)

Running the snippet above returns a response like the following, truncated here for brevity. The response is organized into chunks, which in this case are page-level units. Each chunk includes a formatted content string for the full page and a blocks array of block-level elements (such as text, tables, and figures) with metadata and spatial data.

{
  "object": "parser_run",
  "id": "parser_run_3f1j6I1gsw5k96xFiCnkM",
  "fileId": "file_GzKUy0VDhHscv7tweODYb",
  "metrics": { "pageCount": 7, "processingTimeMs": 8293 },
  "status": "PROCESSED",
  "config": { "engine": "parse_performance", "engineVersion": "1.0.0" },
  "usage": { "credits": 14 },
  "chunks": [
    {
      "id": "chunk_qncr8Txe-wYvmFjipXgMD",
      "type": "page",
      "content": "CHASE JPMorgan Chase Bank, N.A. P O Box 659754...", // ... more content
      "metadata": { "pageRange": { "start": 1, "end": 1 } },
      "blocks": [
        {
          "object": "block",
          "id": "block_WNoJ0WbMj4pRW9MpMpUox",
          "type": "text",
          "content": "CHASE JPMorgan Chase Bank, N.A. P O Box 659754 San Antonio, TX 78265 - 9754",
          "details": {},
          "metadata": { "page": { "number": 1, "width": 612, "height": 792 } },
          "polygon": [
            { "x": 56.873, "y": 35.374 },
            { "x": 162.173, "y": 35.215 },
            { "x": 162.245, "y": 81.158 },
            { "x": 56.938, "y": 81.317 }
          ],
          "boundingBox": { "left": 56.873, "top": 35.215, "right": 162.245, "bottom": 81.317 }
        }
        // ... more blocks
      ]
    },
    { "id": "chunk_O3gS1z8_KaaCdgi5rDz5P", "type": "page", "content": "..." } // ... more content
    // ... more chunks for remaining pages
  ]
}
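
Because each block carries both a boundingBox and its page's width and height, the coordinates can be rescaled to a 0 to 1 range, which is convenient for drawing highlights over a page rendered at any size. A minimal sketch, assuming the field shapes shown in the example response and that the box and the page dimensions use the same units; `normalizeBox` is a hypothetical helper, not part of the SDK.

```typescript
// Shapes copied from the example response on this page.
interface BoundingBox {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

interface PageInfo {
  number: number;
  width: number;
  height: number;
}

// Hypothetical helper: rescales a block's bounding box to fractions of the page.
// ASSUMPTION: box coordinates and page dimensions share the same units.
function normalizeBox(box: BoundingBox, page: PageInfo): BoundingBox {
  return {
    left: box.left / page.width,
    top: box.top / page.height,
    right: box.right / page.width,
    bottom: box.bottom / page.height,
  };
}

// Values taken from the first block in the example response.
const normalized = normalizeBox(
  { left: 56.873, top: 35.215, right: 162.245, bottom: 81.317 },
  { number: 1, width: 612, height: 792 },
);
console.log(normalized);
```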

Key fields

Field	What it contains
`id`	Parser run ID you can use to fetch results later.
`fileId`	The Extend file identifier tied to this parse run.
`status`	Processing state for the parse run (e.g., `PROCESSED`).
`metrics.pageCount`	Total pages processed in the document.
`metrics.processingTimeMs`	End-to-end parsing time in milliseconds.
`usage.credits`	Credits consumed by this run.
`chunks`	Parsed content units (page, section, or document-level based on config).
`chunks[].content`	Formatted content string for the chunk.
`chunks[].blocks`	Block array with structured elements and their layout data.

For full request/response details, see the Parse File API reference.
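
The key fields above can be captured in a few TypeScript types for use in your own code. The types below are inferred from the truncated example response on this page, so treat them as a partial sketch rather than the official schema; `summarizeRun` is a hypothetical helper that turns a run into a one-line log message.

```typescript
// Partial types inferred from the example response; the real schema
// may include more fields (assumption).
interface Block {
  object: string;
  id: string;
  type: string;
  content: string;
}

interface Chunk {
  id: string;
  type: string;
  content: string;
  blocks?: Block[]; // marked optional defensively (assumption)
}

interface ParserRun {
  id: string;
  fileId: string;
  status: string;
  metrics: { pageCount: number; processingTimeMs: number };
  usage: { credits: number };
  chunks: Chunk[];
}

// Hypothetical helper: one-line summary built from the key fields.
function summarizeRun(run: ParserRun): string {
  const seconds = (run.metrics.processingTimeMs / 1000).toFixed(1);
  return (
    `${run.id}: ${run.status}, ${run.metrics.pageCount} pages, ` +
    `${run.chunks.length} chunks, ${seconds}s, ${run.usage.credits} credits`
  );
}
```

A summary line like this is useful when logging many runs, since the run ID lets you look up the full result later.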

Using Parsed Output

You can access the formatted content of each chunk or work with individual blocks for more control.

// Access the formatted content of each chunk
response.chunks.forEach((chunk, index) => {
  console.log(`Page ${index + 1}:`, chunk.content);
});

// Or work with individual blocks for more control
response.chunks.forEach(chunk => {
  chunk.blocks.forEach(block => {
    console.log(`${block.type}:`, block.content);
  });
});
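
When passing parsed content to an LLM, it can help to join the chunks into a single string with page labels. The sketch below reads the label from each chunk's `metadata.pageRange` (as shown in the example response) rather than from the array index, so it still works if chunks are not page-level. `buildContext` and its `maxChars` budget are hypothetical and not part of the SDK.

```typescript
// Minimal chunk shape needed here, based on the example response.
interface PageChunk {
  content: string;
  metadata?: { pageRange?: { start: number; end: number } };
}

// Hypothetical helper: joins chunk content into one prompt-ready string,
// labeling each chunk with its page range when one is present.
function buildContext(chunks: PageChunk[], maxChars = 20000): string {
  const parts: string[] = [];
  let used = 0;
  for (const chunk of chunks) {
    const range = chunk.metadata?.pageRange;
    const label = !range
      ? "[Pages unknown]"
      : range.start === range.end
        ? `[Page ${range.start}]`
        : `[Pages ${range.start}-${range.end}]`;
    const part = `${label}\n${chunk.content}`;
    // Approximate character budget (separators between parts are not counted).
    if (used + part.length > maxChars) break;
    parts.push(part);
    used += part.length;
  }
  return parts.join("\n\n");
}
```

The character budget is a crude stand-in for a token limit; swap in a tokenizer-based count if your model's context window is tight.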

For a deeper guide on how to use the output of this endpoint, see Response Format.

Using Extend Studio

You can also use the Extend Studio UI to upload your document, configure the parser, and view code that you can copy.

[Image: Parser view code]

From there, you can open the Config tab to edit the parser configuration and copy the JSON config.

[Image: Parser config]

Next steps

Configuration: Customize chunking, output format, and block options.

Recipes: Ready-to-use configs for RAG, legal docs, and more.

Response Format: Extract tables, figures, and spatial data.

Best Practices: Optimize for speed or accuracy.

Error Codes: Handle errors and troubleshoot issues.

API Reference: Full request and response schema.
