Parse File (Async)

Parse files to get cleaned, chunked target content (e.g. markdown).

The Parse endpoint converts documents into structured, machine-readable formats with fine-grained control over the parsing process. It is well suited to extracting cleaned document content as context for downstream processing, e.g. RAG pipelines, custom ingestion pipelines, embeddings classification, etc.

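For more details, see the Parse File guide (https://docs.extend.ai/2026-02-09/product/parsing/parse). See Async Processing (https://docs.extend.ai/2026-02-09/developers/async-processing) for a full guide on polling helpers and webhooks.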

Polling with the SDK

The SDK provides a createAndPoll / create_and_poll method that handles polling automatically, returning when the run reaches a terminal state (PROCESSED or FAILED):

TypeScript

```typescript
const result = await client.parseRuns.createAndPoll({
  file: { url: "https://..." }
});
// Returns when the run reaches a terminal state
console.log(result.output);
```

Python

```python
result = client.parse_runs.create_and_poll(
    file={"url": "https://..."}
)
# Returns when the run reaches a terminal state
print(result.output)
```

Java

```java
var result = client.parseRuns().createAndPoll(ParseRunCreateRequest.builder()
    .file(FileInput.builder().url("https://...").build())
    .build());
// Returns when the run reaches a terminal state
System.out.println(result.getOutput());
```
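For intuition, the kind of loop a polling helper runs can be sketched as follows. This is not the SDK's implementation: `pollUntilTerminal`, `fetchRun`, the one-second interval, and the attempt cap are hypothetical names and values chosen for illustration. Only the status values come from this page.

```typescript
// Illustrative sketch only; not the SDK's actual createAndPoll implementation.
type ParseRunStatus = "PROCESSING" | "PROCESSED" | "FAILED";

interface ParseRunLike {
  id: string;
  status: ParseRunStatus;
}

// Terminal states as documented on this page.
function isTerminal(status: ParseRunStatus): boolean {
  return status === "PROCESSED" || status === "FAILED";
}

interface PollOptions {
  intervalMs?: number; // hypothetical default, not taken from the SDK
  maxAttempts?: number; // hypothetical safety cap
  sleep?: (ms: number) => Promise<void>; // injectable so tests can skip real waiting
}

async function pollUntilTerminal<T extends ParseRunLike>(
  fetchRun: () => Promise<T>, // hypothetical: any function that fetches the current run
  options: PollOptions = {}
): Promise<T> {
  const intervalMs = options.intervalMs ?? 1000;
  const maxAttempts = options.maxAttempts ?? 60;
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const run = await fetchRun();
    if (isTerminal(run.status)) {
      return run;
    }
    // Wait before the next check, except after the final attempt.
    if (attempt < maxAttempts) {
      await sleep(intervalMs);
    }
  }
  throw new Error(`Run did not reach a terminal state after ${maxAttempts} attempts`);
}
```

A returned run can still be `FAILED`, so callers would check `status` before reading `output`.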

Authentication

Authorization: Bearer

Bearer authentication of the form Bearer <token>, where token is your auth token.
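For callers issuing raw HTTP requests rather than using the SDK, the header format above can be produced with a small helper. `buildAuthHeaders` is a hypothetical name used only for this sketch; the header name and `Bearer <token>` form are the ones described above.

```typescript
// Hypothetical helper: builds the Authorization header described above.
function buildAuthHeaders(token: string): Record<string, string> {
  const trimmed = token.trim();
  if (trimmed.length === 0) {
    throw new Error("An API token is required");
  }
  return { Authorization: `Bearer ${trimmed}` };
}
```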

Request

This endpoint expects an object.

file (object, required)

The file to be parsed. Files can be provided as a URL or an Extend file ID.

config (object, optional)

Configuration options for the parsing process. Defaults depend on the selected parser engine and version.

metadata (map from strings to any, optional)

An optional object that you can pass in to identify the run. It is returned to you in the response and in webhooks. Maximum size is 10KB.

To categorize runs for billing and usage tracking, include extend:usage_tags with an array of string values (e.g., {"extend:usage_tags": ["production", "team-eng", "customer-123"]}). Tags must contain only alphanumeric characters, hyphens, and underscores; any special characters will be automatically removed.
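The tag rule above can also be applied on the client, so that the tags you log locally match what the service stores after it strips special characters. The sketch below is illustrative: `sanitizeUsageTag` and `buildMetadata` are hypothetical helpers, and treating "10KB" as 10 * 1024 bytes of UTF-8 encoded JSON is an assumption, not something this page specifies.

```typescript
// Hypothetical helpers; not part of the SDK.
const MAX_METADATA_BYTES = 10 * 1024; // assumption: "10KB" means 10 * 1024 bytes

// Keep only alphanumeric characters, hyphens, and underscores.
function sanitizeUsageTag(tag: string): string {
  return tag.replace(/[^A-Za-z0-9_-]/g, "");
}

function buildMetadata(
  tags: string[],
  extra: Record<string, unknown> = {}
): Record<string, unknown> {
  // Drop tags that become empty once disallowed characters are removed.
  const cleaned = tags.map(sanitizeUsageTag).filter((tag) => tag.length > 0);
  const metadata: Record<string, unknown> = { ...extra, "extend:usage_tags": cleaned };

  // Approximate the payload size from its UTF-8 JSON encoding.
  const size = new TextEncoder().encode(JSON.stringify(metadata)).length;
  if (size > MAX_METADATA_BYTES) {
    throw new Error(`metadata is ${size} bytes; the documented limit is 10KB`);
  }
  return metadata;
}
```

The returned object could then be passed as the `metadata` field of the request.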

Response

Successfully parsed file

object (enum)

The type of object. Will always be "parse_run".

id (string)

A unique identifier for the parse run.

Example: "pr_xK9mLPqRtN3vS8wF5hB2cQ"

file (object)

The file that was parsed. This file can be used as a parameter for other Extend endpoints, such as POST /workflow_runs.

status (enum)

The status of the parse run:

"PROCESSING" - The file is still being processed
"PROCESSED" - The file was successfully processed
"FAILED" - The processing failed (see failureReason for details)

failureReason (string or null)

The reason for failure.

Availability: Present when status is "FAILED".

Possible values include:

UNABLE_TO_DOWNLOAD_FILE - The file could not be downloaded from the provided URL
FILE_TYPE_NOT_SUPPORTED - The file type is not supported for parsing
FILE_SIZE_TOO_LARGE - The file exceeds the maximum allowed size
CORRUPT_FILE - The file appears to be corrupted or malformed
OCR_ERROR - An error occurred during optical character recognition
PASSWORD_PROTECTED_FILE - The file is password protected and cannot be processed
FAILED_TO_CONVERT_TO_PDF - The file could not be converted to PDF for processing
FAILED_TO_CONVERT_TO_JPEG - The file could not be converted to JPEG for processing
FAILED_TO_GENERATE_TARGET_FORMAT - The output could not be generated in the requested format
CHUNKING_ERROR - An error occurred while chunking the document
INTERNAL_ERROR - An unexpected internal error occurred
INVALID_CONFIG_OPTIONS - The provided configuration options are invalid
OUT_OF_CREDITS - Insufficient credits to process the file

Note: Additional failure reasons may be added in the future. Your integration should handle unknown values gracefully.
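One way to handle unknown values gracefully is to map the known reasons to a coarse action and fall back to a default for anything else. The grouping below (which reasons are worth retrying, which point to the input file or the config) is an assumption made for this sketch, not guidance from the API. `FailureAction` and `actionForFailure` are hypothetical names.

```typescript
type FailureAction = "retry" | "fix_input" | "fix_config" | "add_credits" | "investigate";

// Assumption: this grouping is one plausible policy, not documented behavior.
const FAILURE_ACTIONS = new Map<string, FailureAction>([
  ["UNABLE_TO_DOWNLOAD_FILE", "retry"],
  ["OCR_ERROR", "retry"],
  ["INTERNAL_ERROR", "retry"],
  ["FILE_TYPE_NOT_SUPPORTED", "fix_input"],
  ["FILE_SIZE_TOO_LARGE", "fix_input"],
  ["CORRUPT_FILE", "fix_input"],
  ["PASSWORD_PROTECTED_FILE", "fix_input"],
  ["FAILED_TO_CONVERT_TO_PDF", "fix_input"],
  ["FAILED_TO_CONVERT_TO_JPEG", "fix_input"],
  ["FAILED_TO_GENERATE_TARGET_FORMAT", "fix_config"],
  ["CHUNKING_ERROR", "fix_config"],
  ["INVALID_CONFIG_OPTIONS", "fix_config"],
  ["OUT_OF_CREDITS", "add_credits"],
]);

// Unknown or missing reasons fall through to "investigate" instead of throwing,
// so reasons added in the future do not break the integration.
function actionForFailure(reason: string | null | undefined): FailureAction {
  if (reason == null) {
    return "investigate";
  }
  return FAILURE_ACTIONS.get(reason) ?? "investigate";
}
```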

failureMessage (string or null)

A human-readable description of the failure.

Availability: Present when status is "FAILED".

output (object or null)

The parse run output.

Availability: Present when status is "PROCESSED" and the request was made without the responseType=url query parameter. Contains the parsed chunks.

outputUrl (string or null)

A presigned URL to download the parse run output as a JSON file. The object shape is the same as the output field. Expires after 15 minutes.

Availability: Present when status is "PROCESSED" and the request was made with the responseType=url query parameter.
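Because a processed run carries either an inline output or a presigned outputUrl, client code may want a single path that handles both. The following is a minimal sketch under stated assumptions: `resolveOutput` and the injected `download` function are hypothetical, and the interface lists only the fields this sketch reads.

```typescript
// Only the fields this sketch needs; the full response has more.
interface ParseRunResult {
  status: "PROCESSING" | "PROCESSED" | "FAILED";
  output?: unknown;
  outputUrl?: string | null;
}

// `download` is a hypothetical injected function that fetches a URL and
// returns the parsed JSON body.
async function resolveOutput(
  run: ParseRunResult,
  download: (url: string) => Promise<unknown>
): Promise<unknown> {
  if (run.status !== "PROCESSED") {
    throw new Error(`Output is only available for PROCESSED runs (status: ${run.status})`);
  }
  if (run.output != null) {
    return run.output; // inline output: request made without responseType=url
  }
  if (run.outputUrl != null) {
    // The presigned URL expires after 15 minutes, so download it promptly.
    return download(run.outputUrl);
  }
  throw new Error("Processed run has neither output nor outputUrl");
}
```

In a runtime with a global `fetch`, `download` could be as simple as `(url) => fetch(url).then((res) => res.json())`.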

metrics (object or null)

Metrics about the parsing process.

Availability: Present when status is "PROCESSED".

config (object)

The configuration used for the parsing process, including any default values that were applied.

usage (object or null)

Usage credits consumed by this run.

Availability: Present when status is "PROCESSED", the run was created after October 7, 2025, and the customer is on the current billing system.
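The availability rules above mean that most response fields should be read only after checking status. A minimal sketch of that pattern follows. `describeRun` and the trimmed-down interface are hypothetical; the `metrics` and `usage` field names are taken from the example response on this page.

```typescript
// A reduced view of the response, limited to the fields used below.
interface ParseRunSummaryInput {
  id: string;
  status: "PROCESSING" | "PROCESSED" | "FAILED";
  failureReason?: string | null;
  failureMessage?: string | null;
  metrics?: { processingTimeMs: number; pageCount: number } | null;
  usage?: { credits: number } | null;
}

// Produce a one-line summary, reading each field only in the status
// where it is documented to be present.
function describeRun(run: ParseRunSummaryInput): string {
  switch (run.status) {
    case "PROCESSING":
      return `${run.id}: still processing`;
    case "FAILED":
      return `${run.id}: failed (${run.failureReason ?? "unknown reason"}): ${run.failureMessage ?? "no message"}`;
    case "PROCESSED": {
      const pages = run.metrics
        ? `${run.metrics.pageCount} pages in ${run.metrics.processingTimeMs} ms`
        : "metrics unavailable";
      // usage may be absent even for processed runs (see the availability note above).
      const credits = run.usage ? `${run.usage.credits} credits` : "usage unavailable";
      return `${run.id}: processed, ${pages}, ${credits}`;
    }
  }
}
```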

Example request (TypeScript SDK):

```typescript
import { ExtendClient, ExtendEnvironment } from "extend-ai";

async function main() {
  const client = new ExtendClient({
    environment: ExtendEnvironment.Production,
    token: "YOUR_TOKEN_HERE",
  });
  // Submit the file for parsing.
  await client.parseRuns.create({
    file: {
      url: "https://example.com/bank_statement.pdf",
      name: "bank_statement.pdf",
    },
  });
}
main();
```

Example response. Every field is populated here for illustration; in practice, which fields are present depends on status, as described under Response.

```json
{
  "object": "parse_run",
  "id": "pr_xK9mLPqRtN3vS8wF5hB2cQ",
  "file": {
    "object": "file",
    "id": "file_xK9mLPqRtN3vS8wF5hB2cQ",
    "name": "Invoices.pdf",
    "type": "PDF",
    "parentFileId": "file_Zk9mNP12Qw4yTv8BdR3H",
    "metadata": {
      "pageCount": 30,
      "parentSplit": {
        "id": "string",
        "type": "Invoice",
        "identifier": "other_2_9",
        "startPage": 1,
        "endPage": 10
      }
    },
    "createdAt": "2024-03-21T16:45:00Z",
    "updatedAt": "2024-03-21T16:45:00Z"
  },
  "status": "PROCESSING",
  "failureReason": "FILE_TYPE_NOT_SUPPORTED",
  "failureMessage": "File type not supported for parsing.",
  "output": {
    "chunks": [
      {
        "object": "chunk",
        "type": "page",
        "content": "This is the content of the chunk.",
        "metadata": {
          "pageRange": {
            "start": 1,
            "end": 1
          }
        },
        "blocks": [
          {
            "object": "block",
            "id": "string",
            "type": "text",
            "content": "string",
            "details": {},
            "metadata": {
              "page": {
                "number": 1,
                "width": 1.1,
                "height": 1.1
              }
            },
            "polygon": [
              {
                "x": 10,
                "y": 20
              }
            ],
            "boundingBox": {
              "left": 10,
              "top": 10,
              "right": 20,
              "bottom": 20
            },
            "parentBlockId": "string",
            "children": [
              null
            ]
          }
        ]
      }
    ],
    "ocr": {
      "words": [
        {
          "content": "string",
          "boundingBox": {
            "left": 10,
            "top": 10,
            "right": 20,
            "bottom": 20
          },
          "confidence": 1.1,
          "pageNumber": 1.1
        }
      ]
    }
  },
  "outputUrl": "https://...",
  "metrics": {
    "processingTimeMs": 1234,
    "pageCount": 5
  },
  "config": {
    "target": "markdown",
    "chunkingStrategy": {
      "type": "page",
      "options": {
        "minCharacters": 500,
        "maxCharacters": 10000
      }
    },
    "engine": "parse_performance",
    "engineVersion": "latest",
    "blockOptions": {
      "figures": {
        "enabled": true,
        "figureImageClippingEnabled": true
      },
      "tables": {
        "targetFormat": "html",
        "tableHeaderContinuationEnabled": false,
        "cellBlocksEnabled": false,
        "agentic": {
          "enabled": false,
          "customInstructions": "string"
        },
        "enabled": true
      },
      "text": {
        "signatureDetectionEnabled": false,
        "agentic": {
          "enabled": false,
          "customInstructions": "string"
        }
      }
    },
    "advancedOptions": {
      "pageRotationEnabled": true,
      "pageRanges": [
        {
          "start": 1,
          "end": 10
        },
        {
          "start": 20,
          "end": 30
        }
      ],
      "excelParsingMode": "basic",
      "excelSkipHiddenContent": false,
      "verticalGroupingThreshold": 1,
      "returnOcr": {
        "words": false
      }
    }
  },
  "usage": {
    "credits": 10
  }
}
```
