Extract File (Async)

Extract structured data from a file using an existing extractor or an inline configuration.

The request returns immediately with a PROCESSING status. Use webhooks or poll the Get Extract Run endpoint for results. See Async Processing for a full guide on polling helpers and webhooks.
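For integrations that poll by hand rather than relying on webhooks, the loop might look like the sketch below. `fetchRun` is a hypothetical stand-in for whatever call you use to reach the Get Extract Run endpoint, and the interval and attempt limit are illustrative assumptions, not documented defaults.

```typescript
// Statuses documented for extract runs.
type RunStatus = "PROCESSING" | "PROCESSED" | "FAILED" | "CANCELLED";

interface RunSnapshot {
  id: string;
  status: RunStatus;
}

// Hypothetical: wraps your call to the Get Extract Run endpoint.
type FetchRun = (id: string) => Promise<RunSnapshot>;

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the run leaves PROCESSING or the attempt budget runs out.
async function pollUntilTerminal(
  fetchRun: FetchRun,
  id: string,
  intervalMs = 2000, // assumed interval
  maxAttempts = 150, // assumed limit
): Promise<RunSnapshot> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const run = await fetchRun(id);
    if (run.status !== "PROCESSING") return run;
    await sleep(intervalMs);
  }
  throw new Error(`Run ${id} still PROCESSING after ${maxAttempts} attempts`);
}
```

Passing the fetch function in as a parameter keeps the loop independent of any particular client, which also makes it easy to exercise with a fake.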

Polling with the SDK

The SDK provides a createAndPoll / create_and_poll method that handles polling automatically, returning when the run reaches a terminal state (PROCESSED, FAILED, or CANCELLED):

TypeScript

```typescript
const result = await client.extractRuns.createAndPoll({
  extractor: { id: "ex_abc123" },
  file: { url: "https://..." }
});
// Returns when the run reaches a terminal state
console.log(result.output?.value);
```

Python

```python
result = client.extract_runs.create_and_poll(
    extractor={"id": "ex_abc123"},
    file={"url": "https://..."}
)
# Returns when the run reaches a terminal state
print(result.output.value)
```

Java

```java
var result = client.extractRuns().createAndPoll(ExtractRunCreateRequest.builder()
    .extractor(ExtractorInput.builder().id("ex_abc123").build())
    .file(FileInput.builder().url("https://...").build())
    .build());
// Returns when the run reaches a terminal state
System.out.println(result.getOutput().getValue());
```

Authentication

Authorization: Bearer

Bearer authentication of the form Bearer <token>, where token is your auth token.
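As a minimal sketch, the header described above can be built as follows. The helper name `authHeaders` is hypothetical, and the empty-token check is an added safeguard rather than documented behavior.

```typescript
// Builds the Authorization header in the form "Bearer <token>".
function authHeaders(token: string): Record<string, string> {
  if (token.trim() === "") {
    throw new Error("An auth token is required");
  }
  return { Authorization: `Bearer ${token}` };
}
```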

Request

This endpoint expects an object.

file (object, required)

The file to be extracted from. Files can be provided as a URL, Extend file ID, or raw text.

extractor (object, optional)

Reference to an existing extractor. One of extractor or config must be provided.

config (object, optional)

Inline extract configuration. One of extractor or config must be provided.
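A client-side check for the "one of extractor or config" rule might look like the sketch below. The type and function names are hypothetical, nested fields are left untyped on purpose, and because this page does not say what happens when both are supplied, the sketch does not reject that case.

```typescript
// Minimal request shape; nested fields are deliberately loose.
interface ExtractRunRequest {
  file: Record<string, unknown>;
  extractor?: Record<string, unknown>;
  config?: Record<string, unknown>;
}

// Returns a list of problems; an empty list means the check passed.
function checkExtractorOrConfig(req: ExtractRunRequest): string[] {
  const problems: string[] = [];
  if (req.extractor === undefined && req.config === undefined) {
    problems.push("One of extractor or config must be provided");
  }
  return problems;
}
```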

priority (integer, optional, 1-100, defaults to 50)

An optional value used to determine the relative order of runs when rate limiting is in effect. Lower values will be prioritized before higher values.
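To make the ordering concrete, here is an illustrative sketch of how the documented range, default, and "lower first" rule combine. It models the ordering locally and says nothing about how the service schedules runs internally; `resolvePriority` and `byPriority` are hypothetical names.

```typescript
const DEFAULT_PRIORITY = 50;

// Resolves the effective priority, rejecting values outside the documented range.
function resolvePriority(priority?: number): number {
  if (priority === undefined) return DEFAULT_PRIORITY;
  if (!Number.isInteger(priority) || priority < 1 || priority > 100) {
    throw new RangeError("priority must be an integer from 1 to 100");
  }
  return priority;
}

// Illustration only: lower values sort first; the input array is not mutated.
function byPriority<T extends { priority?: number }>(runs: T[]): T[] {
  return [...runs].sort(
    (a, b) => resolvePriority(a.priority) - resolvePriority(b.priority),
  );
}
```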

metadata (map from strings to any, optional)

An optional object that can be passed in to identify the run. It will be returned back to you in the response and webhooks. Maximum size is 10KB.

To categorize runs for billing and usage tracking, include extend:usage_tags with an array of string values (e.g., {"extend:usage_tags": ["production", "team-eng", "customer-123"]}). Tags must contain only alphanumeric characters, hyphens, and underscores; any special characters will be automatically removed.
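Cleaning tags before sending them keeps the stored values predictable. The sketch below mirrors the documented character rule on the client and checks the serialized size. It assumes that "alphanumeric" means ASCII letters and digits, that "10KB" means 10 * 1024 bytes of UTF-8 JSON, and the helper names are hypothetical.

```typescript
const USAGE_TAGS_KEY = "extend:usage_tags";
const MAX_METADATA_BYTES = 10 * 1024; // assumption: "10KB" = 10 * 1024 bytes

// Drops every character that is not an ASCII letter, digit, hyphen, or underscore.
function sanitizeTag(tag: string): string {
  return tag.replace(/[^A-Za-z0-9_-]/g, "");
}

// Returns a new metadata object with cleaned usage tags, after a size check.
function withUsageTags(
  metadata: Record<string, unknown>,
  tags: string[],
): Record<string, unknown> {
  const result = { ...metadata, [USAGE_TAGS_KEY]: tags.map(sanitizeTag) };
  const bytes = new TextEncoder().encode(JSON.stringify(result)).length;
  if (bytes > MAX_METADATA_BYTES) {
    throw new Error(`metadata is ${bytes} bytes; the limit is ${MAX_METADATA_BYTES}`);
  }
  return result;
}
```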


Response

Extract completed successfully

object (enum)

The type of object. Will always be "extract_run".

Allowed values: extract_run

id (string)

The unique identifier for this extract run.

Example: "exr_Xj8mK2pL9nR4vT7qY5wZ"

extractor (object or null)

The extractor that was used for this run.

Availability: Present when an extractor reference was provided. Not present when using inline config.

extractorVersion (object or null)

The version of the extractor that was used for this run.

Availability: Present when an extractor reference was provided. Not present when using inline config.

status (enum)

The status of a processor run (extract, classify, or split):

"PROCESSING" - The run is in progress
"PROCESSED" - The run completed successfully
"FAILED" - The run failed
"CANCELLED" - The run was cancelled

Allowed values: PROCESSING, PROCESSED, FAILED, CANCELLED

output (object or map from strings to objects or null)

The final output, either reviewed or initial. This is a union of two possible shapes:

JSON Schema output: The current output format, returned for runs created with a JSON Schema config.
Legacy output: A legacy output format from a previous API version. This shape is only returned for runs that were originally created with a legacy config.

Availability: Present when status is "PROCESSED".
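Because the field is only present for processed runs, callers may want a guard before reading it. The following is a sketch under the assumption that the run uses the JSON Schema output shape with a `value` property; `ExtractRunLike` and `extractedValue` are hypothetical names, and the legacy shape is not handled.

```typescript
// Trimmed-down view of an extract run; only the fields this sketch needs.
interface ExtractRunLike {
  status: "PROCESSING" | "PROCESSED" | "FAILED" | "CANCELLED";
  output?: { value: Record<string, unknown> } | null;
}

// Returns the extracted value for processed runs, otherwise null.
function extractedValue(run: ExtractRunLike): Record<string, unknown> | null {
  if (run.status !== "PROCESSED" || !run.output) return null;
  return run.output.value;
}
```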


initialOutput (object or map from strings to objects or null)

The initial output from the extract run, before any review edits.

Availability: Present when reviewed is true.

reviewedOutput (object or map from strings to objects or null)

The output after human review.

Availability: Present when reviewed is true.

failureReason (string or null)

The reason for failure.

Availability: Present when status is "FAILED".

Possible values include:

ABORTED - The run was aborted by the user
INTERNAL_ERROR - An unexpected internal error occurred
FAILED_TO_PROCESS_FILE - Failed to process the file (e.g., OCR failure, file access issues)
INVALID_PROCESSOR - The processor configuration is invalid
INVALID_CONFIGURATION - The provided configuration is incompatible with the selected model
PARSING_ERROR - Failed to parse the extraction output
PRE_PROCESSING_FAILURE - An error occurred during preprocessing (e.g., chunking)
POST_PROCESSING_FAILURE - An error occurred during postprocessing
OUT_OF_CREDITS - Insufficient credits to run the extraction

Note: Additional failure reasons may be added in the future. Your integration should handle unknown values gracefully.
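One way to handle unknown values gracefully is to treat the documented list as open-ended and route anything unrecognized to a safe default. In the sketch below, the reason names come from the list above, while the triage categories and their grouping are assumptions made for illustration.

```typescript
// Failure reasons listed above; the API may add more over time.
const KNOWN_FAILURE_REASONS = [
  "ABORTED",
  "INTERNAL_ERROR",
  "FAILED_TO_PROCESS_FILE",
  "INVALID_PROCESSOR",
  "INVALID_CONFIGURATION",
  "PARSING_ERROR",
  "PRE_PROCESSING_FAILURE",
  "POST_PROCESSING_FAILURE",
  "OUT_OF_CREDITS",
] as const;

type KnownFailureReason = (typeof KNOWN_FAILURE_REASONS)[number];
type Triage = "fix-request" | "add-credits" | "investigate";

function isKnownFailureReason(reason: string): reason is KnownFailureReason {
  return (KNOWN_FAILURE_REASONS as readonly string[]).includes(reason);
}

// Hypothetical triage: the grouping is an assumption, not documented behavior.
function triageFailure(reason: string | null): Triage {
  // Missing or unrecognized values fall through to the safe default.
  if (reason === null || !isKnownFailureReason(reason)) return "investigate";
  switch (reason) {
    case "INVALID_PROCESSOR":
    case "INVALID_CONFIGURATION":
      return "fix-request";
    case "OUT_OF_CREDITS":
      return "add-credits";
    default:
      return "investigate";
  }
}
```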


failureMessage (string or null)

A detailed message about the failure.

Availability: Present when status is "FAILED".

metadata (map from strings to any or null)

Any metadata that was provided when creating the extract run.

Availability: Present when metadata was provided during creation.

reviewed (boolean)

Indicates whether the run has been reviewed by a human.

edited (boolean)

Indicates whether the run results have been edited during review.

edits (map from strings to objects or null)

Details of edits made during review.

Availability: Present when edited is true.

config (object)

The configuration used for this extract run. This is a union of two possible shapes:

JSON Schema config: The current config format. All runs created through this API version use this shape.
Legacy config: A fields-array config from a previous API version. This shape is only returned when retrieving runs that were originally created with the legacy format. This API version does not support creating runs with legacy configs.


file (object)

The file that was processed.

parseRunId (string or null)

The ID of the parse run that was used for this extract run.

Availability: Present when a parse run was created.

dashboardUrl (string)

The URL to view the extract run in the Extend dashboard.

usage (object or null)

Usage credits consumed by this run.

Availability: Present when status is "PROCESSED".

createdAt (string, format: "date-time")

The time (in UTC) at which the object was created. Will follow the RFC 3339 format.

Example: "2024-03-21T16:45:00Z"

updatedAt (string, format: "date-time")

The time (in UTC) at which the object was last updated. Will follow the RFC 3339 format.

Example: "2024-03-21T16:45:00Z"
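Timestamps in the format shown in the examples can be parsed with the built-in `Date.parse`. The sketch below computes the time between creation and the last update; `elapsedMs` is a hypothetical helper, and it returns null rather than throwing when a value does not parse.

```typescript
// Milliseconds between createdAt and updatedAt, or null if either fails to parse.
function elapsedMs(createdAt: string, updatedAt: string): number | null {
  const start = Date.parse(createdAt);
  const end = Date.parse(updatedAt);
  if (Number.isNaN(start) || Number.isNaN(end)) return null;
  return end - start;
}
```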

Errors

400

Bad Request Error

401

Unauthorized Error

402

Payment Required Error

403

Forbidden Error

404

Not Found Error

422

Unprocessable Entity Error

429

Too Many Requests Error

500

Internal Server Error
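A client might map these status codes to a retry decision. In the sketch below, the code-to-name table comes from the list above, while the choice to retry only 429 and 500 and the backoff base and cap are assumptions for illustration, not documented guidance.

```typescript
// Status codes and names as listed above.
const ERROR_NAMES: Record<number, string> = {
  400: "Bad Request Error",
  401: "Unauthorized Error",
  402: "Payment Required Error",
  403: "Forbidden Error",
  404: "Not Found Error",
  422: "Unprocessable Entity Error",
  429: "Too Many Requests Error",
  500: "Internal Server Error",
};

// Assumption: only rate limiting and server errors are worth retrying.
function isRetryable(status: number): boolean {
  return status === 429 || status === 500;
}

// Exponential backoff with a cap; base and cap are illustrative values.
function backoffMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```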


```typescript
import { ExtendClient } from "extend-ai";

async function main() {
  const client = new ExtendClient({
    token: "YOUR_TOKEN_HERE",
    extendApiVersion: "2026-02-09",
  });
  // Starts the run; the request returns immediately with a PROCESSING status.
  await client.extractRuns.create({
    file: {
      url: "https://example.com/invoice.pdf",
    },
    extractor: {
      id: "ex_1234567890",
    },
  });
}
main();
```

```json
{
  "object": "extract_run",
  "id": "exr_Xj8mK2pL9nR4vT7qY5wZ",
  "extractor": {
    "object": "extractor",
    "id": "ex_Xj8mK2pL9nR4vT7qY5wZ",
    "name": "Invoice Extractor",
    "createdAt": "2024-03-21T16:45:00Z",
    "updatedAt": "2024-03-21T16:45:00Z"
  },
  "extractorVersion": {
    "object": "extractor_version",
    "id": "exv_xK9mLPqRtN3vS8wF5hB2cQ",
    "description": "Updated extraction fields for new invoice format",
    "version": "draft",
    "extractorId": "ex_Xj8mK2pL9nR4vT7qY5wZ",
    "createdAt": "2024-03-21T16:45:00Z"
  },
  "status": "PROCESSING",
  "output": {
    "value": {},
    "metadata": {}
  },
  "initialOutput": {
    "value": {},
    "metadata": {}
  },
  "reviewedOutput": {
    "value": {},
    "metadata": {}
  },
  "failureReason": "PARSING_ERROR",
  "failureMessage": "string",
  "metadata": {},
  "reviewed": false,
  "edited": false,
  "edits": {},
  "config": {
    "baseProcessor": "extraction_performance",
    "baseVersion": "string",
    "extractionRules": "string",
    "schema": {},
    "advancedOptions": {
      "modelReasoningInsightsEnabled": true,
      "advancedMultimodalEnabled": true,
      "citationsEnabled": true,
      "arrayCitationStrategy": "item",
      "arrayStrategy": {
        "type": "large_array_heuristics"
      },
      "chunkingOptions": {
        "chunkingStrategy": "standard",
        "pageChunkSize": 1,
        "chunkSelectionStrategy": "intelligent",
        "customSemanticChunkingRules": "string"
      },
      "excelSheetRanges": [
        {
          "start": 1,
          "end": 1
        }
      ],
      "excelSheetSelectionStrategy": "intelligent",
      "pageRanges": [
        {
          "start": 1,
          "end": 10
        },
        {
          "start": 20,
          "end": 30
        }
      ],
      "reviewAgent": {
        "enabled": true
      }
    },
    "parseConfig": {
      "target": "markdown",
      "chunkingStrategy": {
        "type": "page",
        "options": {
          "minCharacters": 500,
          "maxCharacters": 10000
        }
      },
      "engine": "parse_performance",
      "engineVersion": "latest",
      "blockOptions": {
        "figures": {
          "enabled": true,
          "figureImageClippingEnabled": true,
          "advancedChartExtractionEnabled": false
        },
        "tables": {
          "enabled": true,
          "targetFormat": "html",
          "tableHeaderContinuationEnabled": false,
          "cellBlocksEnabled": false,
          "agentic": {
            "enabled": false,
            "customInstructions": "string"
          }
        },
        "text": {
          "signatureDetectionEnabled": false,
          "agentic": {
            "enabled": false,
            "customInstructions": "string"
          }
        },
        "keyValue": {
          "blankFieldFormattingEnabled": false
        },
        "barcodes": {
          "imageClippingEnabled": false,
          "readingEnabled": false
        }
      },
      "advancedOptions": {
        "pageRotationEnabled": true,
        "pageRanges": [
          {
            "start": 1,
            "end": 10
          },
          {
            "start": 20,
            "end": 30
          }
        ],
        "excelParsingMode": "basic",
        "excelSkipHiddenContent": false,
        "excelUseRawCellValues": false,
        "excelSkipCalculation": true,
        "verticalGroupingThreshold": 1,
        "returnOcr": {
          "words": false
        },
        "alwaysConvertToPdf": false,
        "enrichmentFormat": "xml",
        "imageConversionQuality": "medium"
      }
    }
  },
  "file": {
    "object": "file",
    "id": "file_xK9mLPqRtN3vS8wF5hB2cQ",
    "name": "Invoices.pdf",
    "type": "PDF",
    "parentFileId": "file_Zk9mNP12Qw4yTv8BdR3H",
    "metadata": {
      "pageCount": 30,
      "parentSplit": {
        "id": "string",
        "type": "Invoice",
        "identifier": "other_2_9",
        "startPage": 1,
        "endPage": 10
      }
    },
    "createdAt": "2024-03-21T16:45:00Z",
    "updatedAt": "2024-03-21T16:45:00Z"
  },
  "parseRunId": "pr_Xj8mK2pL9nR4vT7qY5wZ",
  "dashboardUrl": "https://dashboard.extend.ai/runs/exr_Xj8mK2pL9nR4vT7qY5wZ",
  "usage": {
    "credits": 10
  },
  "createdAt": "2024-03-21T16:45:00Z",
  "updatedAt": "2024-03-21T16:45:00Z"
}
```
