Parse File (Async)

Parse files to get cleaned, chunked target content (e.g. markdown).

The Parse endpoint allows you to convert documents into structured, machine-readable formats with fine-grained control over the parsing process. This endpoint is ideal for extracting cleaned document content to be used as context for downstream processing, e.g. RAG pipelines, custom ingestion pipelines, embeddings classification, etc.

For more details, see the [Parse File guide](https://docs.extend.ai/2026-02-09/product/parsing/parse).

See [Async Processing](https://docs.extend.ai/2026-02-09/developers/async-processing) for a full guide on polling helpers and webhooks.

## Polling with the SDK

The SDK provides a `createAndPoll` / `create_and_poll` method that handles polling automatically, returning when the run reaches a terminal state (`PROCESSED` or `FAILED`):

<Tabs>
<Tab title="TypeScript">
```typescript
const result = await client.parseRuns.createAndPoll({
  file: { url: "https://..." }
});

// Returns when the run reaches a terminal state
console.log(result.output);
```
</Tab>
<Tab title="Python">
```python
result = client.parse_runs.create_and_poll(
    file={"url": "https://..."}
)

# Returns when the run reaches a terminal state
print(result.output)
```
</Tab>
<Tab title="Java">
```java
var result = client.parseRuns().createAndPoll(ParseRunCreateRequest.builder()
    .file(FileInput.builder().url("https://...").build())
    .build());

// Returns when the run reaches a terminal state
System.out.println(result.getOutput());
```
</Tab>
</Tabs>

Authentication

`Authorization` (Bearer)

Bearer authentication of the form `Bearer <token>`, where `token` is your auth token.

Headers

`x-extend-api-version` (`"2026-02-09"`, optional)

API version to use for the request. If you’re using an SDK, you can ignore this parameter. If you are not using an SDK and do not specify a version, you will either receive a 400 Bad Request or be set to a previous legacy version. See API Versioning for more details.
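For callers not using an SDK, the two headers above can be assembled as in the following minimal sketch. The helper name `build_headers` and the placeholder token are hypothetical, and the snippet only builds the header dictionary; it does not send a request.

```python
def build_headers(token: str, api_version: str = "2026-02-09") -> dict:
    """Return the headers described above for a raw (non-SDK) request."""
    if not token:
        raise ValueError("an auth token is required")
    return {
        # Bearer authentication of the form "Bearer <token>"
        "Authorization": f"Bearer {token}",
        # Pin the version explicitly rather than relying on a default
        "x-extend-api-version": api_version,
    }


# "YOUR_API_TOKEN" is a placeholder, not a real credential
headers = build_headers("YOUR_API_TOKEN")
print(headers)
```

Setting the version header explicitly avoids depending on whichever behavior (a 400 response or a legacy version) applies when it is omitted.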

Request

This endpoint expects an object.
`file` (object, required)
The file to be parsed. Files can be provided as a URL or an Extend file ID.
`config` (object, optional)
Configuration options for the parsing process. Defaults depend on the selected parser engine and version.
`metadata` (map from strings to any, optional)
An optional object that can be passed in to identify the run. It will be returned back to you in the response and webhooks. Maximum size is 10KB. To categorize runs for billing and usage tracking, include `extend:usage_tags` with an array of string values (e.g., `{"extend:usage_tags": ["production", "team-eng", "customer-123"]}`). Tags must contain only alphanumeric characters, hyphens, and underscores; any special characters will be automatically removed.
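The tag and size rules for `metadata` can be checked client-side before sending a request. The sketch below is one possible approach, not part of the API: the helpers `clean_tag` and `build_metadata` and the `job` key are hypothetical, "10KB" is read as 10 * 1024 bytes of UTF-8 encoded JSON, and "alphanumeric" is read as ASCII letters and digits. The service may measure size or treat characters differently.

```python
import json
import re

# Assumption: "10KB" means 10 * 1024 bytes of serialized JSON
MAX_METADATA_BYTES = 10 * 1024


def clean_tag(tag: str) -> str:
    """Keep only ASCII letters, digits, hyphens, and underscores."""
    return re.sub(r"[^A-Za-z0-9_-]", "", tag)


def build_metadata(tags: list, **extra) -> dict:
    """Build a metadata object with pre-cleaned usage tags and a size check."""
    # Drop tags that become empty once special characters are removed
    cleaned = [t for t in (clean_tag(tag) for tag in tags) if t]
    metadata = {**extra, "extend:usage_tags": cleaned}
    size = len(json.dumps(metadata).encode("utf-8"))
    if size > MAX_METADATA_BYTES:
        raise ValueError(f"metadata is {size} bytes, over the 10KB limit")
    return metadata


# "job" is an arbitrary illustrative key
metadata = build_metadata(["production", "team eng!", "customer#123"], job="nightly-import")
print(metadata["extend:usage_tags"])
```

Cleaning tags locally keeps the values you send identical to the values you later see in usage reports, since special characters are otherwise removed automatically.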

Response

Successfully parsed file
`object` (enum)

The type of object. Will always be "parse_run".

Allowed values: "parse_run"

`id` (string)

A unique identifier for the parse run.

Example: "pr_xK9mLPqRtN3vS8wF5hB2cQ"

`file` (object)

The file that was parsed. This file can be used as a parameter for other Extend endpoints, such as POST /workflow_runs.

`status` (enum)
The status of the parse run:

* `"PROCESSING"` - The file is still being processed
* `"PROCESSED"` - The file was successfully processed
* `"FAILED"` - The processing failed (see `failureReason` for details)

Allowed values: `"PROCESSING"`, `"PROCESSED"`, `"FAILED"`
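If you poll without the SDK helper, the status values translate into a simple loop: keep fetching the run while it is `"PROCESSING"` and stop at `"PROCESSED"` or `"FAILED"`. The sketch below assumes a caller-supplied `get_run` function that returns the parse run as a dictionary; that function, the interval, and the timeout are all hypothetical, since this page does not describe how a run is retrieved.

```python
import time

TERMINAL_STATUSES = {"PROCESSED", "FAILED"}


def poll_until_done(get_run, run_id, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call get_run(run_id) until the run reaches PROCESSED or FAILED.

    get_run is a caller-supplied function (hypothetical here) returning the
    parse run as a dict with at least a "status" key.
    """
    deadline = time.monotonic() + timeout
    while True:
        run = get_run(run_id)
        if run["status"] in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} still {run['status']} after {timeout}s")
        sleep(interval)


# Stand-in for a real lookup: reports PROCESSING twice, then PROCESSED
responses = iter(["PROCESSING", "PROCESSING", "PROCESSED"])
fake_get_run = lambda run_id: {"id": run_id, "status": next(responses)}

run = poll_until_done(fake_get_run, "pr_xK9mLPqRtN3vS8wF5hB2cQ", interval=0.0)
print(run["status"])
```

Passing `get_run` and `sleep` as parameters keeps the loop independent of any particular HTTP client and makes it easy to test.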
`failureReason` (string or null)
The reason for failure.

**Availability:** Present when `status` is `"FAILED"`.

Possible values include:

* `UNABLE_TO_DOWNLOAD_FILE` - The file could not be downloaded from the provided URL
* `FILE_TYPE_NOT_SUPPORTED` - The file type is not supported for parsing
* `FILE_SIZE_TOO_LARGE` - The file exceeds the maximum allowed size
* `CORRUPT_FILE` - The file appears to be corrupted or malformed
* `OCR_ERROR` - An error occurred during optical character recognition
* `PASSWORD_PROTECTED_FILE` - The file is password protected and cannot be processed
* `FAILED_TO_CONVERT_TO_PDF` - The file could not be converted to PDF for processing
* `FAILED_TO_CONVERT_TO_JPEG` - The file could not be converted to JPEG for processing
* `FAILED_TO_GENERATE_TARGET_FORMAT` - The output could not be generated in the requested format
* `CHUNKING_ERROR` - An error occurred while chunking the document
* `INTERNAL_ERROR` - An unexpected internal error occurred
* `INVALID_CONFIG_OPTIONS` - The provided configuration options are invalid
* `OUT_OF_CREDITS` - Insufficient credits to process the file

**Note:** Additional failure reasons may be added in the future. Your integration should handle unknown values gracefully.

`failureMessage` (string or null)

A human-readable description of the failure.

Availability: Present when status is "FAILED".
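Because new failure reasons may appear over time, an integration should avoid assuming the documented list is complete. The following sketch shows one way to summarize a failed run while tolerating unrecognized or missing values. The helper `describe_failure` and the `UNRECOGNIZED(...)` label are hypothetical conventions, and the run is assumed to be available as a plain dictionary.

```python
# Failure reasons documented at the time of writing; more may be added later
KNOWN_FAILURE_REASONS = {
    "UNABLE_TO_DOWNLOAD_FILE", "FILE_TYPE_NOT_SUPPORTED", "FILE_SIZE_TOO_LARGE",
    "CORRUPT_FILE", "OCR_ERROR", "PASSWORD_PROTECTED_FILE",
    "FAILED_TO_CONVERT_TO_PDF", "FAILED_TO_CONVERT_TO_JPEG",
    "FAILED_TO_GENERATE_TARGET_FORMAT", "CHUNKING_ERROR", "INTERNAL_ERROR",
    "INVALID_CONFIG_OPTIONS", "OUT_OF_CREDITS",
}


def describe_failure(run: dict) -> str:
    """Return a log-friendly summary of a FAILED run, tolerating new reasons."""
    if run.get("status") != "FAILED":
        raise ValueError("run has not failed")
    reason = run.get("failureReason") or "UNKNOWN"
    message = run.get("failureMessage") or "no message provided"
    # Unknown values are labeled rather than rejected, so new reasons do not crash callers
    label = reason if reason in KNOWN_FAILURE_REASONS else f"UNRECOGNIZED({reason})"
    return f"{label}: {message}"


known = describe_failure(
    {"status": "FAILED", "failureReason": "CORRUPT_FILE", "failureMessage": "bad header"}
)
unknown = describe_failure(
    {"status": "FAILED", "failureReason": "SOME_FUTURE_REASON", "failureMessage": None}
)
print(known)
print(unknown)
```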

`output` (object or null)

The parse run output.

Availability: Present when status is "PROCESSED" and the request was made without the responseType=url query parameter. Contains the parsed chunks.

`outputUrl` (string or null)

A presigned URL to download the parse run output as a JSON file. The object shape is the same as the output field. Expires after 15 minutes.

Availability: Present when status is "PROCESSED" and the request was made with responseType=url query parameter.
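Since a processed run carries either `output` or `outputUrl` depending on how the request was made, client code can normalize the two cases. The sketch below is one way to do that, assuming the run is a plain dictionary. The helper `load_output`, the injectable `fetch` parameter, the example URL, and the `chunks` key in the sample payload are hypothetical; the default fetcher also assumes the presigned URL can be read with a plain GET and no extra headers.

```python
import json
from urllib.request import urlopen


def load_output(run: dict, fetch=None):
    """Return the parsed output from either `output` or `outputUrl`."""
    if run.get("status") != "PROCESSED":
        raise ValueError(f"no output available while status is {run.get('status')!r}")
    if run.get("output") is not None:
        return run["output"]
    url = run.get("outputUrl")
    if url is None:
        raise ValueError("run has neither output nor outputUrl")
    if fetch is None:
        # Assumption: the presigned URL is readable with a plain GET
        def fetch(u):
            with urlopen(u) as resp:
                return resp.read()
    # The downloaded JSON has the same shape as the inline `output` field
    return json.loads(fetch(url))


# Offline demonstration with a fake fetcher instead of a real download
run = {"status": "PROCESSED", "output": None, "outputUrl": "https://example.invalid/out.json"}
output = load_output(run, fetch=lambda u: b'{"chunks": []}')
print(output)
```

Because the URL expires after 15 minutes, it is safer to download the file as soon as the run is retrieved than to store the URL for later use.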

`metrics` (object or null)

Metrics about the parsing process.

Availability: Present when status is "PROCESSED".

`config` (object)
The configuration used for the parsing process, including any default values that were applied.
`usage` (object or null)

Usage credits consumed by this run.

Availability: Present when status is "PROCESSED", the run was created after October 7, 2025, and the customer is on the current billing system.

Errors