Parse File (Async)
Parse files to get cleaned, chunked target content (e.g. markdown).
The Parse endpoint converts documents into structured, machine-readable formats with fine-grained control over the parsing process. It is ideal for extracting cleaned document content to use as context for downstream processing, such as RAG pipelines, custom ingestion pipelines, and embeddings classification.
For more details, see the Parse File guide. See Async Processing for a full guide on polling helpers and webhooks.
The SDK provides a createAndPoll / create_and_poll method that handles polling automatically and returns once the run reaches a terminal state (PROCESSED or FAILED).
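As a rough illustration of what a polling helper of this kind does, the sketch below repeatedly fetches a run until its status is terminal. The function name poll_until_terminal, the fetch_run callable, and the interval and timeout defaults are hypothetical and chosen for illustration; they are not the SDK's API. Only the status values are taken from this page.

```python
import time
from typing import Any, Callable, Dict

# Terminal states as described on this page.
TERMINAL_STATUSES = {"PROCESSED", "FAILED"}


def poll_until_terminal(
    fetch_run: Callable[[], Dict[str, Any]],
    interval_seconds: float = 2.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_run() until the run is PROCESSED or FAILED.

    fetch_run is a hypothetical callable that returns the current parse run
    as a dict (for example, by wrapping a GET request for the run).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        run = fetch_run()
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError("parse run did not reach a terminal state in time")
        sleep(interval_seconds)
```

Injecting fetch_run and sleep keeps the loop independent of any particular HTTP client and makes it easy to exercise without network access. In practice the SDK helper or webhooks are likely the better choice.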
Bearer authentication of the form Bearer <token>, where token is your auth token.
The type of object. Will always be "parse_run".
A unique identifier for the parse run.
Example: "pr_xK9mLPqRtN3vS8wF5hB2cQ"
The file that was parsed. This file can be used as a parameter for other Extend endpoints, such as POST /workflow_runs. May be null for batch parse runs where file ingestion failed.
The status of the parse run:
"PENDING" - The run has been created and is waiting to be processed. Only applies to runs created via POST /parse_runs/batch.
"PROCESSING" - The file is still being processed.
"PROCESSED" - The file was successfully processed.
"FAILED" - The processing failed (see failureReason for details).
The reason for failure.
Availability: Present when status is "FAILED".
Possible values include:
UNABLE_TO_DOWNLOAD_FILE - The file could not be downloaded from the provided URL
FILE_TYPE_NOT_SUPPORTED - The file type is not supported for parsing
FILE_SIZE_TOO_LARGE - The file exceeds the maximum allowed size
CORRUPT_FILE - The file appears to be corrupted or malformed
OCR_ERROR - An error occurred during optical character recognition
PASSWORD_PROTECTED_FILE - The file is password protected and cannot be processed
FAILED_TO_CONVERT_TO_PDF - The file could not be converted to PDF for processing
FAILED_TO_CONVERT_TO_JPEG - The file could not be converted to JPEG for processing
FAILED_TO_GENERATE_TARGET_FORMAT - The output could not be generated in the requested format
CHUNKING_ERROR - An error occurred while chunking the document
INTERNAL_ERROR - An unexpected internal error occurred
INVALID_CONFIG_OPTIONS - The provided configuration options are invalid
OUT_OF_CREDITS - Insufficient credits to process the file
Note: Additional failure reasons may be added in the future. Your integration should handle unknown values gracefully.
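One way to handle unknown values gracefully is to treat the failure reason as an open-ended string instead of a closed enum. The sketch below is a minimal example under that assumption; summarize_failure is a hypothetical helper, and the run is assumed to be available as a plain dict with the failureReason field described above.

```python
from typing import Any, Dict

# Failure reasons listed on this page; more may be added in the future.
KNOWN_FAILURE_REASONS = {
    "UNABLE_TO_DOWNLOAD_FILE",
    "FILE_TYPE_NOT_SUPPORTED",
    "FILE_SIZE_TOO_LARGE",
    "CORRUPT_FILE",
    "OCR_ERROR",
    "PASSWORD_PROTECTED_FILE",
    "FAILED_TO_CONVERT_TO_PDF",
    "FAILED_TO_CONVERT_TO_JPEG",
    "FAILED_TO_GENERATE_TARGET_FORMAT",
    "CHUNKING_ERROR",
    "INTERNAL_ERROR",
    "INVALID_CONFIG_OPTIONS",
    "OUT_OF_CREDITS",
}


def summarize_failure(run: Dict[str, Any]) -> str:
    """Return a one-line summary for a FAILED run without rejecting new reasons."""
    if run.get("status") != "FAILED":
        raise ValueError("run is not in the FAILED state")
    reason = run.get("failureReason")
    if reason in KNOWN_FAILURE_REASONS:
        return f"Parse run failed: {reason}"
    # Unknown or missing reason: report it instead of raising.
    return f"Parse run failed with an unrecognized reason: {reason!r}"
```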
A human-readable description of the failure.
Availability: Present when status is "FAILED".
The parse run output.
Availability: Present when status is "PROCESSED" and the request was made without the responseType=url query parameter. Contains the parsed chunks.
A presigned URL to download the parse run output as a JSON file. The object shape is the same as the output field. Expires after 15 minutes.
Availability: Present when status is "PROCESSED" and the request was made with responseType=url query parameter.
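Because the output arrives either inline or behind a presigned URL, client code may want a single accessor that covers both cases. The sketch below assumes the run is a plain dict, that the URL field is named outputUrl (this excerpt does not give the field name, so treat it as a placeholder), and that download_json is a hypothetical callable that fetches a URL and returns the decoded JSON.

```python
from typing import Any, Callable, Dict


def load_parse_output(
    run: Dict[str, Any],
    download_json: Callable[[str], Any],
) -> Any:
    """Return the parse output, whether it is inline or behind a presigned URL."""
    if run.get("status") != "PROCESSED":
        raise ValueError("output is only present when status is PROCESSED")
    if run.get("output") is not None:
        # Request was made without responseType=url: output is inline.
        return run["output"]
    url = run.get("outputUrl")  # assumed field name, not confirmed by this page
    if url is None:
        raise ValueError("run has neither an inline output nor an output URL")
    # The presigned URL expires after 15 minutes, so download promptly.
    return download_json(url)
```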
Metrics about the parsing process.
Availability: Present when status is "PROCESSED".
Usage credits consumed by this parse run.
Availability: Present when status is "PROCESSED", the run was created after October 7, 2025, and the customer is on the current billing system.
The ID of the batch this run belongs to, if created via POST /parse_runs/batch.
Availability: Present when the run was submitted as part of a batch.
Example: "bpar_Xj8mK2pL9nR4vT7qY5wZ"
API version to use for the request. If you’re using an SDK, you can ignore this parameter. If you are not using an SDK and do not specify a version, the request will either be rejected with a 400 Bad Request or default to a previous legacy version. See API Versioning for more details.
An optional object that can be passed in to identify the run. It is returned to you in the response and in webhooks. Maximum size is 10KB.
To categorize runs for billing and usage tracking, include extend:usage_tags with an array of string values (e.g., {"extend:usage_tags": ["production", "team-eng", "customer-123"]}). Tags must contain only alphanumeric characters, hyphens, and underscores; any special characters will be automatically removed.
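Since disallowed characters are removed silently, it can help to normalize tags on the client so that the stored values match what you expect. The sketch below is an approximation under stated assumptions: build_metadata is a hypothetical helper, "alphanumeric" is read as ASCII letters and digits, and the 10KB limit is read as 10,240 bytes of UTF-8 encoded JSON. The server's exact rules may differ.

```python
import json
import re
from typing import Any, Dict, Iterable, Optional

# Assumption: "10KB" is interpreted as 10 * 1024 bytes of serialized JSON.
MAX_METADATA_BYTES = 10 * 1024

# Anything other than ASCII letters, digits, hyphens, and underscores.
_DISALLOWED = re.compile(r"[^A-Za-z0-9_-]")


def build_metadata(
    tags: Iterable[str],
    extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a metadata object with sanitized usage tags and a size check."""
    cleaned = [_DISALLOWED.sub("", tag) for tag in tags]
    metadata: Dict[str, Any] = dict(extra or {})
    # Drop tags that are empty after sanitization.
    metadata["extend:usage_tags"] = [tag for tag in cleaned if tag]
    size = len(json.dumps(metadata).encode("utf-8"))
    if size > MAX_METADATA_BYTES:
        raise ValueError(f"metadata is {size} bytes; limit is {MAX_METADATA_BYTES}")
    return metadata
```

For example, build_metadata(["production", "team eng!", "customer-123"]) yields the tags "production", "teameng", and "customer-123".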