Extract File (Sync)
Extract File (Sync)
Authentication
Bearer authentication of the form Bearer <token>, where token is your auth token.
Headers
Request
Reference to an existing extractor. Mutually exclusive with config — provide one or the other, or omit both to have Extend infer a schema from the document.
Inline extract configuration. Mutually exclusive with extractor — provide one or the other, or omit both to have Extend infer a schema from the document.
Response
The type of object. Will always be "extract_run".
The unique identifier for this extract run.
Example: "exr_Xj8mK2pL9nR4vT7qY5wZ"
The status of a processor run (extract, classify, or split):
"PENDING"- The run has been created and is waiting to be processed"PROCESSING"- The run is in progress"PROCESSED"- The run completed successfully"FAILED"- The run failed"CANCELLED"- The run was cancelled
The final output, either reviewed or initial. This is a union of two possible shapes:
- JSON Schema output: The current output format, returned for runs created with a JSON Schema config.
- Legacy output: A legacy output format from a previous API version. This shape is only returned for runs that were originally created with a legacy config.
Availability: Present when status is "PROCESSED".
The initial output from the extract run, before any review edits.
Availability: Present when reviewed is true.
The output after human review.
Availability: Present when reviewed is true.
The reason for failure.
Availability: Present when status is "FAILED".
Possible values include:
ABORTED- The run was aborted by the userINTERNAL_ERROR- An unexpected internal error occurredFAILED_TO_PROCESS_FILE- Failed to process the file (e.g., OCR failure, file access issues)INVALID_PROCESSOR- The processor configuration is invalidINVALID_CONFIGURATION- The provided configuration is incompatible with the selected modelPARSING_ERROR- Failed to parse the extraction outputPRE_PROCESSING_FAILURE- An error occurred during preprocessing (e.g., chunking)POST_PROCESSING_FAILURE- An error occurred during postprocessingOUT_OF_CREDITS- Insufficient credits to run the extractionSCHEMA_GENERATION_FAILED- Automatic schema inference failed (only applies whenschemais omitted). The file could not be parsed or a schema could not be generated from it.
Note: Additional failure reasons may be added in the future. Your integration should handle unknown values gracefully.
A detailed message about the failure.
Availability: Present when status is "FAILED".
Any metadata that was provided when creating the extract run.
Availability: Present when metadata was provided during creation.
Details of edits made during review.
Availability: Present when edited is true.
The configuration used for this extract run. This is a union of two possible shapes:
- JSON Schema config: The current config format. All runs created through this API version use this shape.
- Legacy config: A fields-array config from a previous API version. This shape is only returned when retrieving runs that were originally created with the legacy format. This API version does not support creating runs with legacy configs.
The extractor that was used for this run.
Availability: Present when an extractor reference was provided. Not present when using inline config.
The version of the extractor that was used for this run.
Availability: Present when an extractor reference was provided. Not present when using inline config.
The file that was processed. null when the file could not be accessed or processed (for example a run that failed during file ingestion, or a multi-file batch run).
The ID of the parse run that was used for this extract run.
Availability: Present when a parse run was created.
Usage credits consumed by this extract run.
Availability: Present when status is "PROCESSED". Will not be returned for runs created before October 7, 2025 or for customers on legacy billing systems.
The time (in UTC) at which the object was created. Will follow the RFC 3339 format.
Example: "2024-03-21T16:45:00Z"
The time (in UTC) at which the object was last updated. Will follow the RFC 3339 format.
Example: "2024-03-21T16:45:00Z"

