Parse File
Parse files to get cleaned, chunked target content (e.g. markdown).
The Parse endpoint converts documents into structured, machine-readable formats and gives you fine-grained control over the parsing process. It is ideal for extracting cleaned document content to use as context for downstream processing, e.g. RAG pipelines, custom ingestion pipelines, embeddings, classification, etc.
Unlike processor and workflow runs, the Parse endpoint is synchronous: it returns the parsed content directly in the response. Expected latency depends primarily on file size. This makes it suitable for workflows where you need immediate access to document content without waiting for asynchronous processing.
For more details, see the Parse File guide.
Headers
Bearer authentication of the form Bearer <token>, where token is your auth token.
API version to use for the request. If you do not specify a version, the request will either fail with a 400 Bad Request or fall back to a previous legacy version. See API Versioning for more details.
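As a rough illustration of the two headers described above, the sketch below assembles them into a dictionary. The `Authorization` header follows the standard Bearer form. The version header name `x-extend-api-version` is an assumption (this page does not spell it out), and `build_headers` is a hypothetical helper, so check both against the API Versioning page before relying on them.

```python
from typing import Dict, Optional

# Assumed header name; this page does not state it explicitly.
API_VERSION_HEADER = "x-extend-api-version"


def build_headers(token: str, api_version: Optional[str] = None) -> Dict[str, str]:
    """Build the headers for a Parse request (hypothetical helper)."""
    if not token:
        raise ValueError("an auth token is required")
    # Bearer authentication of the form "Bearer <token>".
    headers = {"Authorization": f"Bearer {token}"}
    if api_version is not None:
        # Pinning a version avoids a 400 or a silent legacy fallback.
        headers[API_VERSION_HEADER] = api_version
    return headers
```

Passing the version explicitly on every request is usually the safer choice, since the behavior without it is either an error or a legacy version.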
Request
A file object containing either a URL or a fileId.
Configuration options for the parsing process.
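A request body along these lines could be assembled as in the sketch below. Only `fileId` is named on this page; the keys `file`, `fileUrl`, and `config`, as well as the helper `build_parse_body`, are assumptions for illustration. The sketch also assumes that exactly one of the URL or the `fileId` should be sent, which is one reading of "either".

```python
from typing import Any, Dict, Optional


def build_parse_body(
    file_url: Optional[str] = None,
    file_id: Optional[str] = None,
    config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a Parse request body (hypothetical helper, assumed key names)."""
    # Require exactly one file reference: a URL or a fileId, not both.
    if (file_url is None) == (file_id is None):
        raise ValueError("provide exactly one of file_url or file_id")
    if file_url is not None:
        file_obj = {"fileUrl": file_url}  # assumed key name
    else:
        file_obj = {"fileId": file_id}
    body: Dict[str, Any] = {"file": file_obj}  # "file" is an assumed key name
    if config is not None:
        body["config"] = config  # "config" is an assumed key name
    return body
```

Leaving `config` out lets the service apply its defaults, which are then echoed back in the response's configuration field.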
Response
Successfully parsed file
The type of object. Will always be "parser_run".
A unique identifier for the parser run. Will always start with "parser_run_".
Example: "parser_run_xK9mLPqRtN3vS8wF5hB2cQ"
The identifier of the file that was parsed. This can be used as a parameter to other Extend endpoints, such as processor runs. This allows downstream processing to reuse a cache of the parsed file content to reduce your usage costs.
An array of chunks that were parsed from the file.
The status of the parser run:
"PROCESSED" - The file was successfully processed
"FAILED" - The processing failed (see failureReason for details)
Metrics about the parsing process.
The configuration used for the parsing process, including any default values that were applied.
The reason for failure if status is "FAILED".
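Because the endpoint is synchronous, the caller can branch on the status as soon as the response arrives. The sketch below shows one way to do that on an already-decoded parser run object. The `status` and `failureReason` fields come from this page; the `chunks` key name, the `ParseFailed` exception, and the `chunks_or_raise` helper are assumptions for illustration.

```python
from typing import Any, Dict, List


class ParseFailed(Exception):
    """Raised when a parser run reports status "FAILED" (hypothetical)."""


def chunks_or_raise(parser_run: Dict[str, Any]) -> List[Any]:
    """Return the parsed chunks, or raise with the reported failure reason."""
    status = parser_run.get("status")
    if status == "PROCESSED":
        # "chunks" is an assumed key name for the array of parsed chunks.
        return parser_run.get("chunks", [])
    if status == "FAILED":
        reason = parser_run.get("failureReason") or "no failureReason provided"
        raise ParseFailed(reason)
    # Any other value is outside the two statuses documented here.
    raise ValueError(f"unexpected status: {status!r}")
```

Treating an undocumented status as an error, rather than as success, keeps a malformed or changed response from flowing silently into downstream processing.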