Parse File

curl -X POST https://api.extend.ai/parse \
  -H "x-extend-api-version: 2025-04-21" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "file": {}
  }'

{
  "object": "parser_run",
  "id": "parser_run_xK9mLPqRtN3vS8wF5hB2cQ",
  "fileId": "file_Zk9mNP12Qw4yTv8BdR3H",
  "chunks": [
    {
      "object": "chunk",
      "type": "page",
      "content": "This is the content of the chunk.",
      "metadata": {
        "pageRange": {
          "start": 1,
          "end": 1
        }
      },
      "blocks": [
        {
          "object": "block",
          "id": "foo",
          "type": "text",
          "content": "foo",
          "details": {},
          "metadata": {
            "page": {
              "number": 42,
              "width": 42,
              "height": 42
            }
          },
          "polygon": [
            {
              "x": 10,
              "y": 20
            }
          ],
          "boundingBox": {
            "left": 10,
            "right": 20,
            "top": 10,
            "bottom": 20
          }
        }
      ]
    }
  ],
  "status": "PROCESSED",
  "metrics": {
    "processingTimeMs": 42,
    "pageCount": 42
  },
  "config": {
    "target": "markdown",
    "chunkingStrategy": {
      "type": "page",
      "minCharacters": 100,
      "maxCharacters": 1000
    },
    "blockOptions": {
      "figures": {
        "enabled": true,
        "figureImageClippingEnabled": true
      },
      "tables": {
        "enabled": true,
        "targetFormat": "markdown"
      },
      "text": {
        "signatureDetectionEnabled": true
      }
    },
    "advancedOptions": {
      "pageRotationEnabled": true
    }
  },
  "failureReason": "foo"
}

Parse files to get cleaned, chunked target content (e.g. markdown).

The Parse endpoint converts documents into structured, machine-readable formats with fine-grained control over the parsing process. It is ideal for extracting cleaned document content to use as context for downstream processing, e.g. RAG pipelines, custom ingestion pipelines, embeddings, classification, etc.

Unlike processor and workflow runs, the Parse endpoint is synchronous and returns the parsed content in the response. Expected latency depends primarily on file size. This makes it suitable for workflows where you need immediate access to document content without waiting for asynchronous processing.

For more details, see the Parse File guide.
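The curl request above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_parse_request` and `parse_file` and the `timeout` default are hypothetical, and the contents of the `file` object are left to the caller.

```python
import json
import urllib.request

PARSE_URL = "https://api.extend.ai/parse"
API_VERSION = "2025-04-21"


def build_parse_request(token, file, config=None):
    """Build (but do not send) a POST request for the Parse endpoint."""
    body = {"file": file}
    # config is optional, so only include it when the caller supplies one.
    if config is not None:
        body["config"] = config
    return urllib.request.Request(
        PARSE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-extend-api-version": API_VERSION,
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_file(token, file, config=None, timeout=300):
    """Send the request and return the decoded parser_run object.

    The call blocks until the response arrives because the endpoint is
    synchronous; the timeout value here is an arbitrary assumption.
    """
    request = build_parse_request(token, file, config)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Because latency grows with file size, a generous client-side timeout is probably a sensible default for larger documents.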

Query parameters

responseType enum Optional

Controls the format of the response chunks. Defaults to json if not specified.

json - Returns parsed outputs in the response body
url - Returns a presigned URL to the parsed content in the response body

Allowed values: json, url
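As a small sketch of how the responseType query parameter might be attached to the endpoint URL, the helper below validates the value against the two documented options before encoding it. The function name `parse_url` is hypothetical.

```python
from urllib.parse import urlencode

PARSE_URL = "https://api.extend.ai/parse"
ALLOWED_RESPONSE_TYPES = ("json", "url")


def parse_url(response_type=None):
    """Return the Parse endpoint URL, optionally with responseType set."""
    # Omitting the parameter lets the API apply its default (json).
    if response_type is None:
        return PARSE_URL
    if response_type not in ALLOWED_RESPONSE_TYPES:
        raise ValueError(f"responseType must be one of {ALLOWED_RESPONSE_TYPES}")
    return f"{PARSE_URL}?{urlencode({'responseType': response_type})}"
```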

Request

This endpoint expects an object.

file object Required

A file object containing either a URL or a fileId.
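A hedged sketch of building the file object follows. The key names are assumptions: `fileId` mirrors the field shown in the response, while `fileUrl` is a guess at the URL field name and should be checked against the full schema. Treating the two options as mutually exclusive is also an assumption based on the "either ... or" wording.

```python
def make_file_reference(file_url=None, file_id=None):
    """Build the `file` object from either a URL or an existing file ID."""
    # Exactly one of the two identifiers must be provided (assumed rule).
    if (file_url is None) == (file_id is None):
        raise ValueError("Provide exactly one of file_url or file_id")
    if file_id is not None:
        return {"fileId": file_id}
    # "fileUrl" is an assumed key name, not confirmed by this page.
    return {"fileUrl": file_url}
```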

config object Optional

Configuration options for the parsing process.

Response

Successfully parsed file

object enum

The type of object. Will always be "parser_run".

Allowed values: parser_run

id string

A unique identifier for the parser run. Will always start with "parser_run_".

Example: "parser_run_xK9mLPqRtN3vS8wF5hB2cQ"

fileId string

The identifier of the file that was parsed. This can be used as a parameter to other Extend endpoints, such as processor runs.

chunks list of objects

An array of chunks that were parsed from the file.
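To illustrate how the chunks array might be consumed, the sketch below pulls the text of each chunk and the pages it covers. It assumes the response shape shown in the example response (a `content` string and `metadata.pageRange` per chunk) and that `pageRange.end` is inclusive; both helper names are hypothetical.

```python
def chunk_texts(parser_run):
    """Return the content string of every chunk, in response order."""
    return [chunk["content"] for chunk in parser_run.get("chunks", [])]


def pages_covered(chunk):
    """Return the page numbers a chunk spans (end assumed inclusive)."""
    page_range = chunk["metadata"]["pageRange"]
    return range(page_range["start"], page_range["end"] + 1)
```

The resulting strings can then be passed to an embedding or retrieval step, one entry per chunk.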

status enum

The status of the parser run:

"PROCESSED" - The file was successfully processed
"FAILED" - The processing failed (see failureReason for details)

Allowed values: PROCESSED, FAILED

metrics object

Metrics about the parsing process.

config object

The configuration used for the parsing process, including any default values that were applied.

failureReason string or null

The reason for failure if status is "FAILED".
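One way a caller might act on status and failureReason is sketched below. The exception class `ParseFailedError` and the function `require_processed` are hypothetical names used only for illustration.

```python
class ParseFailedError(Exception):
    """Raised when a parser run reports status FAILED."""


def require_processed(parser_run):
    """Return the parser run if it was processed, otherwise raise."""
    status = parser_run.get("status")
    if status == "PROCESSED":
        return parser_run
    if status == "FAILED":
        # failureReason may be null, so fall back to a generic message.
        reason = parser_run.get("failureReason") or "no failureReason provided"
        raise ParseFailedError(reason)
    raise ValueError(f"unexpected status: {status!r}")
```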
