The File object

Example File
```jsonc
{
  "object": "file",
  "id": "file_1234",
  "name": "example_file",
  "type": "PDF",
  "presignedUrl": "https://s3.example.com/file_1234.pdf",
  "parentFileId": "file_5678", // Optional, only set if this file is a derivative of another file
  "contents": {
    "rawText": "This is the raw text content of the file...",
    "pages": [
      {
        "pageNumber": 1,
        "markdown": "This is the markdown content of the page..."
      }
    ]
  },
  "metadata": {
    "parentSplit": { // Optional, only set if this file is a derivative of another file
      "id": "324kjlfsd",
      "type": "addendum",
      "identifier": "addendum_1",
      "startPage": 7,
      "endPage": 9
    }
  },
  "createdAt": "2024-01-01T00:00:00Z",
  "updatedAt": "2024-01-01T00:00:00Z"
}
```
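
If you consume this object from Python, mapping the payload onto typed classes makes the conditional fields explicit. This is a minimal sketch, not part of any official Extend SDK: the class names `Page`, `ParentSplit`, and `ExtendFile` and the `from_dict` method are hypothetical, and every field documented as conditionally present (`parentFileId`, `markdown`, `html`, `pageCount`, `parentSplit`) is modeled as optional. Strict JSON does not allow comments, so the `//` annotations in the example above must be removed before parsing it as-is.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Page:
    page_number: int
    markdown: Optional[str] = None  # only present when markdown=true was requested
    html: Optional[str] = None      # only present when html=true was requested


@dataclass
class ParentSplit:
    id: str
    type: str
    identifier: str
    start_page: int
    end_page: int


@dataclass
class ExtendFile:
    """Hypothetical typed view of the File object (not an official SDK class)."""
    id: str
    name: str
    type: str
    presigned_url: Optional[str]
    created_at: str
    updated_at: str
    parent_file_id: Optional[str] = None      # only set for derivative files
    raw_text: Optional[str] = None            # only set when rawText=true
    pages: list[Page] = field(default_factory=list)
    page_count: Optional[int] = None          # only set for PDF/DOCX files
    parent_split: Optional[ParentSplit] = None  # only set for derivative files

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> ExtendFile:
        # Missing nested objects are treated as empty rather than as errors.
        contents = data.get("contents") or {}
        metadata = data.get("metadata") or {}
        split = metadata.get("parentSplit")
        return cls(
            id=data["id"],
            name=data["name"],
            type=data["type"],
            presigned_url=data.get("presignedUrl"),
            created_at=data["createdAt"],
            updated_at=data["updatedAt"],
            parent_file_id=data.get("parentFileId"),
            raw_text=contents.get("rawText"),
            pages=[
                Page(p["pageNumber"], p.get("markdown"), p.get("html"))
                for p in contents.get("pages", [])
            ],
            page_count=metadata.get("pageCount"),
            parent_split=(
                ParentSplit(
                    id=split["id"],
                    type=split["type"],
                    identifier=split["identifier"],
                    start_page=split["startPage"],
                    end_page=split["endPage"],
                )
                if split
                else None
            ),
        )
```

For example, `ExtendFile.from_dict(json.loads(body))`, where `body` is a response body you have fetched, yields `None` for any conditional field the response omits instead of raising a `KeyError`.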

The File object represents a file in Extend. Files are created for each workflow run, and can also be created directly via the API for use in evaluation sets.

Properties

object (string)
The type of the object. This is always "file".

id (string)
The file ID.

name (string)
The name of the file.

type (string)
The Extend-normalized type of the file. One of IMG, PDF, TXT, DOCX, CSV, or EXCEL.

parentFileId (string, optional)
The ID of the parent file. Only included if this file is a derivative of another file, for instance if it was created by a Splitter in a workflow.

presignedUrl (string)
A presigned URL for downloading the file. The URL expires after 15 minutes.
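
The `type` values and the 15-minute `presignedUrl` lifetime translate naturally into client-side checks. This sketch is illustrative only: `FileType`, `is_derivative`, and `presigned_url_is_usable` are hypothetical helper names, and the expiry check assumes the URL was generated no more than `margin` before your client received the response, since the exact start of the 15-minute window is set server-side.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class FileType(str, Enum):
    """The normalized values documented for the `type` field."""
    IMG = "IMG"
    PDF = "PDF"
    TXT = "TXT"
    DOCX = "DOCX"
    CSV = "CSV"
    EXCEL = "EXCEL"


# Documented lifetime of presignedUrl.
PRESIGNED_URL_TTL = timedelta(minutes=15)


def is_derivative(file_obj: dict) -> bool:
    """parentFileId is only included when the file derives from another file."""
    return file_obj.get("parentFileId") is not None


def presigned_url_is_usable(
    fetched_at: datetime,
    now: Optional[datetime] = None,
    margin: timedelta = timedelta(seconds=30),
) -> bool:
    """Estimate whether a presignedUrl received at `fetched_at` is still valid.

    Assumption (hypothetical helper): the URL was signed at most `margin`
    before the response arrived. Pass timezone-aware datetimes.
    """
    now = now if now is not None else datetime.now(timezone.utc)
    return now < fetched_at + PRESIGNED_URL_TTL - margin
```

When the check fails, one reasonable approach, assuming each retrieval of the File object returns a newly signed URL, is to fetch the object again before downloading.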

contents (object)
The content of the file, with the following properties.

contents.rawText (string)
The raw text content of the file. Included for all file types when the rawText query parameter is set to true in the request.

contents.pages (array)
An array of page objects, one for each page in the file. Each page object has the following properties.

contents.pages[].pageNumber (number)
The page number of this page in the document.

contents.pages[].markdown (string)
Cleaned and structured markdown content of the page. Available for PDF and IMG files. Only included when the markdown query parameter is set to true in the request.

contents.pages[].html (string)
Cleaned and structured HTML content of the page. Available for DOCX files that were not automatically converted to PDF. Only included when the html query parameter is set to true in the request.
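
The `rawText`, `markdown`, and `html` query parameters control which of these fields are returned, and the `pages` array is the natural unit for reassembling a document. This sketch assumes the flags are sent as lowercase `"true"` strings and that you already know the endpoint URL to append them to; `content_query` and `document_markdown` are hypothetical helpers, and the blank-line page separator is an arbitrary choice.

```python
from urllib.parse import urlencode


def content_query(raw_text: bool = False, markdown: bool = False, html: bool = False) -> str:
    """Build a query string for the rawText, markdown, and html flags.

    Assumption: the API accepts the value "true"; flags that are off are omitted.
    """
    flags = {"rawText": raw_text, "markdown": markdown, "html": html}
    return urlencode({name: "true" for name, on in flags.items() if on})


def document_markdown(contents: dict, separator: str = "\n\n") -> str:
    """Join per-page markdown in pageNumber order, skipping pages without it."""
    pages = sorted(contents.get("pages", []), key=lambda p: p["pageNumber"])
    return separator.join(p["markdown"] for p in pages if p.get("markdown"))
```

Sorting on `pageNumber` rather than trusting array order keeps the output stable even if pages arrive out of order.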

metadata (object)
Additional metadata about the file, with the following properties.

metadata.pageCount (number)
The number of pages in the file. Only set for PDF and DOCX files.

metadata.parentSplit (object, optional)
Details of the split this file was created from. Only included if this file is a derivative of another file, for instance if it was created by a Splitter in a workflow.

metadata.parentSplit.id (string)
The ID of the split.

metadata.parentSplit.type (string)
The type of the split.

metadata.parentSplit.identifier (string)
The identifier of the split.

metadata.parentSplit.startPage (number)
The start page of the split.

metadata.parentSplit.endPage (number)
The end page of the split.
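
For files produced by a Splitter, `startPage` and `endPage` can be used to translate page numbers back to the parent file. This sketch rests on two assumptions the reference does not state: the bounds are 1-based and inclusive, and they refer to page numbers in the parent file. Under those assumptions the example's split (`startPage` 7, `endPage` 9) covers three pages, and page 2 of the derivative file is page 8 of the parent. `split_page_count` and `to_parent_page` are hypothetical helpers.

```python
def split_page_count(split: dict) -> int:
    """Number of pages covered by a parentSplit.

    Assumption: startPage and endPage are 1-based and inclusive.
    """
    return split["endPage"] - split["startPage"] + 1


def to_parent_page(split: dict, page_number: int) -> int:
    """Map a 1-based page number in the derivative file to the parent file.

    Assumption: startPage/endPage use the parent file's page numbering.
    """
    if not 1 <= page_number <= split_page_count(split):
        raise ValueError(f"page {page_number} is outside this split")
    return split["startPage"] + page_number - 1
```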

createdAt (string)
The date and time the file was created, as an ISO 8601 timestamp (for example, 2024-01-01T00:00:00Z).

updatedAt (string)
The date and time the file was last updated, in the same format.
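
The example timestamps use ISO 8601 with a trailing `Z` for UTC. Python's `datetime.fromisoformat` only accepts the `Z` suffix from Python 3.11 onward, so a portable parser normalizes it to an explicit offset first. `parse_timestamp` and `was_modified` are illustrative helper names.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "2024-01-01T00:00:00Z"."""
    # Before Python 3.11, fromisoformat rejects a trailing "Z",
    # so rewrite it as the equivalent "+00:00" offset.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def was_modified(file_obj: dict) -> bool:
    """True if updatedAt is strictly later than createdAt."""
    return parse_timestamp(file_obj["updatedAt"]) > parse_timestamp(file_obj["createdAt"])
```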

Note: Some deprecated fields are still included in the payload for backwards compatibility:

  • For IMG files, the markdown and rawText fields that are not nested under the pages array. These will continue to be included in payloads until they are fully deprecated in December 2024.
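
While the deprecated fields remain in payloads, a client can read markdown from the current `pages` location and fall back to the legacy field only for IMG files. In this sketch, `markdown_with_fallback` is a hypothetical helper, and the location of the legacy field (directly on `contents`) is an assumption, since the note does not say where the non-nested fields sit. Once the deprecated fields are removed, the fallback branch can be deleted.

```python
from typing import Optional


def markdown_with_fallback(file_obj: dict) -> Optional[str]:
    """Prefer markdown nested under pages; fall back to the legacy IMG field."""
    contents = file_obj.get("contents") or {}
    pages = sorted(contents.get("pages") or [], key=lambda p: p["pageNumber"])
    # Current shape: markdown lives on each entry of the pages array.
    nested = [p["markdown"] for p in pages if p.get("markdown") is not None]
    if nested:
        return "\n\n".join(nested)
    # Deprecated shape for IMG files (kept until December 2024).
    # Assumption: the non-nested field sits directly on `contents`.
    if file_obj.get("type") == "IMG":
        return contents.get("markdown")
    return None
```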