The File object
The File object represents a file in Extend. Files are created for each workflow run, and can also be created directly via the API for use in evaluation sets.
properties
The type of the object. For a File object this is always “file”.
The file ID.
The name of the file.
The Extend normalized type of the file. One of IMG, PDF, TXT, DOCX, CSV, EXCEL.
The ID of the parent file. Only included if this file is a derivative of another file, for instance if it was created via a Splitter in a workflow.
A presigned URL to download the file. Expires after 15 minutes.
properties
The raw text content of the file. This is included for all file types if the rawText query parameter is set to true in the endpoint request.
An array of page objects representing the content of each page in the file.
properties
The page number of this page in the document.
Cleaned and structured markdown content of the page. Available for PDF and IMG file types. Only included if the markdown query parameter is set to true in the endpoint request.
Cleaned and structured HTML content of the page. Available for DOCX file types (that were not auto-converted to PDFs). Only included if the html query parameter is set to true in the endpoint request.
properties
The number of pages in the file. This is only set for PDF/DOCX files.
The split metadata details. Only included if this file is a derivative of another file, for instance if it was created via a Splitter in a workflow.
properties
The ID of the split.
The type of the split.
The identifier of the split.
The start page of the split.
The end page of the split.
The date and time the file was created.
The date and time the file was last updated.
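The content properties above are only populated when the matching query parameter is set, so a client usually builds the flags explicitly and then reads content defensively. The following is a minimal sketch, not Extend's official client. The query parameter names rawText, markdown and html come from the descriptions above. The payload keys "contents" and "pageNumber" are assumptions made for illustration, since the property names are not shown on this page.

```python
from urllib.parse import urlencode


def content_query(raw_text=False, markdown=False, html=False):
    """Build a query string containing only the content flags that are enabled."""
    flags = {"rawText": raw_text, "markdown": markdown, "html": html}
    # Each enabled flag is sent as the string "true"; disabled flags are omitted.
    return urlencode({name: "true" for name, enabled in flags.items() if enabled})


def page_markdown(file_obj):
    """Return the markdown of each page, ordered by page number.

    Assumes (hypothetically) that page objects live under
    file_obj["contents"]["pages"] and carry a "pageNumber" key.
    Pages without markdown are skipped.
    """
    contents = file_obj.get("contents") or {}
    pages = contents.get("pages") or []
    ordered = sorted(pages, key=lambda page: page.get("pageNumber", 0))
    return [page["markdown"] for page in ordered if page.get("markdown")]
```

For example, `content_query(raw_text=True, markdown=True)` yields a query string that can be appended to the file endpoint URL, and `page_markdown` returns an empty list when the markdown flag was not requested.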
Note: There are several deprecated fields that are still in the payload for backwards compatibility. These are:
- markdown and rawText for IMG files when they appear outside the pages array. These will remain in payloads until full deprecation in December 2024.
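While both locations may be present, a client can prefer the page-level value and fall back to the deprecated one. This is a sketch under assumptions: the "contents" key and the exact placement of the deprecated markdown field (directly under the content object) are hypothetical and should be checked against a real payload.

```python
def image_markdown(file_obj):
    """Return markdown for an IMG file, preferring the nested pages array.

    Falls back to the deprecated top-level field (assumed location) only when
    no page carries markdown. Returns None if neither location has a value.
    """
    contents = file_obj.get("contents") or {}
    pages = contents.get("pages") or []
    nested = [page["markdown"] for page in pages if page.get("markdown")]
    if nested:
        # Join multi-page content with a blank line between pages.
        return "\n\n".join(nested)
    # Deprecated location: markdown not nested under the pages array.
    return contents.get("markdown")
```

Reading the nested value first means the code keeps working once the deprecated fields are removed from the payload.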