Run Processor
Run processors (extraction, classification, splitting, etc.) on a given document.
In general, the recommended way to integrate with Extend in production is via workflows, using the Run Workflow endpoint. This is due to several factors:
- File parsing and pre-processing are automatically reused across processors, which keeps the integration simple and reduces cost, since many use cases run several processors on the same document.
- Workflows provide dedicated human-in-the-loop document review when needed.
- Workflows let you model and manage your pipeline through a single endpoint, with a corresponding UI for modeling and monitoring.
However, there are a number of legitimate use cases and systems where it might be easier to model the pipeline via code and run processors directly. This endpoint is provided for this purpose.
Similar to workflow runs, processor runs are asynchronous and will return a status of PROCESSING until the run is complete. You can configure webhooks to receive notifications when a processor run has completed or failed.
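Where webhooks are not an option, the asynchronous behavior described above can be handled by polling. The following is a minimal sketch, not an official client: `wait_for_run` and its parameters are hypothetical names, and `fetch_run` stands in for whatever call your code uses to retrieve the current state of a run. The only value taken from this page is the PROCESSING status; any other status is treated as terminal.

```python
import time
from typing import Callable, Dict


def wait_for_run(
    fetch_run: Callable[[], Dict[str, str]],
    poll_seconds: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll until the run's status is no longer PROCESSING.

    fetch_run is a caller-supplied function (hypothetical) that returns
    the latest run as a dict containing a "status" key.
    """
    for _ in range(max_polls):
        run = fetch_run()
        # Any status other than PROCESSING is treated as terminal here.
        if run.get("status") != "PROCESSING":
            return run
        sleep(poll_seconds)
    raise TimeoutError("processor run still PROCESSING after max_polls attempts")
```

Injecting `fetch_run` and `sleep` keeps the loop independent of any HTTP library and easy to exercise in tests. In production, webhooks are likely the better fit for long-running documents.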
Headers
Bearer authentication of the form Bearer <token>, where token is your auth token.
API version to use for the request. If you do not specify a version, you will either receive a 400 Bad Request or be set to a previous legacy version. See API Versioning for more details.
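As an illustration of the two headers, the sketch below assembles them into a dictionary. The Bearer format comes from the description above; the name of the version header is not stated on this page, so `API_VERSION_HEADER` is an assumption that should be checked against the API Versioning documentation. `build_headers` is a hypothetical helper.

```python
from typing import Dict

# Assumption: the header name for the API version is not given in the
# text above. Confirm it in the API Versioning docs before relying on it.
API_VERSION_HEADER = "x-extend-api-version"


def build_headers(token: str, api_version: str) -> Dict[str, str]:
    """Build request headers with Bearer auth and a pinned API version."""
    if not token:
        raise ValueError("an auth token is required")
    if not api_version:
        # Omitting the version risks a 400 or a fallback to a legacy version.
        raise ValueError("an explicit API version is required")
    return {
        "Authorization": f"Bearer {token}",
        API_VERSION_HEADER: api_version,
    }
```

Requiring the version argument, rather than making it optional, avoids silently falling back to a legacy version.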
Request
The ID of the processor to be run. The ID will start with "dp_". This ID can be found when viewing a processor on the Extend platform.
Example: "dp_Xj8mK2pL9nR4vT7qY5wZ"
An optional version of the processor to use. When not supplied, the most recent published version of the processor will be used. Special values include:
- "latest" for the most recent published version. If there are no published versions, the draft version will be used.
- "draft" for the draft version.
- Specific version numbers corresponding to versions your team has published, e.g. "1.0", "2.2", etc.
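The version rules above can be modeled as a small function. This is only a sketch of the documented behavior, not how the service implements it: `resolve_version` is a hypothetical name, and `published` is assumed to be a list of published version strings ordered oldest to newest. Two behaviors are assumptions, since this page does not state them: an omitted version with no published versions falls back to the draft (mirroring "latest"), and an unknown version number raises an error.

```python
from typing import Optional, Sequence


def resolve_version(requested: Optional[str], published: Sequence[str]) -> str:
    """Return the version that would be used for a run.

    published: version strings ordered oldest to newest (assumed ordering).
    """
    if requested == "draft":
        return "draft"
    if requested is None or requested == "latest":
        # Most recent published version; draft if nothing is published.
        # (The draft fallback for an omitted version is an assumption.)
        return published[-1] if published else "draft"
    if requested not in published:
        # Assumption: unpublished version numbers are rejected.
        raise ValueError(f"version {requested!r} has not been published")
    return requested
```

For example, with published versions "1.0" and "2.2", omitting the version or passing "latest" both resolve to "2.2".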
The file to be processed. One of file or rawText must be provided. Supported file types can be found here.
A raw string to be processed. Can be used in place of file when passing raw text data streams. One of file or rawText must be provided.
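A client can check these constraints before sending a request. The sketch below is a hypothetical helper, not part of any SDK: `file` and `rawText` are the field names used above, while the `processorId` key is an assumed name for the processor ID field. The structure of the `file` object is not described on this page, so it is passed through untouched. Rejecting a request that supplies both `file` and `rawText` is a conservative assumption; the text only says that one of them must be provided.

```python
from typing import Any, Dict, Optional


def build_run_body(
    processor_id: str,
    file: Optional[Dict[str, Any]] = None,
    raw_text: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate inputs and build a request body for a processor run."""
    if not processor_id.startswith("dp_"):
        raise ValueError('processor IDs are expected to start with "dp_"')
    # Exactly one input source: both missing or both present is rejected.
    if (file is None) == (raw_text is None):
        raise ValueError("provide exactly one of file or rawText")

    body: Dict[str, Any] = {"processorId": processor_id}  # assumed key name
    if file is not None:
        body["file"] = file  # opaque; its shape is not specified here
    else:
        body["rawText"] = raw_text
    return body
```

Failing fast on the client side turns a round trip and an error response into an immediate, descriptive exception.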
An optional value used to determine the relative order of ProcessorRuns when rate limiting is in effect. Lower values will be prioritized before higher values.
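To make the ordering rule concrete, the sketch below sorts queued runs so that lower priority values come first. It is a model of the rule as stated, not the service's scheduler: `dispatch_order` is a hypothetical name, and keeping submission order for runs with equal priority is an assumption, since tie-breaking is not described here.

```python
from typing import List, Tuple


def dispatch_order(runs: List[Tuple[str, int]]) -> List[str]:
    """Order (run_id, priority) pairs so lower priority values go first.

    Python's sort is stable, so runs with equal priority keep their
    submission order (an assumption about tie-breaking).
    """
    return [run_id for run_id, _ in sorted(runs, key=lambda run: run[1])]
```

Given runs `[("a", 50), ("b", 1), ("c", 50)]`, the function returns `["b", "a", "c"]`: run "b" has the lowest value and is dispatched first.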
An optional object that can be passed in to identify the run of the document processor. It will be returned to you in the response and in webhooks.
The configuration for the processor run. If provided, this configuration will be used; otherwise, the configuration of the processor version you specify will be used. The type of configuration must match the processor type.
Response
Successfully created processor run