Configuring Workflows

A workflow’s behavior is defined by its steps array. You set it when you create a workflow, update its draft, or create a version. This page is the complete reference for every step type and routing rule. For the end-to-end create → deploy → run lifecycle, start with Create a Workflow.

Each step has a name, a type, an optional config, and a next array that defines where documents flow after the step completes. The same shape is used in request bodies and in the workflow version responses returned by the API.

Quick Start

The simplest useful workflow extracts structured data from a document:

trigger → parse → extract → review

{
  "name": "Invoice Processing",
  "steps": [
    {
      "name": "trigger",
      "type": "TRIGGER",
      "next": [{ "step": "parse" }]
    },
    {
      "name": "parse",
      "type": "PARSE",
      "next": [{ "step": "extract" }]
    },
    {
      "name": "extract",
      "type": "EXTRACT",
      "config": {
        "extractor": { "id": "ex_abc123", "version": "latest" }
      },
      "next": [{ "step": "review" }]
    },
    {
      "name": "review",
      "type": "HUMAN_REVIEW"
    }
  ]
}

Every workflow starts with a TRIGGER step followed by a PARSE step. After parsing, you can chain any combination of processing, branching, and validation steps.
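The structural rules stated here (every next target names an existing step, and the single TRIGGER routes to exactly one PARSE step) can be checked locally before a request is sent. The following is a minimal sketch, not part of the API: the WorkflowStep type and the checkStructure and findStepType functions are hypothetical names introduced for illustration, and the server may enforce rules beyond these.

```typescript
// Hypothetical local types mirroring the step shape described on this page.
type NextEntry = { step: string };
type WorkflowStep = { name: string; type: string; config?: unknown; next?: NextEntry[] };

// Looks up a step's type by name; undefined means no such step exists.
function findStepType(steps: WorkflowStep[], stepName: string): string | undefined {
  for (const s of steps) {
    if (s.name === stepName) return s.type;
  }
  return undefined;
}

// Returns human-readable problems; an empty array means these checks passed.
function checkStructure(steps: WorkflowStep[]): string[] {
  const problems: string[] = [];
  // Every next[].step must name a step that exists in the array.
  for (const s of steps) {
    for (const n of s.next ?? []) {
      if (findStepType(steps, n.step) === undefined) {
        problems.push(`${s.name} routes to unknown step "${n.step}"`);
      }
    }
  }
  // There must be a single TRIGGER, and it must route to exactly one PARSE step.
  const triggers = steps.filter((s) => s.type === "TRIGGER");
  if (triggers.length !== 1) {
    problems.push(`expected exactly one TRIGGER step, found ${triggers.length}`);
  }
  for (const t of triggers) {
    const targets = t.next ?? [];
    const routesToParse =
      targets.length === 1 && targets.every((n) => findStepType(steps, n.step) === "PARSE");
    if (!routesToParse) problems.push(`${t.name} must route to exactly one PARSE step`);
  }
  return problems;
}

// The Quick Start workflow passes both checks.
const quickStartSteps: WorkflowStep[] = [
  { name: "trigger", type: "TRIGGER", next: [{ step: "parse" }] },
  { name: "parse", type: "PARSE", next: [{ step: "extract" }] },
  {
    name: "extract",
    type: "EXTRACT",
    config: { extractor: { id: "ex_abc123", version: "latest" } },
    next: [{ step: "review" }],
  },
  { name: "review", type: "HUMAN_REVIEW" },
];
const quickStartProblems = checkStructure(quickStartSteps);
```

Returning a list of problems rather than throwing on the first one makes it easier to report every issue in a workflow file at once.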

Key Concepts

Routing

Each step’s next array defines where documents flow. For most step types, you only need to specify the target step:

"next": [{ "step": "extract" }]

For branching step types, each next entry includes a routing field specific to the step type:

// CLASSIFY or SPLIT: use classificationId
"next": [
  { "step": "extract_invoice", "classificationId": "cls_invoice" },
  { "step": "extract_receipt", "classificationId": "cls_receipt" }
]

// CONDITIONAL: use conditionId
"next": [
  { "step": "review", "conditionId": "high_value" },
  { "step": "webhook", "conditionId": "default_path" }
]

// RULE_VALIDATION: use result
"next": [
  { "step": "webhook", "result": "pass" },
  { "step": "review", "result": "fail" }
]
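The mapping from branching step type to routing field can be written down as a small lookup, which is handy for linting a workflow file. This is a sketch under the assumption that only the four step types listed above carry a routing field; routingField and missingRouting are hypothetical helper names, not API functions.

```typescript
// Hypothetical shape of a next entry, covering all three routing fields.
type RoutedNext = {
  step: string;
  classificationId?: string;
  conditionId?: string;
  result?: string;
};

type RoutingField = "classificationId" | "conditionId" | "result";

// Which routing field each branching step type uses; null for non-branching types.
function routingField(stepType: string): RoutingField | null {
  switch (stepType) {
    case "CLASSIFY":
    case "SPLIT":
      return "classificationId";
    case "CONDITIONAL":
      return "conditionId";
    case "RULE_VALIDATION":
      return "result";
    default:
      return null;
  }
}

// Names the target steps of next entries that lack the expected routing field.
function missingRouting(stepType: string, next: RoutedNext[]): string[] {
  const field = routingField(stepType);
  if (field === null) return [];
  return next.filter((n) => n[field] === undefined).map((n) => n.step);
}

// The second entry omits "result", so it is reported.
const validationNext: RoutedNext[] = [
  { step: "webhook", result: "pass" },
  { step: "review" },
];
const incompleteTargets = missingRouting("RULE_VALIDATION", validationNext);
```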

Saved processors vs. inline configs

EXTRACT, CLASSIFY, and SPLIT steps specify their processor in one of two ways — provide exactly one per step:

Saved reference — extractor / classifier / splitter: points at a saved processor by ID and version.
```
"config": { "extractor": { "id": "ex_abc123", "version": "latest" } }
```

Inline config — extractorConfig / classifierConfig / splitterConfig: embeds the full processor configuration directly in the step, using the same config shape as the standalone run endpoints (e.g. Create Extract Run). No saved processor is needed.

"config": {
  "extractorConfig": {
    "schema": {
      "type": "object",
      "properties": {
        "invoice_number": { "type": "string" },
        "total": { "type": "number" }
      }
    },
    "extractionRules": "Prefer the remit-to address."
  }
}

Inline configs make a workflow definition self-contained and portable: it carries no workspace-specific processor IDs, so the same file validates and deploys against any workspace. This is especially useful for GitHub-managed workflow files and for provisioning workflows programmatically across environments.

A few things to know about inline configs:

Inline extractor configs require schema — schema-less extraction (schema inferred from the file at run time) is a run-endpoint feature and is not supported in workflows.
Inline configs have no version; the config embedded in the deployed workflow version is exactly what runs. Responses return the inline config verbatim.
CONDITIONAL_EXTRACT rules only support saved references.

You can mix the two styles freely within one workflow — for example, an inline classifier routing to extract steps that reference saved extractors.
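The "exactly one per step" rule, together with the schema requirement for inline extractor configs, can be expressed as a small check for steps that are ready to deploy. The sketch below is illustrative only: processorStyle and processorKeys are hypothetical names, and the check looks at key presence rather than validating the contents of each config.

```typescript
type ProcessorStepType = "EXTRACT" | "CLASSIFY" | "SPLIT";
type StepConfig = { [key: string]: unknown };

// Config keys for the two styles, per step type.
const processorKeys: { [K in ProcessorStepType]: { saved: string; inline: string } } = {
  EXTRACT: { saved: "extractor", inline: "extractorConfig" },
  CLASSIFY: { saved: "classifier", inline: "classifierConfig" },
  SPLIT: { saved: "splitter", inline: "splitterConfig" },
};

// Reports which style a config uses, or "invalid" if it breaks a rule.
function processorStyle(
  stepType: ProcessorStepType,
  config: StepConfig
): "saved" | "inline" | "invalid" {
  const keys = processorKeys[stepType];
  const saved = config[keys.saved];
  const inline = config[keys.inline];
  // Exactly one of the two styles must be present (not both, not neither).
  if ((saved !== undefined) === (inline !== undefined)) return "invalid";
  if (saved !== undefined) return "saved";
  // Inline extractor configs require a schema.
  if (stepType === "EXTRACT") {
    const hasSchema = typeof inline === "object" && inline !== null && "schema" in inline;
    if (!hasSchema) return "invalid";
  }
  return "inline";
}

const savedStyle = processorStyle("EXTRACT", {
  extractor: { id: "ex_abc123", version: "latest" },
});
const inlineStyle = processorStyle("EXTRACT", {
  extractorConfig: { schema: { type: "object", properties: {} } },
});
```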

Workflow Patterns

Linear Extraction

The simplest pattern — every step has exactly one downstream step.

trigger → parse → extract → webhook

[
  { "name": "trigger", "type": "TRIGGER", "next": [{ "step": "parse" }] },
  { "name": "parse", "type": "PARSE", "next": [{ "step": "extract" }] },
  {
    "name": "extract",
    "type": "EXTRACT",
    "config": { "extractor": { "id": "ex_abc123", "version": "latest" } },
    "next": [{ "step": "webhook" }]
  },
  { "name": "webhook", "type": "WEBHOOK_RESPONSE" }
]
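When a linear workflow is built programmatically, the next arrays can be derived from the order of the steps instead of being written by hand. The chain helper below is a hypothetical convenience, not part of any SDK: it links each step to the one after it and leaves the last step without next.

```typescript
type LinearStep = { name: string; type: string; config?: unknown; next?: { step: string }[] };

// Links each step to the step that follows it; the final step gets no next array.
function chain(steps: LinearStep[]): LinearStep[] {
  return steps.map((s, i) => {
    const following = steps[i + 1];
    const linked: LinearStep = { name: s.name, type: s.type };
    if (s.config !== undefined) linked.config = s.config;
    if (following !== undefined) linked.next = [{ step: following.name }];
    return linked;
  });
}

// Produces the same routing as the linear extraction example above.
const linearSteps = chain([
  { name: "trigger", type: "TRIGGER" },
  { name: "parse", type: "PARSE" },
  { name: "extract", type: "EXTRACT", config: { extractor: { id: "ex_abc123", version: "latest" } } },
  { name: "webhook", type: "WEBHOOK_RESPONSE" },
]);
```

The helper only fits the linear pattern; branching steps still need explicit next entries with their routing fields.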

Classify and Route

Use a CLASSIFY step to route documents to different extractors based on document type. Each next entry’s classificationId must match a classification ID from the classifier’s config.

                         ┌─ cls_invoice ─→ extract_invoice ─┐
trigger → parse → classify─ cls_receipt ─→ extract_receipt ─┼→ review → webhook
                         └─ cls_other ──→───────────────────┘

First, your classifier defines classifications with stable IDs:

const classifierConfig = {
  classifications: [
    { id: "cls_invoice", type: "invoice", description: "Invoice documents" },
    { id: "cls_receipt", type: "receipt", description: "Receipt documents" },
    { id: "cls_other", type: "other", description: "Other documents" },
  ],
};

Then the workflow step uses those IDs as classificationId values:

[
  { "name": "trigger", "type": "TRIGGER", "next": [{ "step": "parse" }] },
  { "name": "parse", "type": "PARSE", "next": [{ "step": "classify" }] },
  {
    "name": "classify",
    "type": "CLASSIFY",
    "config": {
      "classifier": { "id": "cl_abc123", "version": "0.1" }
    },
    "next": [
      { "step": "extract_invoice", "classificationId": "cls_invoice" },
      { "step": "extract_receipt", "classificationId": "cls_receipt" },
      { "step": "review", "classificationId": "cls_other" }
    ]
  },
  {
    "name": "extract_invoice",
    "type": "EXTRACT",
    "config": { "extractor": { "id": "ex_invoice456", "version": "1.0" } },
    "next": [{ "step": "review" }]
  },
  {
    "name": "extract_receipt",
    "type": "EXTRACT",
    "config": { "extractor": { "id": "ex_receipt789", "version": "1.0" } },
    "next": [{ "step": "review" }]
  },
  { "name": "review", "type": "HUMAN_REVIEW", "next": [{ "step": "webhook" }] },
  { "name": "webhook", "type": "WEBHOOK_RESPONSE" }
]

Routing entries use classification IDs (e.g. "cls_invoice"), not type strings (e.g. "invoice"). IDs are stable across renames: if you rename a classification type from "invoice" to "billing_invoice", the ID stays the same and routing continues to work.
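A common mistake is to route on the type string instead of the ID. The sketch below shows one way to catch that before deploying, by comparing the classificationId values in next against the IDs the classifier defines. The function name unknownClassificationIds is hypothetical and introduced only for this example.

```typescript
type Classification = { id: string; type: string; description: string };
type ClassifyNext = { step: string; classificationId: string };

// Returns classificationId values used in next[] that the classifier does not define.
function unknownClassificationIds(
  classifications: Classification[],
  next: ClassifyNext[]
): string[] {
  const knownIds = classifications.map((c) => c.id);
  return next.map((n) => n.classificationId).filter((id) => knownIds.indexOf(id) === -1);
}

const invoiceClassifications: Classification[] = [
  { id: "cls_invoice", type: "invoice", description: "Invoice documents" },
  { id: "cls_receipt", type: "receipt", description: "Receipt documents" },
];

// The second entry uses the type string "receipt" instead of the ID "cls_receipt".
const unmatchedIds = unknownClassificationIds(invoiceClassifications, [
  { step: "extract_invoice", classificationId: "cls_invoice" },
  { step: "extract_receipt", classificationId: "receipt" },
]);
```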

Split and Route

Use a SPLIT step to break a multi-document file into individual sub-documents and route each one by type. The same ID-based routing rules apply as for CLASSIFY.

                      ┌─ cls_invoice ─→ extract_invoice ─┐
trigger → parse → split─ cls_receipt ─→ extract_receipt ─┼→ collect → webhook
                      └─ cls_other ──→ review ───────────┘

[
  { "name": "trigger", "type": "TRIGGER", "next": [{ "step": "parse" }] },
  { "name": "parse", "type": "PARSE", "next": [{ "step": "split" }] },
  {
    "name": "split",
    "type": "SPLIT",
    "config": {
      "splitter": { "id": "spl_abc123", "version": "0.1" }
    },
    "next": [
      { "step": "extract_invoice", "classificationId": "cls_invoice" },
      { "step": "extract_receipt", "classificationId": "cls_receipt" },
      { "step": "review", "classificationId": "cls_other" }
    ]
  },
  {
    "name": "extract_invoice",
    "type": "EXTRACT",
    "config": { "extractor": { "id": "ex_invoice456", "version": "1.0" } },
    "next": [{ "step": "collect" }]
  },
  {
    "name": "extract_receipt",
    "type": "EXTRACT",
    "config": { "extractor": { "id": "ex_receipt789", "version": "1.0" } },
    "next": [{ "step": "collect" }]
  },
  { "name": "review", "type": "HUMAN_REVIEW", "next": [{ "step": "collect" }] },
  { "name": "collect", "type": "COLLECT", "next": [{ "step": "webhook" }] },
  { "name": "webhook", "type": "WEBHOOK_RESPONSE" }
]

Conditional Logic

Use a CONDITIONAL step to route based on extracted data values. Each condition has an id that is referenced by next[].conditionId.

trigger → parse → extract → route_total ─┬─ high_value ──→ review → webhook
                                         └─ default_path → webhook

[
  { "name": "trigger", "type": "TRIGGER", "next": [{ "step": "parse" }] },
  { "name": "parse", "type": "PARSE", "next": [{ "step": "extract" }] },
  {
    "name": "extract",
    "type": "EXTRACT",
    "config": { "extractor": { "id": "ex_abc123", "version": "latest" } },
    "next": [{ "step": "route_total" }]
  },
  {
    "name": "route_total",
    "type": "CONDITIONAL",
    "config": {
      "conditions": [
        {
          "id": "high_value",
          "type": "IF",
          "operation": "GTE",
          "leftOperand": "{{ extract.output.value.total }}",
          "rightOperand": "10000"
        },
        { "id": "default_path", "type": "ELSE" }
      ]
    },
    "next": [
      { "step": "review", "conditionId": "high_value" },
      { "step": "webhook", "conditionId": "default_path" }
    ]
  },
  { "name": "review", "type": "HUMAN_REVIEW", "next": [{ "step": "webhook" }] },
  { "name": "webhook", "type": "WEBHOOK_RESPONSE" }
]

Reference an upstream step’s output in leftOperand (and rightOperand, when comparing to another value) using the {{ stepName.output.value.field }} template syntax — stepName is the producing step’s name, .output.value is the extractor’s value payload, and .field is the field to compare. operation accepts EQUALS, GTE, LTE, IS_NULL, CONTAINS, or NO_OP. See Conditional Steps for the full reference syntax.
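To make the template syntax concrete, the sketch below parses an operand of the form {{ stepName.output.value.field }} and looks the field up in a map of step outputs. It illustrates the reference shape only and is not the platform's evaluator: the accepted characters in names, support for dotted nested fields, and the valuesByStep structure are all assumptions made for this example.

```typescript
type OutputRef = { stepName: string; fieldPath: string[] };

// Parses "{{ stepName.output.value.field }}"; returns null for literals such as "10000".
// Assumption: names use letters, digits, and underscores, and fields may be dotted paths.
function parseOutputRef(operand: string): OutputRef | null {
  const match = /^\{\{\s*([A-Za-z0-9_]+)\.output\.value\.([A-Za-z0-9_.]+)\s*\}\}$/.exec(operand);
  if (match === null) return null;
  const stepName = match[1];
  const path = match[2];
  if (stepName === undefined || path === undefined) return null;
  return { stepName, fieldPath: path.split(".") };
}

// Resolves an operand against a hypothetical map of step name to value payload.
// Literals are returned unchanged; missing fields resolve to undefined.
function resolveOperand(operand: string, valuesByStep: { [stepName: string]: unknown }): unknown {
  const ref = parseOutputRef(operand);
  if (ref === null) return operand;
  let current: unknown = valuesByStep[ref.stepName];
  for (const key of ref.fieldPath) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as { [k: string]: unknown })[key];
  }
  return current;
}

const resolvedTotal = resolveOperand("{{ extract.output.value.total }}", {
  extract: { total: 12500 },
});
```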

Validation

Use RULE_VALIDATION to check extracted data against business rules and branch on the result.

trigger → parse → extract → validate ─┬─ pass → webhook
                                      └─ fail → review → webhook

[
  { "name": "trigger", "type": "TRIGGER", "next": [{ "step": "parse" }] },
  { "name": "parse", "type": "PARSE", "next": [{ "step": "extract" }] },
  {
    "name": "extract",
    "type": "EXTRACT",
    "config": { "extractor": { "id": "ex_abc123", "version": "latest" } },
    "next": [{ "step": "validate" }]
  },
  {
    "name": "validate",
    "type": "RULE_VALIDATION",
    "config": {
      "rules": [
        {
          "name": "total_matches_sum",
          "formula": "extraction1.total = extraction1.subtotal + extraction1.tax",
          "description": "Checks invoice math"
        }
      ]
    },
    "next": [
      { "step": "webhook", "result": "pass" },
      { "step": "review", "result": "fail" }
    ]
  },
  { "name": "review", "type": "HUMAN_REVIEW", "next": [{ "step": "webhook" }] },
  { "name": "webhook", "type": "WEBHOOK_RESPONSE" }
]

See Formulas for the rule expression language and Validation Step for the UI guide.

Step Reference

All saved processor references (extractor, classifier, splitter) require an explicit version field. Inline configs (extractorConfig, classifierConfig, splitterConfig) have no version — see Saved processors vs. inline configs.

CLASSIFY and SPLIT steps do not support "latest" — you must pin to a specific semver version (e.g. "0.1") or "draft". This is because classification IDs used for routing are tied to a specific version’s config. If a new version is published with different classifications, routing would break silently.

| Step type | `"latest"` | `"draft"` | Semver (e.g. `"1.0"`) | Inline config |
| --- | --- | --- | --- | --- |
| `EXTRACT` | Yes | Yes | Yes | Yes |
| `CONDITIONAL_EXTRACT` | Yes | Yes | Yes | No |
| `CLASSIFY` | No | Yes | Yes | Yes |
| `SPLIT` | No | Yes | Yes | Yes |
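The version columns of the table translate directly into a check. The sketch below is an illustration with hypothetical names (isVersionAllowed, looksLikeSemver); the pattern used for semver strings is an assumption based on the "0.1" and "1.0" examples on this page and may be narrower or wider than what the API accepts.

```typescript
type VersionedStepType = "EXTRACT" | "CONDITIONAL_EXTRACT" | "CLASSIFY" | "SPLIT";

// Assumed version shape: "major.minor" with an optional patch part, e.g. "0.1" or "1.0".
function looksLikeSemver(version: string): boolean {
  return /^\d+\.\d+(\.\d+)?$/.test(version);
}

// Encodes the table: "latest" is only accepted for EXTRACT and CONDITIONAL_EXTRACT.
function isVersionAllowed(stepType: VersionedStepType, version: string): boolean {
  if (version === "draft") return true;
  if (version === "latest") {
    return stepType === "EXTRACT" || stepType === "CONDITIONAL_EXTRACT";
  }
  return looksLikeSemver(version);
}

const classifyLatestAllowed = isVersionAllowed("CLASSIFY", "latest"); // false
const extractLatestAllowed = isVersionAllowed("EXTRACT", "latest"); // true
```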

Trigger

The single entry point for every workflow. Must route to exactly one PARSE step.

{ "name": "trigger", "type": "TRIGGER", "next": [{ "step": "parse" }] }

Parse

Converts the uploaded file into structured content (OCR, text extraction). Must appear immediately after the trigger.

Optionally configure parsing behavior with parseConfig. See Parse Configuration Options.

{
  "name": "parse",
  "type": "PARSE",
  "config": {
    "parseConfig": {
      "target": "markdown",
      "chunkingStrategy": { "type": "page" }
    }
  },
  "next": [{ "step": "extract" }]
}

Extract

Extracts structured data from parsed content. Specify the extractor with a saved reference (extractor — version required: "latest", "draft", or semver) or an inline config (extractorConfig). Can be created without config — next cannot be set until config is provided, and config is required before deploy.

{
  "name": "extract",
  "type": "EXTRACT",
  "config": {
    "extractor": { "id": "ex_abc123", "version": "latest" }
  },
  "next": [{ "step": "review" }]
}

Or with an inline config — note that schema is required for inline extractor configs:

{
  "name": "extract",
  "type": "EXTRACT",
  "config": {
    "extractorConfig": {
      "schema": {
        "type": "object",
        "properties": {
          "invoice_number": { "type": "string" },
          "total": { "type": "number" }
        }
      }
    }
  },
  "next": [{ "step": "review" }]
}

Classify

Routes documents to different downstream steps based on classification. Each next entry must reference a classification ID, not a type string. Specify the classifier with a saved reference (classifier, which requires a pinned version; "latest" is not allowed) or an inline config (classifierConfig). The step can be created without config, but next cannot be set until config is provided, and config is required before deploy.

See the Classify and Route pattern above for a complete example with a saved reference. With an inline config, the next[].classificationId values must match the id values in the inline classifications array:

{
  "name": "classify",
  "type": "CLASSIFY",
  "config": {
    "classifierConfig": {
      "classifications": [
        { "id": "cls_invoice", "type": "invoice", "description": "Invoice documents" },
        { "id": "cls_other", "type": "other", "description": "Anything else" }
      ]
    }
  },
  "next": [
    { "step": "extract_invoice", "classificationId": "cls_invoice" },
    { "step": "review", "classificationId": "cls_other" }
  ]
}

Split

Splits a multi-document file into sub-documents and routes each one. Same ID-based routing rules as CLASSIFY. Specify the splitter with a saved reference (splitter — requires a pinned version; "latest" is not allowed) or an inline config (splitterConfig, with routing IDs coming from its splitClassifications array). Can be created without config — next cannot be set until config is provided, and config is required before deploy.

See the Split and Route pattern above for a complete example.

Merge Extract

Combines outputs from multiple upstream extract steps. Use mergeOrder to control how overlapping fields are prioritized.

{
  "name": "merge",
  "type": "MERGE_EXTRACT",
  "config": { "mergeOrder": "confidence" },
  "next": [{ "step": "webhook" }]
}

Conditional

Routes based on extracted data values using if/else logic. See the Conditional Logic pattern above.

For the UI-based version of this step, see Conditional Steps.

Conditional Extract

Chooses which extractor to run based on formula conditions. Each rule pairs a formula with an extractor reference — rules only support saved references, not inline configs. The last rule must have formula: "TRUE" as a default catch-all to prevent runtime failures when no other rule matches. Can be created without config — next cannot be set until config is provided, and config is required before deploy.

{
  "name": "route_extractor",
  "type": "CONDITIONAL_EXTRACT",
  "config": {
    "rules": [
      {
        "name": "cigna_provider",
        "formula": "metadata.provider_name = \"cigna\"",
        "extractor": { "id": "ex_cigna", "version": "latest" }
      },
      {
        "name": "fallback",
        "formula": "TRUE",
        "extractor": { "id": "ex_generic", "version": "latest" }
      }
    ]
  },
  "next": [{ "step": "validate" }]
}

See Formulas for the expression language and Conditional Extraction Step for the UI guide.
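The catch-all requirement for CONDITIONAL_EXTRACT rules is easy to verify mechanically. The following sketch uses a hypothetical hasCatchAll helper and assumes the comparison is an exact match on the string "TRUE"; whether the formula language treats other spellings as equivalent is not covered here.

```typescript
type ExtractRule = {
  name: string;
  formula: string;
  extractor: { id: string; version: string };
};

// True when the final rule is the default catch-all with formula "TRUE".
function hasCatchAll(rules: ExtractRule[]): boolean {
  const last = rules[rules.length - 1];
  return last !== undefined && last.formula === "TRUE";
}

const providerRules: ExtractRule[] = [
  {
    name: "cigna_provider",
    formula: 'metadata.provider_name = "cigna"',
    extractor: { id: "ex_cigna", version: "latest" },
  },
  {
    name: "fallback",
    formula: "TRUE",
    extractor: { id: "ex_generic", version: "latest" },
  },
];

const rulesAreSafe = hasCatchAll(providerRules); // true
const withoutFallbackSafe = hasCatchAll(providerRules.slice(0, 1)); // false
```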

Rule Validation

Checks extracted data against boolean rules. Can be created without config — next cannot be set until config is provided, and config is required before deploy. See the Validation pattern above for a complete example.

External Data Validation

Sends extraction data to an external HTTP endpoint for validation. Can be created without config — next cannot be set until config is provided, and config is required before deploy.

{
  "name": "external_validate",
  "type": "EXTERNAL_DATA_VALIDATION",
  "config": {
    "requestOptions": {
      "url": "https://api.example.com/validate",
      "method": "POST",
      "headers": { "x-api-key": "secret" },
      "contentType": "application/json"
    },
    "failureBehavior": "EXIT"
  },
  "next": [{ "step": "review" }]
}

See External Data Validation Step for more context.

Human Review

Pauses the workflow for manual review in the dashboard before continuing to downstream steps.

{ "name": "review", "type": "HUMAN_REVIEW", "next": [{ "step": "webhook" }] }

Collect

Joins multiple upstream branches before continuing. Use after CLASSIFY or SPLIT branches to wait for all parallel work to complete.

{ "name": "collect", "type": "COLLECT", "next": [{ "step": "webhook" }] }

File Conversion

Converts the file format before downstream processing. Use failureBehavior to control whether conversion failures stop the workflow.

{
  "name": "convert",
  "type": "FILE_CONVERSION",
  "config": { "failureBehavior": "CONTINUE" },
  "next": [{ "step": "parse" }]
}

Webhook Response

Terminal step that delivers results to your webhook endpoint. Must not have next.

{ "name": "webhook", "type": "WEBHOOK_RESPONSE" }
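Because WEBHOOK_RESPONSE is terminal, a workflow file can be linted for webhook steps that still carry a next array. This is a small sketch with a hypothetical helper name, terminalViolations; it treats any next property on such a step, including an empty array, as a violation, which is a stricter reading than the API may apply.

```typescript
type TerminalCheckStep = { name: string; type: string; next?: { step: string }[] };

// Names WEBHOOK_RESPONSE steps that define next, which the terminal rule forbids.
function terminalViolations(steps: TerminalCheckStep[]): string[] {
  return steps
    .filter((s) => s.type === "WEBHOOK_RESPONSE" && s.next !== undefined)
    .map((s) => s.name);
}

const webhookViolations = terminalViolations([
  { name: "review", type: "HUMAN_REVIEW", next: [{ step: "webhook" }] },
  { name: "webhook", type: "WEBHOOK_RESPONSE" },
  { name: "webhook_bad", type: "WEBHOOK_RESPONSE", next: [{ step: "review" }] },
]);
```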

Next steps

Create a Workflow

The end-to-end create → deploy → run lifecycle.

Workflow Versioning

Deploy, pin, and promote workflow versions.

Reviewing Workflow Runs

Handle steps that pause for human review.

Create Workflow Version API

Full request and response schema.