The Split API accepts a config object that controls how a document is divided into sub-documents. Only splitClassifications is required — it defines the document types the splitter can assign. Everything else is optional: use splitRules to guide where the splitter divides the document, baseProcessor and baseVersion to pick the model, advancedOptions for Excel handling, page overlap, and page ranges, and parseConfig to tune how the document is parsed before splitting.
You can pass config inline on a one-off /split call, or save it to a reusable Splitter and reference that splitter by id. Either way the configuration is identical.
For default values and the full schema, see the Create Split Run API reference.
Prefer a UI? Extend Studio lets you configure the splitter visually and export the config JSON.
splitClassifications
Type: array of classification objects (required)
The document types the splitter can assign to each sub-document. You must provide at least one classification, and at least one classification must have the type "other" as a catch-all. Each entry must have a unique id.
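The three rules above (at least one classification, at least one of type "other", unique ids) can be checked locally before a config is sent. The following is a minimal sketch, not part of the Split API: `validate_classifications` is a hypothetical helper, the `id` and `type` keys follow the description above, and the `description` key and sample values are assumed purely for illustration.

```python
from collections import Counter


def validate_classifications(classifications):
    """Return a list of problems; an empty list means the rules hold."""
    problems = []
    if not classifications:
        problems.append("at least one classification is required")
    # At least one entry must use the catch-all type "other".
    if not any(c.get("type") == "other" for c in classifications):
        problems.append('no classification has type "other"')
    # Every entry needs an id, and ids must not repeat.
    counts = Counter(c.get("id") for c in classifications)
    for cid, n in counts.items():
        if cid is None:
            problems.append("a classification is missing its id")
        elif n > 1:
            problems.append(f"duplicate id: {cid}")
    return problems


# Sample values are illustrative; "description" is an assumed key.
split_classifications = [
    {"id": "invoice", "type": "invoice", "description": "Vendor invoices"},
    {"id": "other", "type": "other", "description": "Anything else"},
]
print(validate_classifications(split_classifications))  # prints []
```

Returning a list of problems rather than raising on the first one makes it easier to surface every issue in a config at once.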
splitClassifications[].identifierKey
Type: string
A natural-language rule describing how to extract a unique identifier for sub-documents of this type, for example "Extract the invoice number from the document header". When provided, the splitter extracts the identifier for each sub-document of that type and returns it in the split’s identifier field. It also uses the value to decide when adjacent pages belong to the same document versus a new one.
Supported on splitting_light >= 1.3.0 and splitting_performance >= 1.5.0. On older versions it is accepted but ignored. It replaces the deprecated advancedOptions.splitIdentifierRules field, which set a single global rule.
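Moving from the deprecated global rule to per-classification rules might look like the sketch below. `migrate_identifier_rule` is a hypothetical helper, not an Extend utility. It assumes that copying the old global rule onto every classification that lacks its own `identifierKey` is an acceptable starting point; in practice you would likely tailor the rule per document type.

```python
import copy


def migrate_identifier_rule(config):
    """Copy a global splitIdentifierRules string onto each classification."""
    migrated = copy.deepcopy(config)  # leave the caller's config untouched
    advanced = migrated.get("advancedOptions", {})
    global_rule = advanced.pop("splitIdentifierRules", None)
    if global_rule:
        for classification in migrated["splitClassifications"]:
            # Keep any identifierKey that is already set.
            classification.setdefault("identifierKey", global_rule)
    if "advancedOptions" in migrated and not advanced:
        del migrated["advancedOptions"]  # drop the now-empty object
    return migrated


# Illustrative config using the deprecated field.
old_config = {
    "splitClassifications": [
        {"id": "invoice", "type": "invoice"},
        {
            "id": "other",
            "type": "other",
            "identifierKey": "Extract the statement account number",
        },
    ],
    "advancedOptions": {
        "splitIdentifierRules": "Extract the invoice number from the document header"
    },
}
new_config = migrate_identifier_rule(old_config)
print(new_config["splitClassifications"][0]["identifierKey"])
```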
splitRules
Type: string
Custom rules, in natural language, that guide how the splitter divides the document — for example “Keep all pages of a signed contract together” or “Treat each new account number as a new statement.”
baseProcessor
Type: "splitting_performance" | "splitting_light" (default: splitting_performance)
The splitting model to use.
baseVersion
Type: string
The version of the selected processor to use. If not provided, the latest stable version for the selected baseProcessor is used automatically. See the Splitting Performance versions.
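Since only splitClassifications is required, a config can be assembled by adding optional fields only when they are set. The sketch below shows one way to do that. `build_config` is a hypothetical helper, and the version string "1.3.0" is an assumed example of the format rather than a recommendation.

```python
import json

VALID_PROCESSORS = ("splitting_performance", "splitting_light")


def build_config(classifications, split_rules=None,
                 base_processor=None, base_version=None):
    """Assemble a split config, omitting optional fields that are unset."""
    if base_processor is not None and base_processor not in VALID_PROCESSORS:
        raise ValueError(f"unknown baseProcessor: {base_processor}")
    config = {"splitClassifications": classifications}
    if split_rules:
        config["splitRules"] = split_rules
    if base_processor:
        config["baseProcessor"] = base_processor
    if base_version:
        # Leaving baseVersion out lets the service pick the latest stable version.
        config["baseVersion"] = base_version
    return config


classifications = [
    {"id": "contract", "type": "contract"},
    {"id": "other", "type": "other"},
]
config = build_config(
    classifications,
    split_rules="Keep all pages of a signed contract together",
    base_processor="splitting_light",
    base_version="1.3.0",  # assumed example value
)
print(json.dumps(config, indent=2))
```

Pinning baseVersion trades automatic upgrades for reproducible results; omitting it follows the latest stable version as described above.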
advancedOptions.splitExcelDocumentsBySheetEnabled
Type: boolean (default: false)
When enabled, Excel documents are split by worksheet.
advancedOptions.pageOverlapEnabled
Type: boolean (default: false)
When enabled, the splitter allows page overlap so a page can occur in two adjacent splits when it carries context for both the previous and the next document.
Supported on splitting_light >= 1.1.0 and splitting_performance >= 1.2.0. On older versions it is accepted but ignored.
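Because unsupported options are accepted but silently ignored on older versions, it can help to check version support on the client before relying on a feature. The sketch below encodes the minimum versions stated on this page for pageOverlapEnabled and identifierKey. The helper names are hypothetical, and it assumes versions are plain dotted numeric strings such as "1.2.0".

```python
# Minimum versions as stated in this page, keyed by feature and processor.
MIN_VERSIONS = {
    "identifierKey": {
        "splitting_light": (1, 3, 0),
        "splitting_performance": (1, 5, 0),
    },
    "pageOverlapEnabled": {
        "splitting_light": (1, 1, 0),
        "splitting_performance": (1, 2, 0),
    },
}


def parse_version(version):
    """Turn '1.2' or '1.2.0' into a 3-part tuple of ints (assumed format)."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))  # pad short versions with zeros
    return tuple(parts)


def is_supported(feature, processor, version):
    """True if the pinned processor version honors the feature."""
    return parse_version(version) >= MIN_VERSIONS[feature][processor]


print(is_supported("pageOverlapEnabled", "splitting_performance", "1.2.0"))  # prints True
print(is_supported("identifierKey", "splitting_light", "1.2.9"))  # prints False
```

This check only applies when baseVersion is pinned; when it is omitted, the latest stable version is selected for you.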
advancedOptions.pageRanges
Type: Array<{ start: number, end: number }>
Restrict splitting to specific page ranges. Page numbers are 1-based and inclusive.
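To make the 1-based, inclusive convention concrete, the sketch below validates a list of ranges locally and expands it into the page numbers it covers. Both helpers are hypothetical. The bounds check against `page_count` is a client-side assumption, since this page does not say how the API treats out-of-range values.

```python
def validate_page_ranges(page_ranges, page_count):
    """Raise ValueError if any range is malformed for a document of page_count pages."""
    for r in page_ranges:
        start, end = r["start"], r["end"]
        if start < 1:
            raise ValueError(f"start must be >= 1 (1-based), got {start}")
        if end < start:
            raise ValueError(f"end {end} is before start {start}")
        if end > page_count:
            raise ValueError(f"end {end} exceeds page count {page_count}")


def pages_in_ranges(page_ranges):
    """Expand inclusive ranges into a sorted list of page numbers."""
    pages = set()
    for r in page_ranges:
        # +1 because the end page is included in the range.
        pages.update(range(r["start"], r["end"] + 1))
    return sorted(pages)


page_ranges = [{"start": 1, "end": 3}, {"start": 10, "end": 12}]
validate_page_ranges(page_ranges, page_count=20)
print(pages_in_ranges(page_ranges))  # prints [1, 2, 3, 10, 11, 12]
```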
parseConfig
Type: object
Because the document is parsed before it is split, you can tune that step with parseConfig (for example, the target format and chunking). See Parse Configuration for the full set of options.
To reuse a configuration across runs and workflows, create a Splitter and reference it by id instead of inlining config each time. You can override specific fields per run with splitter.overrideConfig.
A splitter is a kind of processor. See that page for how saving a configuration lets you version, evaluate, and optimize it.
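The two ways of supplying configuration can be contrasted with a pair of request bodies. This is a sketch under stated assumptions, not the documented request schema: the `file` key and its URL, the placeholder splitter id, and the exact nesting of `splitter.id` are illustrative guesses. Check the Create Split Run API reference for the real shape.

```python
import json

# Config as described on this page; only splitClassifications is required.
config = {
    "splitClassifications": [
        {"id": "statement", "type": "statement"},
        {"id": "other", "type": "other"},
    ],
    "splitRules": "Treat each new account number as a new statement.",
}

# Assumed: how the input document is referenced (hypothetical key and URL).
file_ref = {"url": "https://example.com/packet.pdf"}

# Option 1: one-off call with the config inlined.
inline_body = {"file": file_ref, "config": config}

# Option 2: reference a saved Splitter by id and override selected fields.
saved_body = {
    "file": file_ref,
    "splitter": {
        "id": "SPLITTER_ID_PLACEHOLDER",  # placeholder, not a real id
        "overrideConfig": {
            "advancedOptions": {"pageRanges": [{"start": 1, "end": 10}]},
        },
    },
}

print(json.dumps(saved_body, indent=2))
```

Keeping per-run changes in overrideConfig leaves the saved splitter untouched, so its versioned configuration stays the single source of truth.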