Configuration

The Split API accepts a config object that controls how a document is divided into sub-documents. Only splitClassifications is required; it defines the document types the splitter can assign. Everything else is optional: splitRules guides where the splitter divides the document, baseProcessor and baseVersion select the model, advancedOptions covers Excel handling, page overlap, and page ranges, and parseConfig tunes how the document is parsed before splitting.

You can pass config inline on a one-off /split call, or save it to a reusable Splitter and reference that splitter by id. Either way, the config fields are the same.

For default values and the full schema, see the Create Split Run API reference.

Prefer a UI? Extend Studio lets you configure the splitter visually and export the config JSON.


Split classifications

splitClassifications

Type: array of classification objects (required)

The document types the splitter can assign to each sub-document. You must provide at least one classification, and at least one classification must have the type "other" as a catch-all. Each entry must have a unique id.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Unique identifier for the classification. Lowercase, underscore-separated is recommended. |
| type | string | Yes | Type identifier for the classification, returned on each split as type. |
| description | string | Yes | A detailed description of the document type. This is your biggest lever on accuracy. |
| identifierKey | string | No | A natural-language rule for extracting a per-type identifier (see below). |

{
  "config": {
    "splitClassifications": [
      {
        "id": "loan_application",
        "type": "loan_application",
        "description": "A Uniform Residential Loan Application (Form 1003)."
      },
      {
        "id": "other",
        "type": "other",
        "description": "Any other document type."
      }
    ]
  }
}
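
The rules above (at least one entry, unique ids, the three required string fields, and a catch-all of type "other") can be checked locally before a config is sent. The following is a minimal sketch of such a check; the function name `validate_split_classifications` and its error messages are hypothetical and not part of any SDK, and the API's own validation may differ.

```python
from typing import Any


def validate_split_classifications(classifications: list[dict[str, Any]]) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical client-side helper that mirrors the rules stated in the text.
    """
    problems: list[str] = []
    if not classifications:
        problems.append("at least one classification is required")

    seen_ids: set[str] = set()
    for index, entry in enumerate(classifications):
        # id, type, and description are the required string fields.
        for field in ("id", "type", "description"):
            value = entry.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {index}: missing required field '{field}'")

        # Each entry must have a unique id.
        entry_id = entry.get("id")
        if isinstance(entry_id, str):
            if entry_id in seen_ids:
                problems.append(f"entry {index}: duplicate id '{entry_id}'")
            seen_ids.add(entry_id)

    # At least one classification must act as the "other" catch-all.
    if classifications and not any(e.get("type") == "other" for e in classifications):
        problems.append('no classification has the catch-all type "other"')
    return problems
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in one pass.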

splitClassifications[].identifierKey

Type: string

A natural-language rule describing how to extract a unique identifier for sub-documents of this type, for example "Extract the invoice number from the document header". When provided, the splitter extracts the identifier for each sub-document of that type and returns it in the split’s identifier field. It also uses the value to decide when adjacent pages belong to the same document versus a new one.

Supported on splitting_light >= 1.3.0 and splitting_performance >= 1.5.0. On older versions it is accepted but ignored. It replaces the deprecated advancedOptions.splitIdentifierRules field, which set a single global rule.

{
  "config": {
    "splitClassifications": [
      {
        "id": "invoice",
        "type": "invoice",
        "description": "An invoice or bill for goods or services.",
        "identifierKey": "Extract the invoice number from the document header"
      }
    ]
  }
}
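
Because identifierKey is accepted but silently ignored on older versions, it can help to check the pinned version before relying on the identifier field. This is a rough sketch using the minimum versions quoted above; the helper names are hypothetical, and it assumes version strings are plain three-part dotted numbers such as "1.5.0".

```python
# Minimum versions quoted in the text above for identifierKey support.
IDENTIFIER_KEY_MIN_VERSION = {
    "splitting_light": (1, 3, 0),
    "splitting_performance": (1, 5, 0),
}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn "1.5.0" into (1, 5, 0) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def identifier_key_is_honored(base_processor: str, base_version: str) -> bool:
    """Return True if identifierKey should take effect for this processor and version."""
    minimum = IDENTIFIER_KEY_MIN_VERSION.get(base_processor)
    if minimum is None:
        # Unknown processor name: assume the field is not supported.
        return False
    return parse_version(base_version) >= minimum
```

Comparing integer tuples avoids the string-comparison trap where "1.10.0" would sort before "1.5.0".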

Split rules

splitRules

Type: string

Custom rules, in natural language, that guide how the splitter divides the document — for example “Keep all pages of a signed contract together” or “Treat each new account number as a new statement.”

{
  "config": {
    "splitRules": "Keep all pages of a signed contract together in a single split."
  }
}

Base processor

baseProcessor

Type: "splitting_performance" | "splitting_light" (default: splitting_performance)

The splitting model to use.

| Processor | When to use |
| --- | --- |
| splitting_performance | Highest accuracy (default). |
| splitting_light | Faster and cheaper. |

{
  "config": {
    "baseProcessor": "splitting_performance"
  }
}

baseVersion

Type: string

The version of the selected processor to use. If not provided, the latest stable version for the selected baseProcessor is used automatically. See the Splitting Performance versions.

{
  "config": {
    "baseProcessor": "splitting_performance",
    "baseVersion": "1.5.0"
  }
}
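
One way to express the choice between pinning a version and following the latest stable one is to leave baseVersion out of the config entirely unless a version is given. The sketch below assumes exactly that; `base_processor_config` is a hypothetical helper name, not an SDK function.

```python
from typing import Optional

# The two processor names listed in the text above.
VALID_BASE_PROCESSORS = ("splitting_performance", "splitting_light")


def base_processor_config(
    base_processor: str = "splitting_performance",
    base_version: Optional[str] = None,
) -> dict:
    """Build the model-selection part of a split config.

    When base_version is None the key is omitted, so the latest stable
    version of the selected processor is used, as described above.
    """
    if base_processor not in VALID_BASE_PROCESSORS:
        raise ValueError(f"unknown baseProcessor: {base_processor!r}")
    config = {"baseProcessor": base_processor}
    if base_version is not None:
        config["baseVersion"] = base_version
    return {"config": config}
```

For example, `base_processor_config("splitting_light")` omits baseVersion, while `base_processor_config("splitting_performance", "1.5.0")` pins it.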

Advanced options

advancedOptions.splitExcelDocumentsBySheetEnabled

Type: boolean (default: false)

For Excel documents, split by worksheet.

{
  "config": {
    "advancedOptions": {
      "splitExcelDocumentsBySheetEnabled": true
    }
  }
}

advancedOptions.pageOverlapEnabled

Type: boolean (default: false)

When enabled, a page can appear in two adjacent splits if it carries context for both the previous and the next document.

Supported on splitting_light >= 1.1.0 and splitting_performance >= 1.2.0. On older versions it is accepted but ignored.

{
  "config": {
    "advancedOptions": {
      "pageOverlapEnabled": true
    }
  }
}
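
To make the idea of overlap concrete, the sketch below finds the pages shared by two adjacent splits. The (start, end) tuples are a hypothetical, simplified representation chosen for illustration; they are not the API's response format, which is documented separately.

```python
def shared_pages(previous: tuple[int, int], following: tuple[int, int]) -> list[int]:
    """Return the pages that fall inside both inclusive (start, end) page ranges."""
    start = max(previous[0], following[0])
    end = min(previous[1], following[1])
    # When the ranges do not intersect, end < start and the range is empty.
    return list(range(start, end + 1))
```

With overlap enabled, two neighboring splits covering pages 1 to 4 and 4 to 7 would share page 4; without overlap, neighboring splits such as 1 to 3 and 4 to 7 share nothing.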

advancedOptions.pageRanges

Type: Array<{ start: number, end: number }>

Restrict splitting to specific page ranges. Page numbers are 1-based and inclusive.

{
  "config": {
    "advancedOptions": {
      "pageRanges": [{ "start": 1, "end": 20 }]
    }
  }
}
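
Since page numbers are 1-based and both ends are inclusive, a range of 1 to 20 covers twenty pages, not nineteen. The following sketch expands pageRanges into the page numbers they cover and rejects obviously invalid ranges; `pages_in_ranges` is a hypothetical helper, and whether the API itself accepts overlapping ranges is not stated in the text, so the helper simply merges them.

```python
def pages_in_ranges(page_ranges: list[dict[str, int]]) -> list[int]:
    """Expand 1-based, inclusive page ranges into a sorted list of page numbers."""
    pages: set[int] = set()
    for page_range in page_ranges:
        start, end = page_range["start"], page_range["end"]
        if start < 1:
            raise ValueError("page numbers are 1-based; start must be >= 1")
        if end < start:
            raise ValueError("end must be greater than or equal to start")
        # end + 1 because Python's range excludes its upper bound.
        pages.update(range(start, end + 1))
    return sorted(pages)
```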

Parse config

parseConfig

Type: object

Because the document is parsed before it is split, you can tune that step with parseConfig (for example, the target format and chunking). See Parse Configuration for the full set of options.

{
  "config": {
    "parseConfig": {
      "target": "markdown",
      "chunkingStrategy": { "type": "page" }
    }
  }
}

Using a saved splitter

To reuse a configuration across runs and workflows, create a Splitter and reference it by id instead of inlining config each time. You can override specific fields per run with overrideConfig.

A splitter is a kind of processor; see the processor documentation for how saving a configuration lets you version, evaluate, and optimize it.

  • Create a splitter — set up a new splitter with your configuration.
  • Update a splitter — modify an existing splitter’s configuration.
  • Run a splitter — execute a splitter, optionally with splitter.overrideConfig.
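
The two ways of supplying configuration can be contrasted as request bodies. This is only a sketch under stated assumptions: the top-level "config" key matches the examples on this page, but the "splitter" object with "id" and "overrideConfig" is inferred from the splitter.overrideConfig reference above and should be confirmed against the API reference. The file input and any other required fields are omitted, and both function names are hypothetical.

```python
from typing import Any, Optional


def inline_split_body(config: dict[str, Any]) -> dict[str, Any]:
    """Body fragment for a one-off call that carries its config inline."""
    return {"config": config}


def saved_splitter_body(
    splitter_id: str,
    override_config: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Body fragment that references a saved splitter by id (assumed shape)."""
    splitter: dict[str, Any] = {"id": splitter_id}
    if override_config:
        # Only the fields to change for this run; the rest come from the saved splitter.
        splitter["overrideConfig"] = override_config
    return {"splitter": splitter}
```

Keeping the override limited to the fields that change for a given run leaves the saved splitter as the single source for everything else.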