Configuration

The Split API accepts a config object that controls how a document is divided into sub-documents. Only splitClassifications is required — it defines the document types the splitter can assign. Everything else is optional: use splitRules to guide where the splitter divides the document, baseProcessor and baseVersion to pick the model, advancedOptions for Excel handling, page overlap, and page ranges, and parseConfig to tune how the document is parsed before splitting.

You can pass config inline on a one-off /split call, or save it to a reusable Splitter and reference that splitter by id. Either way the configuration is identical.

For default values and the full schema, see the Create Split Run API reference.

Prefer a UI? Extend Studio lets you configure the splitter visually and export the config JSON.

Split classifications

`splitClassifications`

Type: array of classification objects (required)

The document types the splitter can assign to each sub-document. You must provide at least one classification, and at least one classification must have the type "other" as a catch-all. Each entry must have a unique id.

Field	Type	Required	Description
`id`	string	Yes	Unique identifier for the classification. Lowercase, underscore-separated is recommended.
`type`	string	Yes	Type identifier for the classification, returned on each split as `type`.
`description`	string	Yes	A detailed description of the document type. This is your biggest lever on accuracy.
`identifierKey`	string	No	A natural-language rule for extracting a per-type identifier (see below).

```json
{
  "config": {
    "splitClassifications": [
      {
        "id": "loan_application",
        "type": "loan_application",
        "description": "A Uniform Residential Loan Application (Form 1003)."
      },
      {
        "id": "other",
        "type": "other",
        "description": "Any other document type."
      }
    ]
  }
}
```
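
The requirements above (at least one classification, a catch-all of type "other", unique ids, and the three required fields) can be checked on the client before a request is sent. The following is a minimal sketch; `validate_split_classifications` is a hypothetical helper, not part of any SDK, and the server's own validation may differ in wording or strictness.

```python
def validate_split_classifications(classifications):
    """Return a list of problems found; an empty list means the input passed."""
    problems = []
    if not classifications:
        problems.append("at least one classification is required")
        return problems

    seen_ids = set()
    for index, entry in enumerate(classifications):
        # id, type, and description are marked as required in the table above.
        for field in ("id", "type", "description"):
            if not entry.get(field):
                problems.append(f"entry {index}: missing required field '{field}'")
        entry_id = entry.get("id")
        if entry_id in seen_ids:
            problems.append(f"entry {index}: duplicate id '{entry_id}'")
        elif entry_id:
            seen_ids.add(entry_id)

    # One classification must act as the catch-all.
    if not any(entry.get("type") == "other" for entry in classifications):
        problems.append("no classification with type 'other' (catch-all)")
    return problems


example = [
    {
        "id": "loan_application",
        "type": "loan_application",
        "description": "A Uniform Residential Loan Application (Form 1003).",
    },
    {"id": "other", "type": "other", "description": "Any other document type."},
]
print(validate_split_classifications(example))  # prints []
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a config at once.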

`splitClassifications[].identifierKey`

Type: string

A natural-language rule describing how to extract a unique identifier for sub-documents of this type, for example "Extract the invoice number from the document header". When provided, the splitter extracts the identifier for each sub-document of that type and returns it in the split’s identifier field. It also uses the value to decide when adjacent pages belong to the same document versus a new one.

Supported on splitting_light >= 1.3.0 and splitting_performance >= 1.5.0. On older versions it is accepted but ignored. It replaces the deprecated advancedOptions.splitIdentifierRules field, which set a single global rule.

```json
{
  "config": {
    "splitClassifications": [
      {
        "id": "invoice",
        "type": "invoice",
        "description": "An invoice or bill for goods or services.",
        "identifierKey": "Extract the invoice number from the document header"
      }
    ]
  }
}
```
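
Because older versions accept identifierKey but silently ignore it, it can help to check a pinned version against the minimums quoted above before relying on the identifier field. This is a sketch under the assumption that versions are plain MAJOR.MINOR.PATCH strings; `identifier_key_is_honored` is a hypothetical helper, and it does not cover the case where baseVersion is omitted and the latest stable version is chosen for you.

```python
# Minimum versions that honor identifierKey, as stated in the text above.
IDENTIFIER_KEY_MIN_VERSION = {
    "splitting_light": (1, 3, 0),
    "splitting_performance": (1, 5, 0),
}


def parse_version(version):
    """Turn "1.5.0" into (1, 5, 0) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def identifier_key_is_honored(base_processor, base_version):
    """Return True if the pinned processor version is new enough for identifierKey."""
    minimum = IDENTIFIER_KEY_MIN_VERSION.get(base_processor)
    if minimum is None:
        raise ValueError(f"unknown baseProcessor: {base_processor!r}")
    return parse_version(base_version) >= minimum


print(identifier_key_is_honored("splitting_performance", "1.5.0"))  # prints True
print(identifier_key_is_honored("splitting_light", "1.2.9"))        # prints False
```

Comparing integer tuples avoids the string-comparison trap where "1.10.0" would sort before "1.5.0".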

Split rules

`splitRules`

Type: string

Custom rules, in natural language, that guide how the splitter divides the document — for example “Keep all pages of a signed contract together” or “Treat each new account number as a new statement.”

```json
{
  "config": {
    "splitRules": "Keep all pages of a signed contract together in a single split."
  }
}
```

Base processor

`baseProcessor`

Type: "splitting_performance" | "splitting_light" (default: splitting_performance)

The splitting model to use.

Processor	When to use
`splitting_performance`	Highest accuracy (default).
`splitting_light`	Faster and cheaper.

```json
{
  "config": {
    "baseProcessor": "splitting_performance"
  }
}
```

`baseVersion`

Type: string

The version of the selected processor to use. If omitted, the latest stable version of the selected baseProcessor is used automatically. For the available versions, see Splitting Performance versions.

```json
{
  "config": {
    "baseProcessor": "splitting_performance",
    "baseVersion": "1.5.0"
  }
}
```

Advanced options

`advancedOptions.splitExcelDocumentsBySheetEnabled`

Type: boolean (default: false)

When enabled, Excel documents are split by worksheet.

```json
{
  "config": {
    "advancedOptions": {
      "splitExcelDocumentsBySheetEnabled": true
    }
  }
}
```

`advancedOptions.pageOverlapEnabled`

Type: boolean (default: false)

When enabled, a page can appear in two adjacent splits if it carries context for both the previous and the next document.

Supported on splitting_light >= 1.1.0 and splitting_performance >= 1.2.0. On older versions it is accepted but ignored.

```json
{
  "config": {
    "advancedOptions": {
      "pageOverlapEnabled": true
    }
  }
}
```

`advancedOptions.pageRanges`

Type: Array<{ start: number, end: number }>

Restricts splitting to specific page ranges. Page numbers are 1-based, and both start and end are inclusive.

```json
{
  "config": {
    "advancedOptions": {
      "pageRanges": [{ "start": 1, "end": 20 }]
    }
  }
}
```
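
To make the 1-based, inclusive convention concrete, the sketch below expands a pageRanges value into the page numbers it covers. `pages_in_ranges` is a hypothetical helper for illustration only; rejecting ranges where end is less than start is an assumption here, and the API's actual handling of such ranges or of overlapping ranges is not stated in this page.

```python
def pages_in_ranges(page_ranges):
    """Expand [{"start": s, "end": e}, ...] into a sorted list of 1-based page numbers."""
    pages = set()
    for index, page_range in enumerate(page_ranges):
        start, end = page_range["start"], page_range["end"]
        if start < 1:
            raise ValueError(f"range {index}: start must be >= 1 (pages are 1-based)")
        if end < start:
            raise ValueError(f"range {index}: end must be >= start")
        # Both ends are inclusive, hence end + 1 for Python's half-open range().
        pages.update(range(start, end + 1))
    return sorted(pages)


print(pages_in_ranges([{"start": 1, "end": 3}, {"start": 7, "end": 8}]))
# prints [1, 2, 3, 7, 8]
```

With this convention, `{ "start": 1, "end": 20 }` covers exactly 20 pages, and a single page n is written as `{ "start": n, "end": n }`.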

Parse config

`parseConfig`

Type: object

Because the document is parsed before it is split, you can tune that step with parseConfig (for example, the target format and chunking). See Parse Configuration for the full set of options.

```json
{
  "config": {
    "parseConfig": {
      "target": "markdown",
      "chunkingStrategy": { "type": "page" }
    }
  }
}
```

Using a saved splitter

To reuse a configuration across runs and workflows, create a Splitter and reference it by id instead of inlining config each time. You can override specific fields per run with overrideConfig.

A splitter is a kind of processor; see the processor documentation for how saving a configuration lets you version, evaluate, and optimize it.

Create a splitter — set up a new splitter with your configuration.
Update a splitter — modify an existing splitter’s configuration.
Run a splitter — execute a splitter, optionally with splitter.overrideConfig.
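
The two ways of supplying configuration can be sketched as a small request-body builder. The `config` key follows the examples on this page, and the `splitter` object with `id` and `overrideConfig` follows the names mentioned above. The exact nesting, the rule that the two modes are mutually exclusive, and the omission of the file input are all assumptions, and `build_split_body` is a hypothetical helper. Check the Create Split Run API reference for the real schema.

```python
def build_split_body(config=None, splitter_id=None, override_config=None):
    """Build the configuration part of a split request body (file input not shown)."""
    # Assumption: a run uses either an inline config or a saved splitter, not both.
    if (config is None) == (splitter_id is None):
        raise ValueError("pass exactly one of config or splitter_id")

    if config is not None:
        if override_config is not None:
            raise ValueError("override_config only applies to a saved splitter")
        return {"config": config}

    splitter = {"id": splitter_id}
    if override_config is not None:
        # Per-run overrides for specific fields of the saved configuration.
        splitter["overrideConfig"] = override_config
    return {"splitter": splitter}


# One-off call with inline config.
inline_body = build_split_body(config={"baseProcessor": "splitting_light"})

# Saved splitter (the id is a placeholder) with a per-run override.
saved_body = build_split_body(
    splitter_id="your_splitter_id",
    override_config={"advancedOptions": {"pageOverlapEnabled": True}},
)
```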
