Configuring a Splitter

Splitter Configuration

“Splitters” break down multi-page documents into separate, organized sections. These can be configured via either the UI or the API.

Example API Use Cases

  • Create a splitter with a given config - Set up a new splitter with your configuration
  • Update a splitter with a given config - Modify an existing splitter’s configuration
  • Run a splitter with config overrides - Execute a splitter with optional config overrides using splitter.overrideConfig
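As a rough illustration of the third use case, the sketch below assembles a run request body that carries config overrides. Only the splitter.overrideConfig path comes from this page; the splitter.id and file.url fields, the placeholder ID, and the buildRunRequest helper are assumptions made for illustration, so check the API reference for the real request shape.

```typescript
// Hypothetical request shape: only `splitter.overrideConfig` is named in the docs.
// `splitter.id` and `file.url` are placeholders, not confirmed field names.
interface RunSplitterRequest {
  splitter: {
    id: string;
    overrideConfig?: Record<string, unknown>;
  };
  file: { url: string };
}

// Hypothetical helper that builds the request body without sending it.
function buildRunRequest(
  splitterId: string,
  fileUrl: string,
  overrideConfig?: Record<string, unknown>
): RunSplitterRequest {
  const splitter: RunSplitterRequest["splitter"] = { id: splitterId };
  // Attach overrides only when some were supplied, so the saved
  // configuration is used unchanged otherwise.
  if (overrideConfig && Object.keys(overrideConfig).length > 0) {
    splitter.overrideConfig = overrideConfig;
  }
  return { splitter, file: { url: fileUrl } };
}

const request = buildRunRequest(
  "splitter_id_placeholder",
  "https://example.com/packet.pdf",
  { splitRules: "- If ..." }
);
console.log(JSON.stringify(request, null, 2));
```

Keeping the overrides in a separate optional object means a run without overrides sends no overrideConfig key at all, rather than an empty one.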

Webhooks

You can consume splitter outputs via webhooks, track updates to your splitter, and more. See Webhook Events for details.

Configurable Fields

You can view full details of the SplitConfig in our API reference.

When working with the SplitConfig, you can configure several key aspects, such as:

  • Split Classifications - Define the possible sub-documents inside the document you wish to split, with optional per-type identifier extraction rules
  • Split Rules - Custom instructions that guide how the splitter divides documents
  • Base Processor - Specify which splitting model to use (required)
  • Advanced options - Configure split methods, page ranges, Excel sheet splitting, and other specialized settings
  • Parse Config - Configure how documents are parsed before splitting
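The fields listed above can be sketched as TypeScript types. The field names follow the configuration example on this page, but the optionality markers and value types are assumptions rather than the official schema, and checkSplitConfig is a hypothetical local helper. The only rule it enforces is the one stated above: the base processor is required.

```typescript
// Field names mirror the example config on this page; types and optional
// markers are assumptions, not the official SplitConfig schema.
interface SplitClassification {
  id: string;
  type: string;
  description: string;
  identifierKey?: string; // natural-language identifier extraction rule
}

interface SplitConfig {
  baseProcessor: string; // required: which splitting model to use
  splitClassifications: SplitClassification[];
  splitRules?: string;
  advancedOptions?: Record<string, unknown>;
  parseConfig?: Record<string, unknown>;
}

// Hypothetical pre-flight check: returns a list of problems found locally
// before the config is sent anywhere.
function checkSplitConfig(config: Partial<SplitConfig>): string[] {
  const problems: string[] = [];
  if (!config.baseProcessor) {
    problems.push("baseProcessor is required");
  }
  return problems;
}

console.log(checkSplitConfig({ splitClassifications: [] }));
```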

Splitter Configuration Example

const splitConfig = {
  baseProcessor: "splitting_performance",
  splitClassifications: [
    {
      id: "purchase_contract",
      type: "PURCHASE_CONTRACT",
      description: "Purchase contract section",
      identifierKey: "Extract the contract number from the document header",
    },
    {
      id: "addendum",
      type: "ADDENDUM",
      description: "Addendum section",
      identifierKey: "addendum reference number",
    },
    {
      id: "other",
      type: "other",
      description: "Any other document type",
    },
  ],
  splitRules: "- If ...", // Optional custom rules
  advancedOptions: {
    splitMethod: "high_precision", // splitMethod is deprecated on light >= 1.3.0 and performance >= 1.5.0
  },
  parseConfig: {
    target: "markdown",
    chunkingStrategy: { type: "page" }
  }
};

Per-Classification Identifier Keys

The identifierKey field on each classification allows you to define a natural-language rule for extracting a unique identifier from subdocuments of that type. For example, you might extract an invoice number, contract ID, or receipt number.

When provided, the splitter extracts the specified identifier and includes it in the output’s identifier field. This also helps the system understand when to separate out subdocuments of this type.

identifierKey is supported on splitting_light >= 1.3.0 and splitting_performance >= 1.5.0. If passed on older versions, it is accepted but ignored. It replaces the deprecated advancedOptions.splitIdentifierRules field, which is likewise accepted but ignored on those newer versions.
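Because identifierKey is silently ignored on older versions, it can be useful to check support on the client side before relying on the identifier field. The sketch below encodes the minimum versions from the paragraph above. It assumes versions are plain "major.minor.patch" strings, and supportsIdentifierKey and parseVersion are hypothetical helpers, not part of the API.

```typescript
// Minimum versions that honor identifierKey, per the paragraph above.
const IDENTIFIER_KEY_MIN_VERSION: Record<string, [number, number, number]> = {
  splitting_light: [1, 3, 0],
  splitting_performance: [1, 5, 0],
};

// Assumes a plain "major.minor.patch" string; missing or non-numeric
// parts are treated as 0.
function parseVersion(version: string): [number, number, number] {
  const parts = version.split(".").map((p) => Number.parseInt(p, 10) || 0);
  return [parts[0] ?? 0, parts[1] ?? 0, parts[2] ?? 0];
}

// Hypothetical helper: true when the processor version is at or above
// the minimum that supports identifierKey.
function supportsIdentifierKey(processor: string, version: string): boolean {
  const min = IDENTIFIER_KEY_MIN_VERSION[processor];
  if (!min) return false; // unknown processor: treat as unsupported
  const v = parseVersion(version);
  for (let i = 0; i < 3; i++) {
    if (v[i] !== min[i]) return v[i] > min[i];
  }
  return true; // exactly the minimum version
}

console.log(supportsIdentifierKey("splitting_light", "1.3.0")); // true
console.log(supportsIdentifierKey("splitting_performance", "1.4.9")); // false
```

The same comparison also tells you when advancedOptions.splitIdentifierRules stops having any effect, since the two fields swap roles at these version boundaries.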