Overview

Splitting takes a single file that bundles many documents and breaks it into separate, typed sub-documents. You describe the document types you expect with splitClassifications, and Extend returns one entry per detected sub-document with its type, page range, and a standalone fileId you can feed into parse, extract, or a workflow. Use it for loan packages, claim files, closing binders, and any multi-document upload that needs to be separated before processing.

Quick start

We’ll split a Uniform Residential Loan Application (Form 1003) into its sections: Section 1 spans two pages, and the rest are single pages. For this quick start, we’ve uploaded the file to the public URL used in the request below.

First page of a loan application packet

Grab a key from the Developers page and store it as the EXTEND_API_KEY environment variable. If you’re using an SDK, see the installation instructions.

$ export EXTEND_API_KEY="your_api_key_here"
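If the variable is missing, the failure tends to surface later and less clearly. A minimal sketch of an early check, assuming only the EXTEND_API_KEY name from above; `load_api_key` is a hypothetical helper and not part of the Extend SDK:

```python
import os


def load_api_key(env=None, name="EXTEND_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of the Extend SDK.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the examples.")
    return key
```

Calling `load_api_key()` with no arguments reads from the real environment; passing a dict makes the check easy to exercise in tests.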

The /split endpoint takes a file and a config with the splitClassifications you expect.

Python
import os
from extend_ai import Extend

client = Extend(token=os.environ["EXTEND_API_KEY"])

result = client.split(
    file={
        "url": "https://extend-public-files.s3.us-east-2.amazonaws.com/loan_application.pdf",
    },
    config={
        "baseProcessor": "splitting_performance",
        "splitClassifications": [
            {
                "id": "section_1",
                "type": "section_1",
                "description": "Section 1, Borrower Information: personal details, current and previous employment, and income.",
            },
            {
                "id": "section_2",
                "type": "section_2",
                "description": "Section 2, Financial Information — Assets and Liabilities: bank and retirement accounts, other assets, liabilities, and expenses.",
            },
            {
                "id": "section_3",
                "type": "section_3",
                "description": "Section 3, Financial Information — Real Estate: properties owned and the mortgage loans on them.",
            },
            {
                "id": "other",
                "type": "other",
                "description": "Any other section of the loan application (loan and property information, declarations, acknowledgments, military service, demographic information, or loan originator details).",
            },
        ],
    },
)

print(result)

Want to split your own document? Upload it first, then pass the returned file id instead of a url (reusing the same config).

Python
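# `client` is the Extend client created in the quick start.
# `config` is the same config dict that was passed to client.split there.
with open("loan_application.pdf", "rb") as f:
    uploaded = client.files.upload(file=f)

result = client.split(file={"id": uploaded.id}, config=config)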

Example response

After you run the code snippet above, you’ll see a response like this. Extend parses the document, finds each section, and returns an output.splits array — one entry per detected sub-document with its type, page range, and a fileId you can process further. (Truncated to the first two splits.)

{
  "object": "split_run",
  "id": "splr_Xj8mK2pL9nR4vT7qY5wZ",
  "status": "PROCESSED",
  "file": {
    "object": "file",
    "id": "file_GzKUy0VDhHscv7tweODYb",
    "name": "loan_application.pdf"
  },
  "output": {
    "splits": [
      {
        "id": "splt_xK9mLPqRtN3vS8wF5hB2cQ",
        "classificationId": "section_1",
        "type": "section_1",
        "startPage": 1,
        "endPage": 2,
        "identifier": "",
        "observation": "Pages 1-2 contain Section 1: Borrower Information.",
        "fileId": "file_8sLPqRtN3vS2wF5hB2cQ"
      },
      {
        "id": "splt_2pL9nR4vT7qY5wZj8mK2",
        "classificationId": "section_2",
        "type": "section_2",
        "startPage": 3,
        "endPage": 3,
        "identifier": "",
        "observation": "Page 3 contains Section 2: Financial Information — Assets and Liabilities.",
        "fileId": "file_R4vT7qY5wZj8mK2pL9nR"
      }
    ]
  }
}

Key fields

| Field | What it contains |
| --- | --- |
| output.splits | One entry per detected sub-document. |
| splits[].classificationId | The id of the classification that matched. Branch your logic on this; it is stable. |
| splits[].type | The document type, matching a classification you defined. |
| splits[].startPage / endPage | The 1-based page range of the sub-document. |
| splits[].identifier | The extracted identifier, when the classification set an identifierKey. |
| splits[].fileId | A standalone file for the sub-document, usable as input to other endpoints. |

For full request/response details, see the Create Split Run API reference.
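The example response shows Section 1 on pages 1 to 2, so the page range reads as inclusive on both ends: that split holds two pages, not one. A small sketch that computes page counts from the raw JSON response as a plain dict (not SDK objects); `summarize_splits` is a hypothetical helper name, and the sample data is trimmed from the example response above:

```python
def summarize_splits(response):
    """Return (classificationId, startPage, endPage, page_count) per split.

    Hypothetical helper operating on the JSON response as a plain dict.
    """
    rows = []
    for split in response["output"]["splits"]:
        start, end = split["startPage"], split["endPage"]
        # Page ranges are 1-based and inclusive, so add 1 to the difference.
        rows.append((split["classificationId"], start, end, end - start + 1))
    return rows


# Trimmed copy of the example response shown earlier on this page.
sample = {
    "status": "PROCESSED",
    "output": {
        "splits": [
            {"classificationId": "section_1", "startPage": 1, "endPage": 2},
            {"classificationId": "section_2", "startPage": 3, "endPage": 3},
        ]
    },
}

for row in summarize_splits(sample):
    print(row)
```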

Use the output

Walk output.splits to read each sub-document’s type and page range, and use its fileId to process the piece — for example, sending a specific section to Extract. Branch your logic on classificationId rather than type: the classificationId is the stable id you defined, while type and description are part of the prompt that steers the split and may change as you tune accuracy.

Python
for split in result.output.splits:
    print(f"{split.type}: pages {split.start_page}-{split.end_page}")

# Each split is a standalone file you can process further
for split in result.output.splits:
    if split.classification_id == "section_2":
        client.extract(file={"id": split.file_id}, config={"schema": {...}})

For the full shape, including every field on each split, see Response Format.
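When several classifications each need different handling, one way to organize the branching is a dispatch table keyed by classificationId. The sketch below uses plain dicts with the JSON field names rather than SDK objects, and the handler functions are hypothetical placeholders for whatever processing you run on each type:

```python
def handle_borrower_info(file_id):
    # Placeholder: e.g. send this file to an extraction step.
    return f"borrower-info:{file_id}"


def handle_assets(file_id):
    # Placeholder handler for Section 2.
    return f"assets:{file_id}"


def skip(file_id):
    # Placeholder: ignore sub-documents you do not need.
    return None


# Keys are the classification ids defined in the quick start config.
HANDLERS = {
    "section_1": handle_borrower_info,
    "section_2": handle_assets,
}


def route_splits(splits, handlers=HANDLERS, default=skip):
    """Call the handler registered for each split's classificationId."""
    results = []
    for split in splits:
        handler = handlers.get(split["classificationId"], default)
        results.append(handler(split["fileId"]))
    return results
```

Keying the table on classificationId rather than type means you can reword a type or description while tuning accuracy without touching the routing code.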

Sync vs async

The example above calls the synchronous /split endpoint. There is also an asynchronous /split_runs endpoint; use it for large files and high-volume use cases.

See Async Processing for the full comparison, polling options, and webhook setup.
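If you poll rather than use webhooks, the loop usually looks like "fetch the run, check its status, wait, repeat". The sketch below shows only that loop, with capped exponential backoff, and rests on several assumptions: `fetch_run` is a hypothetical callable you supply that returns the run as a dict, "FAILED" is an assumed status name, and any other status is treated as still in progress. Only "PROCESSED" appears in the example response on this page, so check Async Processing for the real status values and SDK calls.

```python
import time


def wait_for_run(fetch_run, timeout=300.0, initial_delay=1.0, max_delay=30.0,
                 sleep=time.sleep):
    """Poll `fetch_run()` until the run reaches a terminal status.

    `fetch_run` is a hypothetical zero-argument callable returning a dict
    with a "status" key. "PROCESSED" comes from the example response;
    "FAILED" is an assumed terminal status.
    """
    waited = 0.0
    delay = initial_delay
    while True:
        run = fetch_run()
        if run["status"] == "PROCESSED":
            return run
        if run["status"] == "FAILED":  # assumed status name
            raise RuntimeError(f"split run failed: {run}")
        if waited >= timeout:
            raise TimeoutError(f"run not finished after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

The `sleep` parameter exists so the loop can be tested without real waiting.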

Save it as a processor

The quick start runs with an inline config, which is perfect for getting started. To reuse a configuration across runs — and to version it, measure its accuracy, and optimize it — save it as a splitter, a kind of processor. Processors are the saved entities you iterate on in the dashboard, run evaluation sets against, and improve with Composer.

Configuration

The quick start sends file and config.splitClassifications. To control how splitting runs, pass more options inside config. Here are the most commonly used ones; for the full reference, see Configuration.

Split classifications

The splitClassifications define the document types the splitter can assign. Provide at least one classification, and make sure at least one has the type "other" to act as a catch-all. The description on each is your biggest lever on accuracy.

{
  "config": {
    "splitClassifications": [
      { "id": "invoice", "type": "invoice", "description": "An invoice or bill for goods or services." },
      { "id": "other", "type": "other", "description": "Any other document type." }
    ]
  }
}
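The rules stated in this section can be checked locally before a request is sent. A minimal sketch of such a check, covering the two stated rules plus two added assumptions (ids should be unique, and every classification should carry a description); `validate_classifications` is a hypothetical helper, not part of the SDK, and the API may enforce different or additional rules:

```python
def validate_classifications(classifications):
    """Return a list of problems found in a splitClassifications list."""
    problems = []
    if not classifications:
        problems.append("provide at least one classification")
    ids = [c.get("id") for c in classifications]
    if len(ids) != len(set(ids)):
        # Assumption: ids should be unique, since results reference them.
        problems.append("classification ids must be unique")
    if not any(c.get("type") == "other" for c in classifications):
        problems.append('add a catch-all classification with type "other"')
    for c in classifications:
        # Assumption: a missing description is worth flagging, because the
        # description is described as the biggest lever on accuracy.
        if not c.get("description"):
            problems.append(f"classification {c.get('id')!r} has no description")
    return problems
```

An empty list means no problems were found by these particular checks.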

Identifier keys

Add an identifierKey to a classification to extract a unique identifier (like an invoice number or borrower name) from each sub-document of that type. The value is returned in each split’s identifier, and the splitter uses it to decide when adjacent pages belong to the same document.

{
  "config": {
    "splitClassifications": [
      { "id": "invoice", "type": "invoice", "description": "An invoice.", "identifierKey": "The invoice number from the header." }
    ]
  }
}
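Once identifiers come back, a common follow-up is to look up sub-documents by them. The sketch below indexes splits by classificationId and identifier, working on plain dicts with the JSON field names. It assumes an empty string means no identifier was extracted (as in the example response, where no identifierKey was set); `index_by_identifier` and the sample invoice numbers are hypothetical:

```python
from collections import defaultdict


def index_by_identifier(splits):
    """Map (classificationId, identifier) to the fileIds that share it."""
    index = defaultdict(list)
    for split in splits:
        identifier = split.get("identifier", "")
        if identifier:  # assumed: "" means no identifier was extracted
            key = (split["classificationId"], identifier)
            index[key].append(split["fileId"])
    return dict(index)


# Hypothetical splits for illustration only.
example_splits = [
    {"classificationId": "invoice", "identifier": "INV-001", "fileId": "file_a"},
    {"classificationId": "invoice", "identifier": "", "fileId": "file_b"},
    {"classificationId": "invoice", "identifier": "INV-002", "fileId": "file_c"},
]
print(index_by_identifier(example_splits))
```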

Split rules

Steer how the document is divided with plain-language splitRules — for example, keeping multi-page contracts together.

{ "config": { "splitRules": "Keep all pages of a signed contract together in a single split." } }

Base processor

Choose the splitting model based on your accuracy and latency needs.

{ "config": { "baseProcessor": "splitting_performance" } }

| Processor | When to use |
| --- | --- |
| splitting_performance | Highest accuracy (default). |
| splitting_light | Faster and cheaper. |

For every option, including advanced options and parse configuration, see the Configuration reference.
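Putting the options from this section together, a config might be assembled as follows. The sketch only builds a plain dict from the keys shown above; `build_split_config` is a hypothetical helper, and whether every combination of these keys is accepted in one request should be confirmed against the Configuration reference.

```python
def build_split_config(classifications, split_rules=None, base_processor=None):
    """Assemble a split config dict from the options described above."""
    config = {"splitClassifications": list(classifications)}
    if split_rules:
        config["splitRules"] = split_rules
    if base_processor:
        # Omitted when not set, so the service default applies.
        config["baseProcessor"] = base_processor
    return config


config = build_split_config(
    [
        {"id": "invoice", "type": "invoice", "description": "An invoice.",
         "identifierKey": "The invoice number from the header."},
        {"id": "other", "type": "other", "description": "Any other document type."},
    ],
    split_rules="Keep all pages of a signed contract together in a single split.",
    base_processor="splitting_light",
)
print(config)
```

The resulting dict has the same shape as the `config` argument passed to `client.split` in the quick start.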


Next steps

Configuration

Split classifications, identifier keys, rules, and the base processor.

Response Format

The full shape of the split run and the splits array.

Workflows

Route each split sub-document to the right processor automatically.

API Reference

Full request and response schema for the split endpoint.