Overview
Splitting takes a single file that bundles many documents and breaks it into separate, typed sub-documents. You describe the document types you expect with splitClassifications, and Extend returns one entry per detected sub-document with its type, page range, and a standalone fileId you can feed into parse, extract, or a workflow. Use it for loan packages, claim files, closing binders, and any multi-document upload that needs to be separated before processing.
Split runs Parse under the hood, parsing the file first if it hasn’t been parsed already and reusing the existing parsed output if it has.
Quick start
We’ll split a Uniform Residential Loan Application (Form 1003) into its sections — Section 1 spans two pages, and the rest are single pages. For this quick-start we’ve uploaded the file here.
Grab a key from the Developers page and store it as the EXTEND_API_KEY environment variable. If you’re using an SDK, see the installation instructions.
The /split endpoint takes a file and a config with the splitClassifications you expect.
Python
TypeScript
Java
Go
cURL
Want to split your own document? Upload it first, then pass the returned file id instead of a url (reusing the same config).
Python
TypeScript
Java
Go
cURL
Example response
After you run the code snippet above, you’ll see a response like this. Extend parses the document, finds each section, and returns an output.splits array — one entry per detected sub-document with its type, page range, and a fileId you can process further. (Truncated to the first two splits.)
Key fields
For full request/response details, see the Create Split Run API reference.
Use the output
Walk output.splits to read each sub-document’s type and page range, and use its fileId to process the piece — for example, sending a specific section to Extract. Branch your logic on classificationId rather than type: the classificationId is the stable id you defined, while type and description are part of the prompt that steers the split and may change as you tune accuracy.
Python
TypeScript
Java
Go
For the full shape, including every field on each split, see Response Format.
Sync vs async
The example above calls the synchronous /split endpoint. We also have an asynchronous /split_runs endpoint that should be used for large files and high volume use cases.
See Async Processing for the full comparison, polling options, and webhook setup.
Save it as a processor
The quick start runs with an inline config, which is perfect for getting started. To reuse a configuration across runs — and to version it, measure its accuracy, and optimize it — save it as a splitter, a kind of processor. Processors are the saved entities you iterate on in the dashboard, run evaluation sets against, and improve with Composer.
Configuration
The quick start sends file and config.splitClassifications. To control how splitting runs, pass more options inside config. Here are the most commonly used ones; for the full reference, see Configuration.
Split classifications
The splitClassifications define the document types the splitter can assign. Provide at least one, and at least one must have the type "other" as a catch-all. The description on each is your biggest lever on accuracy.
Identifier keys
Add an identifierKey to a classification to extract a unique identifier (like an invoice number or borrower name) from each sub-document of that type. The value is returned in each split’s identifier, and the splitter uses it to decide when adjacent pages belong to the same document.
Split rules
Steer how the document is divided with plain-language splitRules — for example, keeping multi-page contracts together.
Base processor
Choose the splitting model based on your accuracy and latency needs.
For every option, including advanced options and parse configuration, see the Configuration reference.

