Splitting takes a single file that bundles many documents and breaks it into separate, typed sub-documents. You describe the document types you expect with splitClassifications, and Extend returns one entry per detected sub-document with its type, page range, and a standalone fileId you can feed into parse, extract, or a workflow. Use it for loan packages, claim files, closing binders, and any multi-document upload that needs to be separated before processing.
We’ll split a Uniform Residential Loan Application (Form 1003) into its sections — Section 1 spans two pages, and the rest are single pages. For this quick-start we’ve uploaded the file here.
Grab a key from the Developers page and store it as the EXTEND_API_KEY environment variable. If you’re using an SDK, see the installation instructions.
The /split endpoint takes a file and a config with the splitClassifications you expect.
Want to split your own document? Upload it first, then pass the returned file id instead of a url (reusing the same config).
After you run the code snippet above, you’ll see a response like this. Extend parses the document, finds each section, and returns an output.splits array — one entry per detected sub-document with its type, page range, and a fileId you can process further. (Truncated to the first two splits.)
For full request/response details, see the Create Split Run API reference.
Walk output.splits to read each sub-document’s type and page range, and use its fileId to process the piece — for example, sending a specific section to Extract. Branch your logic on classificationId rather than type: the classificationId is the stable id you defined, while type and description are part of the prompt that steers the split and may change as you tune accuracy.
For the full shape, including every field on each split, see Response Format.
The example above calls the synchronous /split endpoint. We also have an asynchronous /split_runs endpoint that should be used for large files and high volume use cases.
See Async Processing for the full comparison, polling options, and webhook setup.
The quick start runs with an inline config, which is perfect for getting started. To reuse a configuration across runs — and to version it, measure its accuracy, and optimize it — save it as a splitter, a kind of processor. Processors are the saved entities you iterate on in the dashboard, run evaluation sets against, and improve with Composer.
The quick start sends file and config.splitClassifications. To control how splitting runs, pass more options inside config. Here are the most commonly used ones; for the full reference, see Configuration.
The splitClassifications define the document types the splitter can assign. Provide at least one, and at least one must have the type "other" as a catch-all. The description on each is your biggest lever on accuracy.
Add an identifierKey to a classification to extract a unique identifier (like an invoice number or borrower name) from each sub-document of that type. The value is returned in each split’s identifier, and the splitter uses it to decide when adjacent pages belong to the same document.
Steer how the document is divided with plain-language splitRules — for example, keeping multi-page contracts together.
Choose the splitting model based on your accuracy and latency needs.
For every option, including advanced options and parse configuration, see the Configuration reference.