Multifile Extraction

Multifile extraction lets you run a single extraction over a collection of files that share one context. It is useful when a field’s value must be chosen from the best of several sources, or when a value is derived from more than one file.

Example 1: A contract contains a number of amendments and those amendments supersede the original content.

Example 2: A user submits multiple pictures of a long receipt that should be extracted together.

How it works

Pass a package instead of a file in your request. The package.files array accepts up to 50 entries, each either a URL or an existing Extend file ID. The API ingests all files concurrently, runs extraction across the full corpus, and returns a single ExtractRun whose response contains a files array (and file: null).
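As a minimal sketch of the request shape described above, the helper below assembles the extractor and package arguments and enforces the 50-entry limit before anything is sent. The name build_run_kwargs and its client-side checks are assumptions for illustration and are not part of the Extend SDK; only the extractor and package argument shapes come from this guide.

```python
from typing import Any

MAX_PACKAGE_FILES = 50  # limit stated in this guide


def build_run_kwargs(extractor_id: str, files: list[dict[str, str]]) -> dict[str, Any]:
    """Hypothetical helper: build keyword arguments for a multifile run."""
    if not files:
        raise ValueError("a package needs at least one file")
    if len(files) > MAX_PACKAGE_FILES:
        raise ValueError(
            f"a package accepts at most {MAX_PACKAGE_FILES} files, got {len(files)}"
        )
    # `package` takes the place of `file`, so no `file` key is produced.
    return {"extractor": {"id": extractor_id}, "package": {"files": list(files)}}


kwargs = build_run_kwargs(
    "ex_abc123",
    [{"url": "https://example.com/invoice1.pdf"}, {"id": "file_aaa"}],
)
print(kwargs)
```

Under these assumptions the result could be passed on as client.extract_runs.create_and_poll(**kwargs), matching the call used in the quick start.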

Quick start

Python

from extend_ai import Extend

client = Extend()

# Submit one extraction run over all three files and poll until it finishes.
result = client.extract_runs.create_and_poll(
    extractor={"id": "ex_abc123"},
    package={
        "files": [
            {"url": "https://example.com/invoice1.pdf"},
            {"url": "https://example.com/invoice2.pdf"},
            {"url": "https://example.com/invoice3.pdf"},
        ]
    },
)

# output.value is one object covering the whole package, not one per file.
print(result.output.value)
print("Files processed:", [f.name for f in result.files])

File inputs

Each entry in package.files can be either:

| Input | Shape | Notes |
| --- | --- | --- |
| URL | `{ "url": "https://..." }` | Presigned URLs recommended for production |
| File ID | `{ "id": "file_..." }` | Reuse a previously uploaded Extend file |

Raw text (text) and base64 inputs are not supported in multifile packages — use url or id.

You can mix URLs and file IDs in the same package:

{
  "package": {
    "files": [
      { "url": "https://example.com/cover-page.pdf" },
      { "id": "file_xK9mLPqRtN3vS8wF5hB2cQ" },
      { "url": "https://example.com/appendix.pdf" }
    ]
  }
}
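When the inputs arrive as a flat list of strings, a small classifier can turn each one into the matching entry shape. The sketch below assumes that file IDs start with file_ (as in the examples on this page) and that URLs use http or https; to_file_entry is a hypothetical helper, not an SDK function.

```python
def to_file_entry(source: str) -> dict[str, str]:
    """Hypothetical helper: map a URL or file ID string to a package.files entry."""
    if source.startswith(("http://", "https://")):
        return {"url": source}
    if source.startswith("file_"):  # assumed ID prefix, taken from the examples
        return {"id": source}
    # Raw text and base64 inputs are not supported in multifile packages.
    raise ValueError(f"expected a URL or file ID, got {source[:30]!r}")


sources = [
    "https://example.com/cover-page.pdf",
    "file_xK9mLPqRtN3vS8wF5hB2cQ",
    "https://example.com/appendix.pdf",
]
# The list comprehension keeps the order in which the sources were given.
files = [to_file_entry(s) for s in sources]
print({"package": {"files": files}})
```

Rejecting anything that is neither a URL nor a file ID on the client side surfaces an unsupported input before the request is made.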

Response

A multifile run returns the same ExtractRun shape as a single-file run, with two differences:

file is null
files is an ordered array of FileSummary objects, one per input file in submission order

{
  "object": "extract_run",
  "id": "exr_3f1j6I1gsw5k96xFiCnkM",
  "status": "PROCESSED",
  "file": null,
  "files": [
    { "object": "file", "id": "file_aaa", "name": "invoice1.pdf", ... },
    { "object": "file", "id": "file_bbb", "name": "invoice2.pdf", ... },
    { "object": "file", "id": "file_ccc", "name": "invoice3.pdf", ... }
  ],
  "output": {
    "value": { ... },
    "metadata": { ... }
  }
}

output.value is a single object covering the whole corpus — not one object per file. Design your extractor schema to describe what you want extracted across all files together.
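To make the response shape concrete, here is a sketch that reads a multifile run after it has been parsed from JSON into a plain dict. The function name summarize_multifile_run and the invoice_count field in the sample output are made up for illustration; the real contents of output.value depend on your extractor schema.

```python
from typing import Any


def summarize_multifile_run(run: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: pull the corpus-level result out of a parsed run."""
    if run.get("file") is not None:
        raise ValueError("expected a multifile run, where file is null")
    output = run.get("output") or {}
    return {
        "run_id": run["id"],
        "status": run["status"],
        # files keeps submission order, one entry per input file
        "file_names": [f["name"] for f in run.get("files") or []],
        # a single object for the whole corpus, not one per file
        "value": output.get("value"),
    }


sample_run = {
    "object": "extract_run",
    "id": "exr_3f1j6I1gsw5k96xFiCnkM",
    "status": "PROCESSED",
    "file": None,
    "files": [
        {"object": "file", "id": "file_aaa", "name": "invoice1.pdf"},
        {"object": "file", "id": "file_bbb", "name": "invoice2.pdf"},
    ],
    "output": {"value": {"invoice_count": 2}, "metadata": {}},  # made-up field
}
summary = summarize_multifile_run(sample_run)
print(summary["file_names"], summary["value"])
```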

Multifile vs batch

Multifile extraction and batch processing are complementary but different:

| | Multifile (`package`) | Batch (`/extract_runs/batch`) |
| --- | --- | --- |
| What it is | One run across N files (a single corpus) | N independent single-file runs submitted together |
| Output | One `output.value` combining all files | One `output.value` per file |
| Use when | Fields span multiple documents | Each file is extracted independently |
| Files per request | 1–50 | Up to 1,000 |

Use multifile when your extractor schema is designed to aggregate across a set of documents. Use batch when you just want to submit many independent files efficiently.
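The decision above can be written down as a small routing function. This is only a sketch of the rule of thumb, using the per-request limits from the comparison table; choose_submission and its return labels are hypothetical and do not correspond to any SDK call.

```python
MULTIFILE_MAX_FILES = 50   # per-request limit for a package, from the table
BATCH_MAX_FILES = 1000     # per-request limit for a batch, from the table


def choose_submission(n_files: int, fields_span_documents: bool) -> str:
    """Hypothetical helper: pick 'package' (multifile) or 'batch'."""
    if n_files < 1:
        raise ValueError("need at least one file")
    if fields_span_documents:
        # One run over a shared corpus, so every file must fit in one package.
        if n_files > MULTIFILE_MAX_FILES:
            raise ValueError(
                f"a package holds at most {MULTIFILE_MAX_FILES} files, got {n_files}"
            )
        return "package"
    # Independent files: one batch request, as long as it fits the limit.
    if n_files > BATCH_MAX_FILES:
        raise ValueError(
            f"a batch holds at most {BATCH_MAX_FILES} files; split into several requests"
        )
    return "batch"


print(choose_submission(3, fields_span_documents=True))
print(choose_submission(400, fields_span_documents=False))
```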