Multifile Extraction

Multifile extraction lets you run a single extraction over a collection of files with a shared context. It is useful when a field's value must be chosen from the best of several sources, or when values are derived from multiple files.

Example 1: A contract contains a number of amendments and those amendments supersede the original content.

Example 2: A user submits multiple pictures of a long receipt that should be extracted together.

How it works

Pass a package instead of a file in your request. The package.files array accepts up to 50 entries, each either a URL or an existing Extend file ID. The API ingests all files concurrently, runs extraction across the full corpus, and returns a single ExtractRun whose response has a files array and file set to null.

Quick start

from extend_ai import Extend

client = Extend()

result = client.extract_runs.create_and_poll(
    extractor={"id": "ex_abc123"},
    package={
        "files": [
            {"url": "https://example.com/invoice1.pdf"},
            {"url": "https://example.com/invoice2.pdf"},
            {"url": "https://example.com/invoice3.pdf"},
        ]
    },
)

print(result.output.value)
print("Files processed:", [f.name for f in result.files])

File inputs

Each entry in package.files can be either:

| Input | Shape | Notes |
| --- | --- | --- |
| URL | { "url": "https://..." } | Presigned URLs recommended for production |
| File ID | { "id": "file_..." } | Reuse a previously uploaded Extend file |

Raw text (text) and base64 inputs are not supported in multifile packages — use url or id.

You can mix URLs and file IDs in the same package:

{
  "package": {
    "files": [
      { "url": "https://example.com/cover-page.pdf" },
      { "id": "file_xK9mLPqRtN3vS8wF5hB2cQ" },
      { "url": "https://example.com/appendix.pdf" }
    ]
  }
}
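The input rules above (URL or file ID only, 1 to 50 entries) can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Extend SDK: build_package is a hypothetical helper, and it assumes that file IDs start with "file_" and that URLs start with "http://" or "https://".

```python
from typing import Dict, Iterable, List

MAX_PACKAGE_FILES = 50  # limit stated in this guide


def build_package(sources: Iterable[str]) -> Dict[str, List[Dict[str, str]]]:
    """Turn a list of URLs and file IDs into a package payload.

    Assumption for this sketch: file IDs start with "file_" and URLs start
    with "http://" or "https://". Anything else is rejected locally.
    """
    files: List[Dict[str, str]] = []
    for source in sources:
        if source.startswith("file_"):
            files.append({"id": source})
        elif source.startswith(("http://", "https://")):
            files.append({"url": source})
        else:
            raise ValueError(f"Unsupported input (use a URL or file ID): {source!r}")
    if not 1 <= len(files) <= MAX_PACKAGE_FILES:
        raise ValueError(
            f"A package takes 1 to {MAX_PACKAGE_FILES} files, got {len(files)}"
        )
    return {"files": files}


# Mixed URLs and file IDs, kept in submission order.
package = build_package([
    "https://example.com/cover-page.pdf",
    "file_xK9mLPqRtN3vS8wF5hB2cQ",
    "https://example.com/appendix.pdf",
])
```

The resulting dictionary has the same shape as the JSON package shown above, so it could be passed as the package argument in the quick start call.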

Response

A multifile run returns the same ExtractRun shape as a single-file run, with two differences:

  • file is null
  • files is an ordered array of FileSummary objects, one per input file in submission order

{
  "object": "extract_run",
  "id": "exr_3f1j6I1gsw5k96xFiCnkM",
  "status": "PROCESSED",
  "file": null,
  "files": [
    { "object": "file", "id": "file_aaa", "name": "invoice1.pdf", ... },
    { "object": "file", "id": "file_bbb", "name": "invoice2.pdf", ... },
    { "object": "file", "id": "file_ccc", "name": "invoice3.pdf", ... }
  ],
  "output": {
    "value": { ... },
    "metadata": { ... }
  }
}

output.value is a single object covering the whole corpus — not one object per file. Design your extractor schema to describe what you want extracted across all files together.
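As a rough illustration of reading such a response, the sketch below works on the run as a plain dictionary (for example, the parsed JSON body). The summarize_run helper and the total_amount field are hypothetical and exist only for this example. Only the file, files, and output.value keys come from the response shape described above.

```python
from typing import Any, Dict, List


def summarize_run(run: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the parts of an ExtractRun payload that differ for multifile runs."""
    files: List[Dict[str, Any]] = run.get("files") or []
    return {
        # Multifile runs have file set to null and a non-empty files array.
        "is_multifile": run.get("file") is None and len(files) > 0,
        # Names are listed in submission order.
        "file_names": [f["name"] for f in files],
        # One object for the whole corpus, not one per file.
        "value": run["output"]["value"],
    }


# Hypothetical payload mirroring the documented shape; "total_amount" is an
# invented schema field used only for illustration.
sample_run = {
    "object": "extract_run",
    "status": "PROCESSED",
    "file": None,
    "files": [
        {"object": "file", "id": "file_aaa", "name": "invoice1.pdf"},
        {"object": "file", "id": "file_bbb", "name": "invoice2.pdf"},
    ],
    "output": {"value": {"total_amount": 120.5}, "metadata": {}},
}

summary = summarize_run(sample_run)
print(summary["file_names"])
print(summary["value"])
```

Because output.value covers all files at once, downstream code should not try to index it by file. Use the files array only to record which inputs contributed to the run.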

Multifile vs batch

Multifile extraction and batch processing are complementary but different:

| | Multifile (package) | Batch (/extract_runs/batch) |
| --- | --- | --- |
| What it is | One run across N files, treated as a single corpus | N independent single-file runs submitted together |
| Output | One output.value combining all files | One output.value per file |
| Use when | Fields span multiple documents | Each file is extracted independently |
| Files per request | 1–50 | Up to 1,000 |

Use multifile when your extractor schema is designed to aggregate across a set of documents. Use batch when you just want to submit many independent files efficiently.
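The choice between the two modes can be written down as a small decision function. This is only a sketch of the comparison above: choose_mode is a hypothetical helper, it makes no API calls, and the limits are the per-request file counts quoted in this section.

```python
MULTIFILE_MAX_FILES = 50   # files per multifile package, per this guide
BATCH_MAX_FILES = 1000     # files per batch request, per this guide


def choose_mode(file_count: int, fields_span_documents: bool) -> str:
    """Return "multifile" or "batch" for a set of files.

    fields_span_documents should be True when the extractor schema
    aggregates values across the whole set of documents.
    """
    if file_count < 1:
        raise ValueError("At least one file is required")
    if fields_span_documents:
        if file_count > MULTIFILE_MAX_FILES:
            raise ValueError(
                f"A multifile package holds at most {MULTIFILE_MAX_FILES} files"
            )
        return "multifile"
    if file_count > BATCH_MAX_FILES:
        raise ValueError(
            f"A batch request holds at most {BATCH_MAX_FILES} files"
        )
    return "batch"


# A contract plus its amendments: fields depend on all documents together.
print(choose_mode(4, fields_span_documents=True))
# Many unrelated invoices: each file is extracted on its own.
print(choose_mode(300, fields_span_documents=False))
```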