Async Processing

Every processing endpoint in Extend (extract, classify, split, parse, and edit) supports both a synchronous and an asynchronous mode. Workflows are async-only. The right mode depends on your use case.

Sync vs. Async

	Sync (`/extract`, `/classify`, etc.)	Async (`/extract_runs`, `/classify_runs`, etc.)
How it works	Blocks until the result is ready, then returns it	Returns immediately with a `PROCESSING` status
Timeout	5-minute hard limit	No timeout — runs until complete
Best for	Testing, onboarding, use cases with predictable document sizes or low volume	Production workloads, large files, high volume
Getting results	Included in the response	Poll the run, use SDK polling helpers, or receive via webhook
Availability	Extract, classify, split, parse, edit	All of the above + workflows

Sync endpoints have a 5-minute timeout. If processing takes longer, the request will fail. For production workloads, always use async endpoints.

Why Use Async in Production

Sync endpoints are convenient but they don’t scale well:

Large files, such as long multi-page PDFs and complex Excel spreadsheets, often exceed the 5-minute timeout
High volume means many blocked connections waiting for results
Network issues can cause you to lose a result that already finished processing server-side
Workflows are async-only and can take minutes to hours depending on complexity

Async endpoints solve these problems. You fire off a request, get an ID back immediately, and retrieve the result when it’s ready — either by polling or by receiving a webhook.
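
To make the fire-and-retrieve flow concrete, here is a minimal sketch of a hand-rolled polling loop. `createRun` and `getRun` are hypothetical callbacks standing in for whatever calls create and fetch a run; they are not SDK method names, and the `Run` shape is simplified for illustration. The in-progress statuses are the ones listed under Terminal States.

```typescript
// Simplified, assumed run shape; real run objects carry more fields.
type RunStatus =
  | "PENDING"
  | "PROCESSING"
  | "CANCELLING"
  | "PROCESSED"
  | "FAILED"
  | "CANCELLED";

interface Run {
  id: string;
  status: RunStatus;
}

// Statuses that mean the run is still in flight.
const IN_PROGRESS: ReadonlySet<RunStatus> = new Set<RunStatus>([
  "PENDING",
  "PROCESSING",
  "CANCELLING",
]);

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Submit a run, then re-fetch it at a fixed interval until it leaves
// the in-progress statuses. createRun and getRun are hypothetical.
async function submitAndWait(
  createRun: () => Promise<Run>,
  getRun: (id: string) => Promise<Run>,
  intervalMs: number = 1_000,
): Promise<Run> {
  let run = await createRun(); // returns immediately with an ID
  while (IN_PROGRESS.has(run.status)) {
    await sleep(intervalMs);
    run = await getRun(run.id);
  }
  return run;
}
```

In practice the SDK polling helpers described in the next section do this for you, with a smarter interval schedule than the fixed one used here.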

Polling with SDK Helpers

The SDKs provide createAndPoll / create_and_poll methods that handle polling for you automatically. They use a hybrid strategy: fast polling (every ~1 second) for the first 30 seconds, then gradual backoff up to 30-second intervals. The method returns when the run reaches a terminal state.

Available Polling Methods

Resource	TypeScript	Python	Java
Extract	`client.extractRuns.createAndPoll()`	`client.extract_runs.create_and_poll()`	`client.extractRuns().createAndPoll()`
Classify	`client.classifyRuns.createAndPoll()`	`client.classify_runs.create_and_poll()`	`client.classifyRuns().createAndPoll()`
Split	`client.splitRuns.createAndPoll()`	`client.split_runs.create_and_poll()`	`client.splitRuns().createAndPoll()`
Parse	`client.parseRuns.createAndPoll()`	`client.parse_runs.create_and_poll()`	`client.parseRuns().createAndPoll()`
Edit	`client.editRuns.createAndPoll()`	`client.edit_runs.create_and_poll()`	`client.editRuns().createAndPoll()`
Workflow	`client.workflowRuns.createAndPoll()`	`client.workflow_runs.create_and_poll()`	`client.workflowRuns().createAndPoll()`

Terminal States

Polling completes when the run is no longer in a PROCESSING, PENDING, or CANCELLING state. The terminal states depend on the run type:

Run Type	Terminal States
Extract, Classify, Split	`PROCESSED`, `FAILED`, `CANCELLED`
Parse, Edit	`PROCESSED`, `FAILED`
Workflow	`PROCESSED`, `FAILED`, `CANCELLED`, `NEEDS_REVIEW`, `REJECTED`
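
If you poll manually or inspect run statuses yourself, the table above can be encoded as a small lookup. This is an illustrative sketch only: the `RunType` names and the `isTerminal` helper are hypothetical and not part of the SDK.

```typescript
// Hypothetical run-type labels used only for this lookup.
type RunType = "extract" | "classify" | "split" | "parse" | "edit" | "workflow";

// Terminal states per run type, copied from the table above.
const TERMINAL_STATES: Record<RunType, ReadonlySet<string>> = {
  extract: new Set(["PROCESSED", "FAILED", "CANCELLED"]),
  classify: new Set(["PROCESSED", "FAILED", "CANCELLED"]),
  split: new Set(["PROCESSED", "FAILED", "CANCELLED"]),
  parse: new Set(["PROCESSED", "FAILED"]),
  edit: new Set(["PROCESSED", "FAILED"]),
  workflow: new Set([
    "PROCESSED",
    "FAILED",
    "CANCELLED",
    "NEEDS_REVIEW",
    "REJECTED",
  ]),
};

// True when a run of the given type has finished and polling can stop.
function isTerminal(runType: RunType, status: string): boolean {
  return TERMINAL_STATES[runType].has(status);
}
```

For example, `isTerminal("workflow", "NEEDS_REVIEW")` is true, while `isTerminal("extract", "PROCESSING")` is false.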

Full Example with Error Handling

TypeScript

import { ExtendClient, ExtendError } from "extend-ai";

const client = new ExtendClient({ token: "your-api-key" });

try {
  // Creates the run, then polls until it reaches a terminal state.
  const result = await client.extractRuns.createAndPoll({
    extractor: { id: "ex_abc123" },
    file: { url: "https://example.com/invoice.pdf" },
  });

  // A terminal state is not necessarily a success, so check the status.
  if (result.status === "PROCESSED") {
    console.log("Extraction complete:", result.output?.value);
  } else if (result.status === "FAILED") {
    console.error("Extraction failed:", result.failureMessage);
  } else if (result.status === "CANCELLED") {
    console.log("Extraction was cancelled");
  }
} catch (error) {
  // Log API-level errors, then rethrow so the caller can handle them.
  if (error instanceof ExtendError) {
    console.error("API error:", error.message);
  }
  throw error;
}

Configuring Polling Options

You can customize polling behavior by passing options:

TypeScript

const result = await client.extractRuns.createAndPoll(
  {
    extractor: { id: "ex_abc123" },
    file: { url: "https://example.com/invoice.pdf" },
  },
  {
    maxWaitMs: 120_000, // Time out after 2 minutes
  }
);

Option	Default	Description
`maxWaitMs` / `max_wait_ms`	None (polls indefinitely)	Maximum total wait time before throwing a `PollingTimeoutError`
`fastPollDurationMs` / `fast_poll_duration_ms`	30,000 (30s)	How long to stay in the fast polling phase
`fastPollIntervalMs` / `fast_poll_interval_ms`	1,000 (1s)	Interval between polls during the fast phase
`initialDelayMs` / `initial_delay_ms`	1,000 (1s)	Starting delay for the backoff phase
`maxDelayMs` / `max_delay_ms`	30,000 (30s)	Maximum delay between polls
`backoffMultiplier` / `backoff_multiplier`	1.15	Multiplier applied to each successive backoff delay
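
To get a feel for how these options interact, the sketch below computes the sequence of waits between polls. It is one plausible reading of the table, not the SDK's actual implementation: it assumes a fixed interval during the fast phase, followed by a delay that starts at `initialDelayMs`, grows by `backoffMultiplier` after each poll, and is capped at `maxDelayMs`. The `pollDelays` function and `PollingOptions` interface are hypothetical names.

```typescript
interface PollingOptions {
  fastPollDurationMs: number;
  fastPollIntervalMs: number;
  initialDelayMs: number;
  maxDelayMs: number;
  backoffMultiplier: number;
}

// Defaults taken from the options table above.
const DEFAULTS: PollingOptions = {
  fastPollDurationMs: 30_000,
  fastPollIntervalMs: 1_000,
  initialDelayMs: 1_000,
  maxDelayMs: 30_000,
  backoffMultiplier: 1.15,
};

// Returns the first `count` waits (in ms) between polls under the
// assumed two-phase schedule.
function pollDelays(
  count: number,
  options: Partial<PollingOptions> = {},
): number[] {
  const o: PollingOptions = { ...DEFAULTS, ...options };
  const delays: number[] = [];
  let fastElapsed = 0; // time spent so far in the fast phase
  let backoff = o.initialDelayMs; // next delay in the backoff phase

  while (delays.length < count) {
    if (fastElapsed < o.fastPollDurationMs) {
      // Fast phase: fixed interval.
      delays.push(o.fastPollIntervalMs);
      fastElapsed += o.fastPollIntervalMs;
    } else {
      // Backoff phase: grow geometrically, capped at maxDelayMs.
      delays.push(Math.min(backoff, o.maxDelayMs));
      backoff = Math.min(backoff * o.backoffMultiplier, o.maxDelayMs);
    }
  }
  return delays;
}
```

Under these assumptions, `pollDelays(6, { fastPollDurationMs: 2_000, backoffMultiplier: 2, maxDelayMs: 3_000 })` yields `[1000, 1000, 1000, 2000, 3000, 3000]`: two fast polls, then a backoff that doubles until it hits the cap.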

Webhooks

For event-driven architectures or very long-running processes (especially workflows), webhooks are the recommended approach. Instead of polling, Extend sends an HTTP request to your server when a run completes.

When to use webhooks over polling:

High volume — you’re processing hundreds or thousands of files and don’t want to keep processes alive waiting for results
Long-running workflows — complex workflows can take minutes to hours
Cost efficiency — with polling, the calling process must stay alive for the entire duration of the run, which can be expensive at scale
Event-driven architectures — you want to react to completions asynchronously in your backend

Every run type emits webhook events for completion and failure (e.g., extract_run.processed, extract_run.failed). Workflow runs also emit events for needs_review, rejected, and cancelled states.

For setup instructions, event types, and signature verification, see the Webhooks documentation.
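
As a rough illustration of reacting to these events, the sketch below maps an event type string to a follow-up action. It assumes workflow events follow the same `<resource>_run.<state>` naming as the extract examples above (for instance `workflow_run.needs_review`), which this page does not confirm. The action names and both functions are hypothetical, and payload parsing and signature verification are left to the Webhooks documentation.

```typescript
type Outcome =
  | "processed"
  | "failed"
  | "needs_review"
  | "rejected"
  | "cancelled";

type Action = "store_result" | "alert" | "queue_for_review" | "ignore";

const OUTCOMES: ReadonlySet<string> = new Set([
  "processed",
  "failed",
  "needs_review",
  "rejected",
  "cancelled",
]);

// Splits e.g. "extract_run.processed" into its resource and outcome.
// Returns null for anything that does not match the assumed pattern.
function parseEventType(
  eventType: string,
): { resource: string; outcome: Outcome } | null {
  const match = /^([a-z]+)_run\.([a-z_]+)$/.exec(eventType);
  if (match === null || !OUTCOMES.has(match[2])) {
    return null;
  }
  return { resource: match[1], outcome: match[2] as Outcome };
}

// Hypothetical routing policy: decide what the backend does next.
function nextAction(eventType: string): Action {
  const parsed = parseEventType(eventType);
  if (parsed === null) return "ignore";
  switch (parsed.outcome) {
    case "processed":
      return "store_result";
    case "needs_review":
      return "queue_for_review";
    case "failed":
    case "rejected":
    case "cancelled":
      return "alert";
  }
}
```

Keeping the routing decision in a pure function like this makes it easy to unit test separately from the HTTP handler that receives the webhook.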

Choosing the Right Approach

Approach	Best For	Trade-offs
Sync endpoints	Testing, onboarding, predictable document sizes or low volume	5-min timeout, blocks the caller, not available for workflows
SDK polling	Production with moderate volume, when you need results inline	Process must stay alive for the duration of the run; simple to implement
Webhooks	High volume, long-running workflows, event-driven systems	Requires a publicly accessible endpoint; more infrastructure to set up

For most production integrations, we recommend starting with SDK polling for its simplicity, and adding webhooks as you scale or when processing workflows.