Async Processing

Every processing endpoint in Extend (extract, classify, split, parse, and edit) supports both a synchronous and an asynchronous mode. Workflows are async-only. Which mode to use depends on your use case.

Sync vs. Async

|  | Sync (/extract, /classify, etc.) | Async (/extract_runs, /classify_runs, etc.) |
| --- | --- | --- |
| How it works | Blocks until the result is ready, then returns it | Returns immediately with a PROCESSING status |
| Timeout | 5-minute hard limit | No timeout; runs until complete |
| Best for | Testing, onboarding, use cases with predictable document sizes or low volume | Production workloads, large files, high volume |
| Getting results | Included in the response | Poll the run, use SDK polling helpers, or receive via webhook |
| Availability | Extract, classify, split, parse, edit | All of the above, plus workflows |

Sync endpoints have a 5-minute timeout. If processing takes longer, the request will fail. For production workloads, always use async endpoints.

Why Use Async in Production

Sync endpoints are convenient but they don’t scale well:

  • Large files (multi-page PDFs, complex Excel spreadsheets, and other large documents) can exceed the 5-minute timeout
  • High volume means many blocked connections waiting for results
  • Network issues can cause you to lose a result that already finished processing server-side
  • Workflows are async-only and can take minutes to hours depending on complexity

Async endpoints solve these problems. You fire off a request, get an ID back immediately, and retrieve the result when it’s ready — either by polling or by receiving a webhook.

Polling with SDK Helpers

The SDKs provide createAndPoll / create_and_poll methods that handle polling for you automatically. They use a hybrid strategy: fast polling (every ~1 second) for the first 30 seconds, then gradual backoff up to 30-second intervals. The method returns when the run reaches a terminal state.
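To make the hybrid strategy concrete, the sketch below models the wait between successive polls. It assumes the backoff delay starts at the initial delay, grows by a fixed multiplier on each poll, and is capped at the maximum delay. The names PollSchedule and pollDelays are illustrative and not part of the SDK, whose exact formula may differ (for example, it may add jitter).

```typescript
// Assumed model of the hybrid schedule; not the SDK's actual implementation.
interface PollSchedule {
  fastPollDurationMs: number; // how long to stay in the fast phase
  fastPollIntervalMs: number; // wait between polls during the fast phase
  initialDelayMs: number;     // first wait of the backoff phase
  maxDelayMs: number;         // cap on any single wait
  backoffMultiplier: number;  // growth factor per backoff poll
}

const DEFAULT_SCHEDULE: PollSchedule = {
  fastPollDurationMs: 30_000,
  fastPollIntervalMs: 1_000,
  initialDelayMs: 1_000,
  maxDelayMs: 30_000,
  backoffMultiplier: 1.15,
};

// Returns the waits (in ms) between polls until at least totalMs has elapsed.
function pollDelays(totalMs: number, s: PollSchedule = DEFAULT_SCHEDULE): number[] {
  const delays: number[] = [];
  let elapsed = 0;
  let backoff = s.initialDelayMs;
  while (elapsed < totalMs) {
    let delay: number;
    if (elapsed < s.fastPollDurationMs) {
      // Fast phase: fixed short interval.
      delay = s.fastPollIntervalMs;
    } else {
      // Backoff phase: grow geometrically, never beyond the cap.
      delay = Math.min(backoff, s.maxDelayMs);
      backoff *= s.backoffMultiplier;
    }
    delays.push(delay);
    elapsed += delay;
  }
  return delays;
}
```

Under these assumptions, a run that finishes within 30 seconds is only ever polled at the fast interval, while a run that takes an hour settles into one poll every 30 seconds after the first few minutes.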

Available Polling Methods

| Resource | TypeScript | Python | Java |
| --- | --- | --- | --- |
| Extract | client.extractRuns.createAndPoll() | client.extract_runs.create_and_poll() | client.extractRuns().createAndPoll() |
| Classify | client.classifyRuns.createAndPoll() | client.classify_runs.create_and_poll() | client.classifyRuns().createAndPoll() |
| Split | client.splitRuns.createAndPoll() | client.split_runs.create_and_poll() | client.splitRuns().createAndPoll() |
| Parse | client.parseRuns.createAndPoll() | client.parse_runs.create_and_poll() | client.parseRuns().createAndPoll() |
| Edit | client.editRuns.createAndPoll() | client.edit_runs.create_and_poll() | client.editRuns().createAndPoll() |
| Workflow | client.workflowRuns.createAndPoll() | client.workflow_runs.create_and_poll() | client.workflowRuns().createAndPoll() |

Terminal States

Polling completes when the run is no longer in a PROCESSING, PENDING, or CANCELLING state. The terminal states depend on the run type:

| Run Type | Terminal States |
| --- | --- |
| Extract, Classify, Split | PROCESSED, FAILED, CANCELLED |
| Parse, Edit | PROCESSED, FAILED |
| Workflow | PROCESSED, FAILED, CANCELLED, NEEDS_REVIEW, REJECTED |
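If you write your own polling loop or need to decide whether a stored run is finished, the terminal states above can be encoded as a small lookup. This is a minimal sketch: the RunType labels, the TERMINAL_STATES map, and the isTerminal helper are illustrative names, not SDK exports.

```typescript
// Lowercase run-type labels are an assumption made for this sketch.
type RunType = "extract" | "classify" | "split" | "parse" | "edit" | "workflow";

// Terminal states per run type, as listed in this section.
const TERMINAL_STATES: Record<RunType, readonly string[]> = {
  extract: ["PROCESSED", "FAILED", "CANCELLED"],
  classify: ["PROCESSED", "FAILED", "CANCELLED"],
  split: ["PROCESSED", "FAILED", "CANCELLED"],
  parse: ["PROCESSED", "FAILED"],
  edit: ["PROCESSED", "FAILED"],
  workflow: ["PROCESSED", "FAILED", "CANCELLED", "NEEDS_REVIEW", "REJECTED"],
};

// True when a run of the given type has stopped changing state.
function isTerminal(runType: RunType, status: string): boolean {
  return TERMINAL_STATES[runType].indexOf(status) !== -1;
}
```

Keeping the lookup keyed by run type avoids treating a workflow-only state such as NEEDS_REVIEW as valid for an extract run.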

Full Example with Error Handling

import { ExtendClient, ExtendError } from "extend-ai";

const client = new ExtendClient({ token: "your-api-key" });

try {
  // Creates the run, then polls until it reaches a terminal state.
  const result = await client.extractRuns.createAndPoll({
    extractor: { id: "ex_abc123" },
    file: { url: "https://example.com/invoice.pdf" },
  });

  // Handle each terminal state an extract run can end in.
  if (result.status === "PROCESSED") {
    console.log("Extraction complete:", result.output?.value);
  } else if (result.status === "FAILED") {
    console.error("Extraction failed:", result.failureMessage);
  } else if (result.status === "CANCELLED") {
    console.log("Extraction was cancelled");
  }
} catch (error) {
  // Log API-level errors, then rethrow so the caller can decide how to recover.
  if (error instanceof ExtendError) {
    console.error("API error:", error.message);
  }
  throw error;
}

Configuring Polling Options

You can customize polling behavior by passing options:

const result = await client.extractRuns.createAndPoll(
  {
    extractor: { id: "ex_abc123" },
    file: { url: "https://example.com/invoice.pdf" },
  },
  {
    maxWaitMs: 120_000, // Time out after 2 minutes
  }
);

| Option | Default | Description |
| --- | --- | --- |
| maxWaitMs / max_wait_ms | None (polls indefinitely) | Maximum total wait time before throwing a PollingTimeoutError |
| fastPollDurationMs / fast_poll_duration_ms | 30,000 ms (30 s) | How long to stay in the fast polling phase |
| fastPollIntervalMs / fast_poll_interval_ms | 1,000 ms (1 s) | Interval between polls during the fast phase |
| initialDelayMs / initial_delay_ms | 1,000 ms (1 s) | Starting delay for the backoff phase |
| maxDelayMs / max_delay_ms | 30,000 ms (30 s) | Maximum delay between polls |
| backoffMultiplier / backoff_multiplier | 1.15 | Multiplier applied to each successive backoff delay |
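The effect of a wait budget such as maxWaitMs can be illustrated with a small hand-rolled loop. The sketch below is not the SDK's implementation: PollTimeoutSketch is a local stand-in for PollingTimeoutError, and fetchStatus, sleep, and now are hypothetical injected dependencies so the loop can be exercised without network access or real timers. A fixed delay is used for brevity.

```typescript
// Local stand-in for the SDK's PollingTimeoutError (assumption, not an SDK export).
class PollTimeoutSketch extends Error {
  constructor(message: string) {
    super(message);
    this.name = "PollTimeoutSketch";
  }
}

interface PollDeps {
  fetchStatus: () => Promise<string>;   // hypothetical: returns the run's current status
  sleep: (ms: number) => Promise<void>; // e.g. a setTimeout-based sleep in real code
  now: () => number;                    // e.g. Date.now in real code
}

// Non-terminal states named in this document.
const IN_PROGRESS = ["PENDING", "PROCESSING", "CANCELLING"];

// Polls until the run leaves the in-progress states or the budget is spent.
async function pollWithBudget(
  deps: PollDeps,
  maxWaitMs?: number, // undefined means poll indefinitely
  delayMs: number = 1_000
): Promise<string> {
  const start = deps.now();
  for (;;) {
    const status = await deps.fetchStatus();
    if (IN_PROGRESS.indexOf(status) === -1) {
      return status; // terminal state reached
    }
    if (maxWaitMs !== undefined && deps.now() - start >= maxWaitMs) {
      throw new PollTimeoutSketch(`run still ${status} after ${maxWaitMs} ms`);
    }
    await deps.sleep(delayMs);
  }
}
```

A timeout in this model only means the caller stopped waiting; the run itself keeps going, so the caller can retrieve it later by ID or rely on a webhook.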

Webhooks

For event-driven architectures or very long-running processes (especially workflows), webhooks are the recommended approach. Instead of polling, Extend sends an HTTP request to your server when a run completes.

When to use webhooks over polling:

  • High volume — you’re processing hundreds or thousands of files and don’t want to keep processes alive waiting for results
  • Long-running workflows — complex workflows can take minutes to hours
  • Cost efficiency — with polling, the calling process must stay alive for the entire duration of the run, which can be expensive at scale
  • Event-driven architectures — you want to react to completions asynchronously in your backend

Every run type emits webhook events for completion and failure (e.g., extract_run.processed, extract_run.failed). Workflow runs also emit events for needs_review, rejected, and cancelled states.
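On the receiving side, a handler typically branches on the event type. The sketch below shows one way to route events once a request has been verified and parsed. The payload shape (eventType, runId), the workflow event names built on the same resource.state pattern, and the Action labels are all assumptions made for illustration; consult the Webhooks documentation for the actual schema.

```typescript
// Assumed minimal payload shape; the real webhook body may differ.
interface WebhookEvent {
  eventType: string; // e.g. "extract_run.processed"
  runId: string;
}

// Hypothetical follow-up actions for this sketch.
type Action = "store_result" | "alert_failure" | "queue_review" | "log_only" | "ignore";

// Maps an event to a follow-up action based on the state suffix of its type.
function routeEvent(event: WebhookEvent): Action {
  const dot = event.eventType.lastIndexOf(".");
  const state = event.eventType.slice(dot + 1);
  switch (state) {
    case "processed":
      return "store_result";
    case "failed":
      return "alert_failure";
    case "needs_review":
      return "queue_review";
    case "rejected":
    case "cancelled":
      return "log_only";
    default:
      return "ignore"; // unknown or unhandled event types
  }
}
```

Returning "ignore" for unrecognized types keeps the handler tolerant of event types it was not written for.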

For setup instructions, event types, and signature verification, see the Webhooks documentation.

Choosing the Right Approach

| Approach | Best For | Trade-offs |
| --- | --- | --- |
| Sync endpoints | Testing, onboarding, predictable document sizes or low volume | 5-min timeout, blocks the caller, not available for workflows |
| SDK polling | Production with moderate volume, when you need results inline | Process must stay alive for the duration of the run; simple to implement |
| Webhooks | High volume, long-running workflows, event-driven systems | Requires a publicly accessible endpoint; more infrastructure to set up |

For most production integrations, we recommend starting with SDK polling for its simplicity, and adding webhooks as you scale or when processing workflows.