# Async Processing

> Understand sync vs. async processing in Extend, and how to use SDK polling helpers and webhooks for production workloads.

Every processing endpoint in Extend (extract, classify, split, parse, and edit) supports both a **synchronous** and an **asynchronous** mode. Workflows are async-only. Which mode to choose depends on your use case.

## Sync vs. Async

|                     | Sync (`/extract`, `/classify`, etc.)                                         | Async (`/extract_runs`, `/classify_runs`, etc.)               |
| ------------------- | ---------------------------------------------------------------------------- | ------------------------------------------------------------- |
| **How it works**    | Blocks until the result is ready, then returns it                            | Returns immediately with a `PROCESSING` status                |
| **Timeout**         | 5-minute hard limit                                                          | No timeout — runs until complete                              |
| **Best for**        | Testing, onboarding, use cases with predictable document sizes or low volume | Production workloads, large files, high volume                |
| **Getting results** | Included in the response                                                     | Poll the run, use SDK polling helpers, or receive via webhook |
| **Availability**    | Extract, classify, split, parse, edit                                        | All of the above + workflows                                  |

Sync endpoints have a **5-minute timeout**. If processing takes longer, the request will fail. For production workloads, always use async endpoints.

## Why Use Async in Production

Sync endpoints are convenient, but they don't scale well:

* **Large files**, such as multi-page PDFs and complex Excel spreadsheets, often exceed the 5-minute timeout
* **High volume** means many blocked connections waiting for results
* **Network issues** can cause you to lose a result that already finished processing server-side
* **Workflows** are async-only and can take minutes to hours depending on complexity

Async endpoints solve these problems. You fire off a request, get an ID back immediately, and retrieve the result when it's ready — either by polling or by receiving a webhook.

## Polling with SDK Helpers

The SDKs provide `createAndPoll` / `create_and_poll` methods that handle polling for you automatically. They use a hybrid strategy: fast polling (about once per second) for the first 30 seconds, then gradual backoff up to 30-second intervals. The method returns when the run reaches a terminal state.

### Available Polling Methods

```python
client.extract_runs.create_and_poll()    # Extract
client.classify_runs.create_and_poll()   # Classify
client.split_runs.create_and_poll()      # Split
client.parse_runs.create_and_poll()      # Parse
client.edit_runs.create_and_poll()       # Edit
client.workflow_runs.create_and_poll()   # Workflow
```

```typescript
client.extractRuns.createAndPoll();    // Extract
client.classifyRuns.createAndPoll();   // Classify
client.splitRuns.createAndPoll();      // Split
client.parseRuns.createAndPoll();      // Parse
client.editRuns.createAndPoll();       // Edit
client.workflowRuns.createAndPoll();   // Workflow
```

```java
client.extractRuns().createAndPoll();    // Extract
client.classifyRuns().createAndPoll();   // Classify
client.splitRuns().createAndPoll();      // Split
client.parseRuns().createAndPoll();      // Parse
client.editRuns().createAndPoll();       // Edit
client.workflowRuns().createAndPoll();   // Workflow
```

The Go SDK does not include a polling helper. Create the run, then poll it yourself (or use webhooks).

### Terminal States

Polling completes when the run is no longer in a `PROCESSING`, `PENDING`, or `CANCELLING` state. The terminal states depend on the run type:

| Run Type                        | Terminal States                                                |
| ------------------------------- | -------------------------------------------------------------- |
| Parse, Extract, Classify, Split | `PROCESSED`, `FAILED`, `CANCELLED`                             |
| Edit                            | `PROCESSED`, `FAILED`                                          |
| Workflow                        | `PROCESSED`, `FAILED`, `CANCELLED`, `NEEDS_REVIEW`, `REJECTED` |
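
If you poll manually (for example, from an SDK without a polling helper), the terminal-state rule above can be sketched as follows. This is a minimal illustration, not the SDK's implementation: `fetch_status` is a hypothetical callable that you supply to retrieve the run and return its status string, and the fixed interval and `TimeoutError` are assumptions made to keep the sketch short.

```python
import time
from typing import Callable

# In-progress states, taken from the sentence above the table.
NON_TERMINAL_STATES = {"PENDING", "PROCESSING", "CANCELLING"}


def is_terminal(status: str) -> bool:
    """Return True once a run has left every in-progress state."""
    return status not in NON_TERMINAL_STATES


def poll_until_terminal(
    fetch_status: Callable[[], str],  # hypothetical: you provide this
    interval_s: float = 1.0,
    max_wait_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status() until it reports a terminal state or time runs out."""
    waited = 0.0
    while True:
        status = fetch_status()
        if is_terminal(status):
            return status
        if waited >= max_wait_s:
            raise TimeoutError(f"run still {status} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Checking for "not in progress" rather than listing terminal states means the same loop works for every run type, including workflow-only states such as `NEEDS_REVIEW` and `REJECTED`.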

### Full Example with Error Handling

```typescript
import { ExtendClient, ExtendError } from "extend-ai";

const client = new ExtendClient({ token: "your-api-key" });

try {
  const result = await client.extractRuns.createAndPoll({
    extractor: { id: "ex_abc123" },
    file: { url: "https://example.com/invoice.pdf" },
  });

  if (result.status === "PROCESSED") {
    console.log("Extraction complete:", result.output?.value);
  } else if (result.status === "FAILED") {
    console.error("Extraction failed:", result.failureMessage);
  } else if (result.status === "CANCELLED") {
    console.log("Extraction was cancelled");
  }
} catch (error) {
  if (error instanceof ExtendError) {
    console.error("API error:", error.message);
  }
  throw error;
}
```

```python
from extend_ai import Extend, ExtendError

client = Extend(token="your-api-key")

try:
    result = client.extract_runs.create_and_poll(
        extractor={"id": "ex_abc123"},
        file={"url": "https://example.com/invoice.pdf"},
    )

    if result.status == "PROCESSED":
        print("Extraction complete:", result.output.value)
    elif result.status == "FAILED":
        print("Extraction failed:", result.failure_message)
    elif result.status == "CANCELLED":
        print("Extraction was cancelled")

except ExtendError as e:
    print("API error:", e)
    raise
```

```java
import ai.extend.ExtendClient;
import ai.extend.types.ExtractRun;
import ai.extend.types.ExtractOutputJson;
import ai.extend.types.FileFromUrl;
import ai.extend.resources.extractruns.requests.ExtractRunsCreateRequest;
import ai.extend.resources.extractruns.types.ExtractRunsCreateRequestExtractor;
import ai.extend.resources.extractruns.types.ExtractRunsCreateRequestFile;

ExtendClient client = ExtendClient.builder()
    .apiKey("your-api-key")
    .build();

try {
    ExtractRun result = client.extractRuns().createAndPoll(
        ExtractRunsCreateRequest.builder()
            .file(ExtractRunsCreateRequestFile.of(FileFromUrl.builder().url("https://example.com/invoice.pdf").build()))
            .extractor(ExtractRunsCreateRequestExtractor.builder().id("ex_abc123").build())
            .build()
    );

    switch (result.getStatus().toString()) {
        case "PROCESSED":
            result.getOutput().ifPresent(output ->
                System.out.println("Extraction complete: " + ((ExtractOutputJson) output.get()).getValue()));
            break;
        case "FAILED":
            System.out.println("Extraction failed: " + result.getFailureMessage());
            break;
        case "CANCELLED":
            System.out.println("Extraction was cancelled");
            break;
    }
} catch (Exception e) {
    System.err.println("API error: " + e.getMessage());
    throw e;
}
```

### Configuring Polling Options

You can customize polling behavior by passing options:

```typescript
const result = await client.extractRuns.createAndPoll(
  {
    extractor: { id: "ex_abc123" },
    file: { url: "https://example.com/invoice.pdf" },
  },
  {
    maxWaitMs: 120_000, // Time out after 2 minutes
  }
);
```

```python
from extend_ai import PollingOptions

result = client.extract_runs.create_and_poll(
    extractor={"id": "ex_abc123"},
    file={"url": "https://example.com/invoice.pdf"},
    polling_options=PollingOptions(
        max_wait_ms=120_000,  # Time out after 2 minutes
    ),
)
```

```java
import ai.extend.wrapper.utilities.polling.PollingOptions;

ExtractRun result = client.extractRuns().createAndPoll(
    ExtractRunsCreateRequest.builder()
        .file(ExtractRunsCreateRequestFile.of(FileFromUrl.builder().url("https://example.com/invoice.pdf").build()))
        .extractor(ExtractRunsCreateRequestExtractor.builder().id("ex_abc123").build())
        .build(),
    PollingOptions.builder()
        .maxWaitMs(120_000) // Time out after 2 minutes
        .build()
);
```

| Option                                         | Default                   | Description                                                     |
| ---------------------------------------------- | ------------------------- | --------------------------------------------------------------- |
| `maxWaitMs` / `max_wait_ms`                    | None (polls indefinitely) | Maximum total wait time before throwing a `PollingTimeoutError` |
| `fastPollDurationMs` / `fast_poll_duration_ms` | 30,000 (30s)              | How long to stay in the fast polling phase                      |
| `fastPollIntervalMs` / `fast_poll_interval_ms` | 1,000 (1s)                | Interval between polls during the fast phase                    |
| `initialDelayMs` / `initial_delay_ms`          | 1,000 (1s)                | Starting delay for the backoff phase                            |
| `maxDelayMs` / `max_delay_ms`                  | 30,000 (30s)              | Maximum delay between polls                                     |
| `backoffMultiplier` / `backoff_multiplier`     | 1.15                      | Multiplier applied to each successive backoff delay             |

## Webhooks

For event-driven architectures or very long-running processes (especially workflows), webhooks are the recommended approach. Instead of polling, Extend sends an HTTP request to your server when a run completes.

**When to use webhooks over polling:**

* **High volume** — you're processing hundreds or thousands of files and don't want to keep processes alive waiting for results
* **Long-running workflows** — complex workflows can take minutes to hours
* **Cost efficiency** — with polling, the calling process must stay alive for the entire duration of the run, which can be expensive at scale
* **Event-driven architectures** — you want to react to completions asynchronously in your backend

Every run type emits webhook events for completion and failure (e.g., `extract_run.processed`, `extract_run.failed`). Workflow runs also emit events for `needs_review`, `rejected`, and `cancelled` states.

For setup instructions, event types, and signature verification, see the [Webhooks documentation](/webhooks/configuration).
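
On the receiving side, a webhook handler usually routes each event by its type. The sketch below shows that routing only. The event names come from the examples above, but the payload shape is an assumption: the `eventType` and `payload` keys, the `id` field, and the handler functions are all hypothetical, so check the Webhooks documentation for the real schema and verify the request signature before dispatching.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], str]


def on_processed(payload: Dict[str, Any]) -> str:
    # Hypothetical handler: persist the finished run's output.
    return f"store result for {payload.get('id')}"


def on_failed(payload: Dict[str, Any]) -> str:
    # Hypothetical handler: raise an alert or schedule a retry.
    return f"alert on failure for {payload.get('id')}"


# Event names taken from the examples in the text above.
HANDLERS: Dict[str, Handler] = {
    "extract_run.processed": on_processed,
    "extract_run.failed": on_failed,
}


def handle_event(event: Dict[str, Any]) -> str:
    """Route a decoded webhook body to the handler for its event type."""
    # ASSUMPTION: the body carries "eventType" and "payload" keys.
    handler = HANDLERS.get(event.get("eventType", ""))
    if handler is None:
        return "ignored"  # Unknown event types are acknowledged, not errors.
    return handler(event.get("payload", {}))
```

Ignoring unknown event types instead of failing keeps the endpoint forward-compatible if you later subscribe to more events.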

## Choosing the Right Approach

| Approach           | Best For                                                      | Trade-offs                                                               |
| ------------------ | ------------------------------------------------------------- | ------------------------------------------------------------------------ |
| **Sync endpoints** | Testing, onboarding, predictable document sizes or low volume | 5-min timeout, blocks the caller, not available for workflows            |
| **SDK polling**    | Production with moderate volume, when you need results inline | Process must stay alive for the duration of the run; simple to implement |
| **Webhooks**       | High volume, long-running workflows, event-driven systems     | Requires a publicly accessible endpoint; more infrastructure to set up   |

For most production integrations, we recommend starting with **SDK polling** for its simplicity, and adding **webhooks** as you scale or when processing workflows.