Getting Started

Authentication

The Extend API uses Bearer token authentication for all requests. Pass your API token when instantiating the client if you are using an SDK, or in the Authorization header of each request if you are calling the API directly.

Example Usage

POST /workflow_runs

curl -X POST https://api.extend.ai/workflow_runs \
     -H "x-extend-api-version: 2025-04-21" \
     -H "Authorization: Bearer <token>" \
     -H "Content-Type: application/json" \
     -d '{
  "workflowId": "workflow_id_here"
}'
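
For readers calling the API directly from Python rather than through the SDK, the same request can be assembled with the standard library alone. This is a minimal sketch, not the SDK's interface: the helper names `build_headers` and `build_workflow_run_request` are illustrative, and the request is only constructed here, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.extend.ai"
API_VERSION = "2025-04-21"


def build_headers(token: str) -> dict[str, str]:
    """Return the three headers used in the cURL example above."""
    return {
        "Authorization": f"Bearer {token}",
        "x-extend-api-version": API_VERSION,
        "Content-Type": "application/json",
    }


def build_workflow_run_request(token: str, workflow_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST /workflow_runs request."""
    body = json.dumps({"workflowId": workflow_id}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/workflow_runs",
        data=body,
        headers=build_headers(token),
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen`. Keeping construction separate from sending makes the header logic easy to check without network access.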

Installing the Extend SDK

This guide provides sample code using cURL requests and our TypeScript and Python SDKs. To follow along in TypeScript or Python, install the corresponding SDK:

TypeScript:

$ npm install extend-ai

Python:

$ pip install extend-ai

Creating an Extractor Processor

Learn how to use the Extend API to extract data from documents.

Extractors are processors that pull out structured data from documents. To create an extractor, we will use the Create Processor endpoint.

Extractors take in a schema, which describes the shape of the data to extract. For this example, we will create an extractor that can pull out an invoice number and table of line items from this sample invoice document.

Execute the following code in your preferred environment and language to create your first extractor. Be sure to replace YOUR_API_KEY_HERE with your actual API key.

curl -X POST "https://api.extend.ai/processors" \
  -H "Authorization: Bearer YOUR_API_KEY_HERE" \
  -H "X-Extend-Api-Version: 2025-04-21" \
  -H "Content-Type: application/json" \
  -d '{
  "name": "My First Processor",
  "type": "EXTRACT",
  "config": {
    "type": "EXTRACT",
    "schema": {
      "type": "object",
      "properties": {
        "invoice_number": {
          "type": ["string", "null"],
          "description": "The invoice number from the document"
        },
        "line_items": {
          "type": "array",
          "description": "Table of items from the invoice",
          "items": {
            "type": "object",
            "properties": {
              "item_name": {
                "type": ["string", "null"],
                "description": "Name of the item"
              },
              "price_per_unit": {
                "type": ["number", "null"],
                "description": "Price per unit"
              },
              "quantity": {
                "type": ["number", "null"],
                "description": "Quantity in units"
              },
              "subtotal": {
                "type": ["number", "null"],
                "description": "Subtotal for the item"
              }
            },
            "required": ["item_name", "price_per_unit", "quantity", "subtotal"]
          }
        }
      },
      "required": ["invoice_number", "line_items"]
    }
  }
}'
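
The schema in the request body repeats the same nullable-field pattern many times, so it can be convenient to build it programmatically. The sketch below reproduces the same payload as plain Python dictionaries. The helpers `nullable` and `object_schema` are hypothetical conveniences introduced for illustration; they are not part of the Extend SDK.

```python
import json
from typing import Any


def nullable(type_name: str, description: str) -> dict[str, Any]:
    """A field that may hold a value of type_name or null."""
    return {"type": [type_name, "null"], "description": description}


def object_schema(properties: dict[str, Any]) -> dict[str, Any]:
    """An object schema that marks every listed property as required."""
    return {"type": "object", "properties": properties, "required": list(properties)}


line_item = object_schema({
    "item_name": nullable("string", "Name of the item"),
    "price_per_unit": nullable("number", "Price per unit"),
    "quantity": nullable("number", "Quantity in units"),
    "subtotal": nullable("number", "Subtotal for the item"),
})

schema = object_schema({
    "invoice_number": nullable("string", "The invoice number from the document"),
    "line_items": {
        "type": "array",
        "description": "Table of items from the invoice",
        "items": line_item,
    },
})

payload = {
    "name": "My First Processor",
    "type": "EXTRACT",
    "config": {"type": "EXTRACT", "schema": schema},
}

# Serialize to the JSON body used in the cURL request above.
print(json.dumps(payload, indent=2))
```

Every field here is both required and nullable, mirroring the cURL example: the key is always present in the output, and its value is null when nothing was found in the document.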

Once the code runs, you should see a response object in your console. The response contains a processor.id field, which uniquely identifies this extractor. You will use it later to run the extractor against a document.

You can view the Processors page in the Extend dashboard to see your new Extractor. You can click on the processor to view the details.

In the details view, you can see the id of the processor on the left, which matches the processor.id field in the response object of your API call. You can also click into the build tab at the top to edit the processor.

In the build tab, you can see that the schema we passed in is reflected in the UI. Continued configuration of the processor can be done either in the UI or via the API.

Let’s run the extractor against the sample document using the API. To do this, we will use the Run Processor endpoint. Be sure to replace YOUR_API_KEY_HERE with your actual API key and YOUR_PROCESSOR_ID with the processor.id from the previous response or from the UI.

curl -X POST "https://api.extend.ai/processor_runs" \
  -H "Authorization: Bearer YOUR_API_KEY_HERE" \
  -H "X-Extend-Api-Version: 2025-04-21" \
  -H "Content-Type: application/json" \
  -d '{
  "processorId": "YOUR_PROCESSOR_ID",
  "file": {
    "fileUrl": "https://extend-public-files.s3.us-east-2.amazonaws.com/Invoice+20193059.pdf"
  }
}'

Once the above code is executed you should see a response object logged to your console. Your returned object will contain a processor_run.id field, which uniquely references this execution of the extractor against a document. The run processor endpoint is asynchronous, so you won’t see the extracted results immediately in the response.
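
Chaining the two calls means carrying an identifier from one response into the next request. The sketch below shows one way to do that in plain Python. It assumes the dotted names used in this guide (processor.id, processor_run.id) correspond to a nested object in the JSON response; the exact key names and casing in a real response may differ, so the key is passed in as a parameter. Both helper names are illustrative.

```python
from typing import Any

SAMPLE_FILE_URL = (
    "https://extend-public-files.s3.us-east-2.amazonaws.com/Invoice+20193059.pdf"
)


def build_run_payload(processor_id: str, file_url: str = SAMPLE_FILE_URL) -> dict[str, Any]:
    """Build the JSON body for the Run Processor request shown above."""
    return {"processorId": processor_id, "file": {"fileUrl": file_url}}


def get_nested_id(response: dict[str, Any], key: str) -> str:
    """Read response[key]["id"].

    Assumption: the id is nested one level under `key`, as the dotted
    names in this guide suggest. Adjust `key` to match the real response.
    """
    try:
        return response[key]["id"]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"response has no {key}.id field") from exc
```

For example, `build_run_payload(get_nested_id(create_response, "processor"))` would produce the body for the run request, where `create_response` is the parsed JSON from the Create Processor call.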

If you navigate to the history tab of your processor in the UI, you can see the status of the run. The “run ID” should match the processor_run.id field in the response object. You can click into the run to see the extracted details.

Finally, let’s call the Get Processor Run endpoint to fetch the status and results of the processor run via the API.

curl -X GET "https://api.extend.ai/processor_runs/YOUR_PROCESSOR_RUN_ID" \
  -H "Authorization: Bearer YOUR_API_KEY_HERE" \
  -H "X-Extend-Api-Version: 2025-04-21"

Once the above code is executed, you should see a response object logged with the full details of the extracted data.
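
Because the Run Processor endpoint is asynchronous, a single call to Get Processor Run may return before the extraction has finished. A common pattern is to poll until the run reaches a terminal state. The sketch below is generic: `fetch_run` stands for whatever function performs the GET request and returns the parsed JSON, and `is_finished` is a caller-supplied check, since this guide does not list the status values the API returns. Both names are hypothetical.

```python
import time
from typing import Any, Callable


def poll_processor_run(
    fetch_run: Callable[[], dict[str, Any]],
    is_finished: Callable[[dict[str, Any]], bool],
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
) -> dict[str, Any]:
    """Call fetch_run repeatedly until is_finished(run) returns True.

    Raises TimeoutError if the run has not finished after max_attempts.
    """
    for attempt in range(max_attempts):
        run = fetch_run()
        if is_finished(run):
            return run
        # Wait before the next attempt, but not after the final one.
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"run not finished after {max_attempts} attempts")
```

Passing the completion check in as a function keeps the loop independent of the response format, so it only needs updating in one place once you know which field and values indicate completion.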

Next Steps

View our Product Documentation to learn more about orchestrating processors together and evaluating their performance.
Try out our SDKs in multiple languages.