Quick Start (5 minutes)

Want to see how Extend works with your own documents? This guide gets you up and running in under 5 minutes so you can test extraction on your files using the API.

Optional: Use a sample document

If you don’t have a document handy, use this Sample Invoice. We’ll reference it throughout the steps.

Step 1: Import / Create Your Schema

Open the Extend Studio. In the Extractor card, click Create new and choose one of:

  • Import JSON Schema: If you have an existing JSON Schema, you can import that configuration here.
  • Generate Extractor: If you have a document (e.g., the sample invoice above), you can generate a starter schema from it.

Create Extractor options
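If you plan to import an existing JSON Schema, a quick local check can catch one common mistake before you upload it: a field listed in `required` that has no matching entry in `properties`. The sketch below is our own illustration, not part of the Extend SDK or Studio. The `JsonSchema` type and `findMissingRequired` function are hypothetical names, and the check only walks nested `properties` and array `items`.

```typescript
// Hypothetical minimal JSON Schema shape: only the keywords this check inspects.
interface JsonSchema {
  type?: string | string[];
  required?: string[];
  properties?: Record<string, JsonSchema>;
  items?: JsonSchema;
  [keyword: string]: unknown;
}

// Returns the path of every `required` name that has no matching entry in `properties`.
function findMissingRequired(schema: JsonSchema, path = "$"): string[] {
  const problems: string[] = [];
  const props = schema.properties ?? {};
  for (const name of schema.required ?? []) {
    // Own-property check, so names like "toString" are not matched via the prototype.
    if (!Object.prototype.hasOwnProperty.call(props, name)) {
      problems.push(`${path}.${name}`);
    }
  }
  // Recurse into nested object properties and array item schemas.
  for (const [name, child] of Object.entries(props)) {
    problems.push(...findMissingRequired(child, `${path}.${name}`));
  }
  if (schema.items) {
    problems.push(...findMissingRequired(schema.items, `${path}[]`));
  }
  return problems;
}

// Example: "due_date" is required but never defined.
const draft: JsonSchema = {
  type: "object",
  required: ["invoice_number", "due_date"],
  properties: {
    invoice_number: { type: ["string", "null"] },
  },
};

console.log(findMissingRequired(draft)); // → [ '$.due_date' ]
```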

Step 2: Copy the Code

Now that you have created your Extractor, you can run a document against it via the API.

To do so, copy a pre-generated code snippet that runs your Extractor against a file you provide:

In the builder tab of the Extractor, click View code in the top-right. In the modal, pick your preferred language tab and copy the snippet.

Replace YOUR_API_KEY with your actual key, which is available on the Developers page.
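Rather than pasting the key into source code, you may prefer to read it from an environment variable. A minimal sketch, assuming a variable named `EXTEND_API_KEY`; that name is our own choice for this example, not something Extend requires. Pass the returned value as the `token` option of `ExtendClient`.

```typescript
// Reads the Extend API key from an environment-style map instead of hardcoding it.
// EXTEND_API_KEY is a hypothetical variable name chosen for this example.
function getApiKey(env: Record<string, string | undefined>): string {
  const key = env.EXTEND_API_KEY?.trim();
  if (!key) {
    throw new Error("EXTEND_API_KEY is not set; copy your key from the Developers page.");
  }
  return key;
}

// Usage with the SDK in Node.js (not executed here):
// const client = new ExtendClient({ token: getApiKey(process.env) });
```

Failing fast with a clear message is usually easier to debug than an authentication error returned by the API.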

If you have unsaved changes (a warning will appear), click Save at the bottom right first so that the code snippet reflects your latest config.

If you’re using the JavaScript or Python SDK snippets, install the SDKs first from our SDKs page.

The example snippets below are similar to what you will see in the UI when you click View code.

API code example

import { ExtendClient } from "extend-ai";

const client = new ExtendClient({ token: "YOUR_API_KEY" });

const response = await client.processorRun.create({
  processorId: "<YOUR_PROCESSOR_ID>", // Find this in the "Overview" tab of your newly created Extractor in the UI
  file: {
    fileUrl: "<YOUR_FILE_URL>", // You may use our sample invoice URL. Only keep one of fileUrl or fileId.
    fileId: "<YOUR_FILE_ID>" // If you uploaded a file into the Extend system, use this. Only keep one of fileId or fileUrl.
  },
  sync: true,
  config: {
    // Put your config here, or copy our sample invoice config below
  }
});

console.log("Processor run:", response);
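The comments in the snippet say to keep only one of `fileUrl` or `fileId`. If you build the `file` object programmatically, a small helper can enforce that rule before the request is sent. This is a hypothetical helper of our own, not an SDK function; the `FileRef` type only mirrors the two fields shown in the snippet.

```typescript
// Mirrors the two shapes of the `file` field used in the snippet above.
type FileRef = { fileUrl: string } | { fileId: string };

// Hypothetical helper: returns a `file` object containing exactly one of fileUrl or fileId.
function buildFileRef(opts: { fileUrl?: string; fileId?: string }): FileRef {
  const { fileUrl, fileId } = opts;
  if (fileUrl && fileId) {
    throw new Error("Pass either fileUrl or fileId, not both.");
  }
  if (fileUrl) return { fileUrl };
  if (fileId) return { fileId };
  throw new Error("Pass a fileUrl or a fileId.");
}

console.log(buildFileRef({ fileUrl: "https://example.com/invoice.pdf" }));
// → { fileUrl: 'https://example.com/invoice.pdf' }
```

In the request you would then write `file: buildFileRef({ fileId: "<YOUR_FILE_ID>" })` instead of listing both keys.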

If you are following along with the sample invoice, your generated config should look something like this:

"config": {
  "type": "EXTRACT",
  "baseProcessor": "extraction_performance",
  "baseVersion": "4.1.1",
  "schema": {
    "type": "object",
    "required": [
      "bill_to",
      "ship_to",
      "due_date",
      "line_items",
      "tax_amount",
      "balance_due",
      "vendor_name",
      "invoice_date",
      "total_amount",
      "invoice_number",
      "subtotal_amount"
    ],
    "properties": {
      "bill_to": {
        "type": ["string", "null"],
        "description": "The name and address of the individual or organization being billed. This is the recipient of the invoice and payment request. May appear under labels such as 'Bill To', 'Billed To', or similar."
      },
      "ship_to": {
        "type": ["string", "null"],
        "description": "The name and address where goods or services are to be delivered, if different from the billing address. May be labeled as 'Ship To', 'Shipping Address', or similar."
      },
      "due_date": {
        "type": ["string", "null"],
        "description": "The date by which payment for this invoice is expected. May be explicitly labeled as 'Due Date', 'Payment Due', or may require calculation based on payment terms. The key is identifying when the payment obligation becomes due.",
        "extend:type": "date"
      },
      "line_items": {
        "type": "array",
        "items": {
          "type": "object",
          "required": ["amount", "quantity", "unit_price", "description"],
          "properties": {
            "amount": {
              "type": ["number", "null"],
              "description": "The total cost for this line item, typically calculated as quantity multiplied by unit price. May also be labeled as 'Amount', 'Line Total', or similar."
            },
            "quantity": {
              "type": ["number", "null"],
              "description": "The number of units, items, or hours billed for this line item. Can be a whole number or decimal."
            },
            "unit_price": {
              "type": ["number", "null"],
              "description": "The price per single unit of this item before quantity multiplication. Represents the base rate for one unit."
            },
            "description": {
              "type": ["string", "null"],
              "description": "A description of the product or service provided in this line item. May include item names, service details, or product codes."
            }
          },
          "additionalProperties": false
        },
        "description": "The individual products, services, or charges that make up this invoice. Each item typically includes a description, quantity, unit price, and total amount. Format may vary from tables to lists."
      },
      "tax_amount": {
        "type": "object",
        "required": ["amount", "iso_4217_currency_code"],
        "properties": {
          "amount": { "type": ["number", "null"] },
          "iso_4217_currency_code": { "type": ["string", "null"] }
        },
        "description": "The total tax applied to this invoice. May be labeled as 'Tax', 'Sales Tax', or include a percentage. Represents the sum of all taxes charged.",
        "extend:type": "currency",
        "additionalProperties": false
      },
      "balance_due": {
        "type": "object",
        "required": ["amount", "iso_4217_currency_code"],
        "properties": {
          "amount": { "type": ["number", "null"] },
          "iso_4217_currency_code": { "type": ["string", "null"] }
        },
        "description": "The outstanding amount that remains to be paid on this invoice. May be labeled as 'Balance Due', 'Amount Due', or similar. In some documents, this may be the same as the total amount if no prior payments or credits have been applied.",
        "extend:type": "currency",
        "additionalProperties": false
      },
      "vendor_name": {
        "type": ["string", "null"],
        "description": "The name of the company or individual issuing the invoice. This is the party requesting payment, often found at the top or in a header section."
      },
      "invoice_date": {
        "type": ["string", "null"],
        "description": "The date when this invoice was created or issued. This is the official date for accounting and payment term calculations. May be labeled as 'Date', 'Invoice Date', or similar, and can appear in various formats.",
        "extend:type": "date"
      },
      "total_amount": {
        "type": "object",
        "required": ["amount", "iso_4217_currency_code"],
        "properties": {
          "amount": { "type": ["number", "null"] },
          "iso_4217_currency_code": { "type": ["string", "null"] }
        },
        "description": "The final amount owed by the customer, including all items, taxes, and adjustments. This is the complete payment obligation. May be labeled as 'Total', 'Amount Due', 'Balance Due', or similar, and is often the most prominent monetary value on the document.",
        "extend:type": "currency",
        "additionalProperties": false
      },
      "invoice_number": {
        "type": ["string", "null"],
        "description": "The unique identifier assigned to this invoice or billing document. This is the primary reference number for the transaction and may include numbers, letters, or special characters. Common labels include 'Invoice #', 'Bill Number', or may appear as a prominent number near the top or center of the document."
      },
      "subtotal_amount": {
        "type": "object",
        "required": ["amount", "iso_4217_currency_code"],
        "properties": {
          "amount": { "type": ["number", "null"] },
          "iso_4217_currency_code": { "type": ["string", "null"] }
        },
        "description": "The sum of all line item amounts before taxes, discounts, or additional charges. May be labeled as 'Subtotal' or similar. Represents the total of goods and services prior to adjustments.",
        "extend:type": "currency",
        "additionalProperties": false
      }
    },
    "additionalProperties": false
  },
  "advancedOptions": {
    "citationsEnabled": true,
    "chunkingOptions": {},
    "advancedFigureParsingEnabled": true
  }
}
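Because this schema captures line items, subtotal, tax, and total as separate fields, you can sanity-check an extraction by comparing them, following the field descriptions (a line amount is typically quantity multiplied by unit price; the subtotal is the sum of line amounts). The sketch below assumes you have already copied the extracted values into a plain object shaped like the schema; where those values live in the API response is described in Extractor output type. The `Invoice` type, the `checkInvoiceTotals` function, and the example numbers are our own illustration. Discounts, shipping, or other adjustments can cause legitimate mismatches, so treat a warning as a flag for review rather than proof of an extraction error.

```typescript
// Plain-object shapes mirroring part of the sample invoice schema (nullable fields included).
interface Money {
  amount: number | null;
  iso_4217_currency_code: string | null;
}
interface LineItem {
  amount: number | null;
  quantity: number | null;
  unit_price: number | null;
  description: string | null;
}
interface Invoice {
  line_items: LineItem[];
  subtotal_amount: Money;
  tax_amount: Money;
  total_amount: Money;
}

// Returns human-readable warnings; an empty array means every check that could run passed.
function checkInvoiceTotals(inv: Invoice, tolerance = 0.01): string[] {
  const warnings: string[] = [];
  const close = (a: number, b: number) => Math.abs(a - b) <= tolerance;

  // Each line amount should be roughly quantity x unit_price (skipped if any value is null).
  inv.line_items.forEach((item, i) => {
    if (item.amount !== null && item.quantity !== null && item.unit_price !== null &&
        !close(item.amount, item.quantity * item.unit_price)) {
      warnings.push(`line_items[${i}]: amount ${item.amount} != quantity x unit_price`);
    }
  });

  // The subtotal should equal the sum of line amounts (skipped if any amount is null).
  const amounts = inv.line_items.map((item) => item.amount);
  const subtotal = inv.subtotal_amount.amount;
  if (subtotal !== null && amounts.every((a): a is number => a !== null)) {
    const sum = amounts.reduce((acc, a) => acc + a, 0);
    if (!close(sum, subtotal)) warnings.push(`subtotal ${subtotal} != sum of line items ${sum}`);
  }

  // The total should equal subtotal + tax when no discounts or other charges apply.
  const tax = inv.tax_amount.amount;
  const total = inv.total_amount.amount;
  if (subtotal !== null && tax !== null && total !== null && !close(subtotal + tax, total)) {
    warnings.push(`total ${total} != subtotal + tax ${subtotal + tax}`);
  }
  return warnings;
}

// Made-up values for illustration only (not taken from the sample invoice).
const example: Invoice = {
  line_items: [
    { description: "Widget", quantity: 2, unit_price: 10, amount: 20 },
    { description: "Setup", quantity: 1, unit_price: 5, amount: 5 },
  ],
  subtotal_amount: { amount: 25, iso_4217_currency_code: "USD" },
  tax_amount: { amount: 2, iso_4217_currency_code: "USD" },
  total_amount: { amount: 27, iso_4217_currency_code: "USD" },
};
console.log(checkInvoiceTotals(example)); // → []
```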

Step 3: Upload Your File

If you do not already have a URL for your document and are not following along with the sample invoice, upload the document to Extend using the upload file endpoint. The response includes a fileId, which you can then use to process the document.

import { ExtendClient } from "extend-ai";
import * as fs from "fs";

const client = new ExtendClient({ token: "YOUR_API_KEY" });

// Upload a local file; the response includes the fileId used in Step 4
const result = await client.file.upload(fs.createReadStream("/path/to/your/file"));
console.log(result);
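Steps 3 and 4 branch on whether you already have a URL for the document. If your script accepts either a URL or a local path, a check like the following can pick the branch. This is a hypothetical helper of our own; treating only `http:` and `https:` URLs as remote is our assumption, not a documented Extend requirement.

```typescript
// Hypothetical helper: decides whether an input should be sent as fileUrl
// or uploaded first (Step 3) to obtain a fileId.
type FileSource = { kind: "url"; url: string } | { kind: "localPath"; path: string };

function classifyFileInput(input: string): FileSource {
  try {
    // The WHATWG URL constructor throws on relative paths such as "/path/to/file".
    const parsed = new URL(input);
    if (parsed.protocol === "http:" || parsed.protocol === "https:") {
      return { kind: "url", url: input };
    }
  } catch {
    // Not an absolute URL: fall through and treat it as a local path.
  }
  return { kind: "localPath", path: input };
}
```

If the result is `localPath`, upload the file as shown above and pass the returned ID as `fileId`; otherwise pass the URL as `fileUrl`.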

Step 4: Run Processor and Get Results

Now run your Extractor against your document using the code snippet from Step 2.

If you have a URL for your file, or are following along with the sample invoice, replace YOUR_FILE_URL with that URL and delete the fileId line.

If you uploaded a file with the upload file endpoint, replace YOUR_FILE_ID with the ID returned by that endpoint and delete the fileUrl line.

Also be sure to replace YOUR_API_KEY and YOUR_PROCESSOR_ID with the appropriate values.

Execute the code in your environment. You should see the extracted data in your console! To interpret the response structure, see Extractor output type.
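Forgetting one of these replacements is an easy mistake. Before running the code, you could scan your request values for leftover placeholders. A minimal sketch: the hypothetical `findPlaceholders` function walks any nested object or array and reports string values that still contain `YOUR_`, a heuristic that matches the placeholder names used in this guide.

```typescript
// Hypothetical pre-flight check: recursively finds string values that still contain
// a placeholder marker such as "YOUR_API_KEY" or "<YOUR_PROCESSOR_ID>".
function findPlaceholders(value: unknown, path = "$", marker = "YOUR_"): string[] {
  if (typeof value === "string") {
    return value.includes(marker) ? [path] : [];
  }
  if (Array.isArray(value)) {
    return value.flatMap((item, i) => findPlaceholders(item, `${path}[${i}]`, marker));
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value).flatMap(([key, child]) =>
      findPlaceholders(child, `${path}.${key}`, marker),
    );
  }
  return [];
}

const request = {
  processorId: "<YOUR_PROCESSOR_ID>",
  file: { fileUrl: "https://example.com/invoice.pdf" },
  sync: true,
};
console.log(findPlaceholders(request)); // → [ '$.processorId' ]
```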

The snippet below puts everything together: it uploads the file and then runs the Extractor.

import { ExtendClient } from "extend-ai";
import * as fs from "fs";

const client = new ExtendClient({ token: "YOUR_API_KEY" });

// Upload the file
const upload = await client.file.upload(fs.createReadStream("/path/to/your/file.pdf"));
const fileId = upload.file.id;

// Run the extractor
const response = await client.processorRun.create({
  processorId: "<YOUR_PROCESSOR_ID>",
  file: { fileId },
  sync: true,
  config: {
    // Put your config here, or copy our sample invoice config above
  }
});

console.log("Processor run:", response);

Next steps

  • How can I view the API endpoint to run an extractor?
  • How can I configure my extraction schema?
  • How can I set up webhooks?
  • When should I use the sync: true flag?
    • For this quick evaluation, we recommend sync: true for immediate results.
    • For production, use sync: true only when documents are small and latency is predictable and under 5 minutes. Otherwise, run asynchronously and rely on webhooks.
  • How can I assess the accuracy of Extend?
    • Use Evaluation Sets to measure performance on your own ground truth. Create a set, run your extractor across it, and review diffs and accuracy metrics.
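The guidance on the sync flag can be written down as a simple rule. The sketch below is our own encoding of it: this guide does not define what counts as a "small" document, so the page limit is a parameter you choose, while the five-minute ceiling comes from the guidance itself. The `shouldRunSync` name and its inputs are hypothetical.

```typescript
// Hypothetical helper encoding the guidance above for when to pass `sync: true`.
interface SyncDecisionInput {
  stage: "evaluation" | "production";
  pageCount: number;
  // Your own estimate of processing time; leave undefined if latency is not predictable.
  expectedLatencySeconds?: number;
  // What "small" means for your workload; you must choose this threshold.
  maxSmallDocPages: number;
}

function shouldRunSync(input: SyncDecisionInput): boolean {
  // For a quick evaluation, the guide recommends sync: true for immediate results.
  if (input.stage === "evaluation") return true;

  const { pageCount, expectedLatencySeconds, maxSmallDocPages } = input;
  const latencyOk = expectedLatencySeconds !== undefined && expectedLatencySeconds < 5 * 60;
  const isSmall = pageCount <= maxSmallDocPages;
  // In production, go synchronous only when both hold; otherwise run async and use webhooks.
  return latencyOk && isSmall;
}
```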