
Create Extractor

POST /extractors
```typescript
import { ExtendClient } from "extend-ai";

// Authenticate with your API token.
const client = new ExtendClient({ token: "YOUR_TOKEN" });

// Create an extractor whose schema is generated from a sample document.
await client.extractors.create({
  name: "Invoice Extractor",
  generate: {
    files: [{
      url: "https://example.com/sample-invoice.pdf"
    }],
    instructions: "US tax invoice with line items, vendor details, and total amount"
  }
});
```
```json
{
  "object": "extractor",
  "id": "ex_Xj8mK2pL9nR4vT7qY5wZ",
  "name": "Invoice Extractor",
  "createdAt": "2024-03-21T16:45:00Z",
  "updatedAt": "2024-03-21T16:45:00Z",
  "draftVersion": {
    "object": "extractor_version",
    "id": "exv_xK9mLPqRtN3vS8wF5hB2cQ",
    "description": "Updated extraction fields for new invoice format",
    "version": "draft",
    "config": {
      "baseProcessor": "extraction_performance",
      "baseVersion": "string",
      "extractionRules": "string",
      "schema": {},
      "advancedOptions": {
        "modelReasoningInsightsEnabled": true,
        "advancedMultimodalEnabled": true,
        "citationsEnabled": true,
        "citationMode": "line",
        "arrayCitationStrategy": "item",
        "arrayStrategy": {
          "type": "large_array_heuristics"
        },
        "chunkingOptions": {
          "chunkingStrategy": "standard",
          "pageChunkSize": 1,
          "chunkSelectionStrategy": "intelligent",
          "customSemanticChunkingRules": "string"
        },
        "excelSheetRanges": [
          {
            "start": 1,
            "end": 1
          }
        ],
        "excelSheetSelectionStrategy": "intelligent",
        "pageRanges": [
          {
            "start": 1,
            "end": 10
          },
          {
            "start": 20,
            "end": 30
          }
        ],
        "reviewAgent": {
          "enabled": true
        },
        "currentDateEnabled": true
      },
      "parseConfig": {
        "target": "markdown",
        "chunkingStrategy": {
          "type": "page",
          "options": {
            "minCharacters": 500,
            "maxCharacters": 10000
          }
        },
        "engine": "parse_performance",
        "engineVersion": "latest",
        "blockOptions": {
          "figures": {
            "enabled": true,
            "figureImageClippingEnabled": true,
            "advancedChartExtractionEnabled": false
          },
          "tables": {
            "enabled": true,
            "targetFormat": "html",
            "tableHeaderContinuationEnabled": false,
            "cellBlocksEnabled": false,
            "agentic": {
              "enabled": false,
              "customInstructions": "string"
            }
          },
          "text": {
            "signatureDetectionEnabled": false,
            "agentic": {
              "enabled": false,
              "customInstructions": "string"
            }
          },
          "keyValue": {
            "blankFieldFormattingEnabled": false
          },
          "barcodes": {
            "imageClippingEnabled": false,
            "readingEnabled": false
          },
          "formulas": {
            "enabled": false
          }
        },
        "advancedOptions": {
          "pageRotationEnabled": true,
          "pageRanges": [
            {
              "start": 1,
              "end": 10
            },
            {
              "start": 20,
              "end": 30
            }
          ],
          "excelParsingMode": "basic",
          "excelSkipHiddenContent": false,
          "excelUseRawCellValues": false,
          "excelSkipCalculation": true,
          "verticalGroupingThreshold": 1,
          "returnOcr": {
            "words": false
          },
          "alwaysConvertToPdf": false,
          "enrichmentFormat": "xml",
          "imageConversionQuality": "medium",
          "formattingDetection": [
            {
              "type": "change_tracking"
            }
          ]
        }
      }
    },
    "extractorId": "ex_Xj8mK2pL9nR4vT7qY5wZ",
    "createdAt": "2024-03-21T16:45:00Z"
  }
}
```

Create a new extractor.

You can optionally provide a `generate` object to have an extraction schema generated automatically from sample documents using AI. `generate` is mutually exclusive with `config` and `cloneExtractorId`.
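The mutual-exclusion rule can be checked on the client before a request is sent. The sketch below is a hypothetical helper (not part of the Extend SDK) that only rejects bodies combining two or more of `config`, `generate`, and `cloneExtractorId`; whether a body with none of them is accepted is not specified here, so the sketch lets it through. The loose `Record<string, unknown>` types for `config` and `generate` are an assumption for brevity.

```typescript
// Hypothetical client-side shape of the create-extractor body.
// Field names come from this page; the nested types are deliberately loose.
interface CreateExtractorBody {
  name: string;
  config?: Record<string, unknown>;
  generate?: Record<string, unknown>;
  cloneExtractorId?: string;
}

// Throws if more than one of the mutually exclusive fields is present.
function assertSingleSource(body: CreateExtractorBody): void {
  const provided = (["config", "generate", "cloneExtractorId"] as const).filter(
    (key) => body[key] !== undefined,
  );
  if (provided.length > 1) {
    throw new Error(
      `Provide at most one of config, generate, cloneExtractorId (got: ${provided.join(", ")})`,
    );
  }
}

// Passes: only `generate` is set.
assertSingleSource({
  name: "Invoice Extractor",
  generate: { instructions: "US tax invoice with line items" },
});
```

A discriminated union type could enforce the same rule at compile time; the runtime check is shown because request bodies are often assembled dynamically.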


Authentication

`Authorization` · Bearer

Bearer authentication of the form `Bearer <token>`, where `<token>` is your auth token.
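For clients that do not use an SDK, the header value is the literal prefix `Bearer` followed by a space and the token. A minimal sketch; `buildAuthHeader` is a hypothetical helper, not an Extend SDK function, and trimming surrounding whitespace from the token is a defensive choice of this sketch.

```typescript
// Builds the Authorization header in the form "Bearer <token>".
function buildAuthHeader(token: string): Record<string, string> {
  const trimmed = token.trim();
  if (trimmed.length === 0) {
    throw new Error("An API token is required");
  }
  return { Authorization: `Bearer ${trimmed}` };
}

// In practice, load the token from a secret store instead of hard-coding it.
const authHeaders = buildAuthHeader("YOUR_TOKEN");
```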

Headers

`x-extend-api-version` · "2026-02-09" · Optional
API version to use for the request. If you're using an SDK, you can ignore this parameter. If you are not using an SDK and do not specify a version, the request will either be rejected with `400 Bad Request` or be processed under a previous legacy version. See [API Versioning](https://docs.extend.ai/2026-02-09/developers/api-versioning) for more details.
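Without an SDK, pinning the version means sending the header on every request. The sketch below assembles the raw request as a plain object so it can be inspected before sending. `BASE_URL` is a placeholder, not the real Extend host (check the Extend docs for it), and the `Content-Type: application/json` header is an assumption based on the endpoint expecting an object body.

```typescript
// Placeholder host: replace with the API host from the Extend documentation.
const BASE_URL = "https://api.example.com";
const API_VERSION = "2026-02-09";

interface HttpRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper that builds a minimal Create Extractor request.
function buildCreateExtractorRequest(token: string, name: string): HttpRequest {
  return {
    method: "POST",
    url: `${BASE_URL}/extractors`,
    headers: {
      Authorization: `Bearer ${token}`,
      // Pin the version explicitly so the request is not rejected or
      // handled under a legacy version.
      "x-extend-api-version": API_VERSION,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ name }),
  };
}

const req = buildCreateExtractorRequest("YOUR_TOKEN", "Invoice Extractor");
// The fields map directly onto fetch in Node 18+ or a browser:
// await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
```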

Request

This endpoint expects a JSON object in the request body.
`name` · string · Required
The name of the extractor.
`cloneExtractorId` · string · Optional

The ID of an existing extractor to clone. If provided, the new extractor will be created with the same config as the extractor with this ID. Cannot be provided together with `config` or `generate`.

Example: "ex_BMdfq_yWM3sT-ZzvCnA3f"
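A clone request carries only `name` and `cloneExtractorId`; `config` and `generate` are left out. The sketch below is a hypothetical body builder (not an SDK function) that uses the example ID above; its return type simply has no `config` or `generate` fields, so the exclusivity rule holds by construction.

```typescript
// Hypothetical body type for cloning: no `config` or `generate` allowed.
interface CloneExtractorBody {
  name: string;
  cloneExtractorId: string;
}

function buildCloneBody(name: string, sourceExtractorId: string): CloneExtractorBody {
  // `name` is the only required field on this endpoint.
  if (name.trim() === "") {
    throw new Error("`name` is required");
  }
  return { name, cloneExtractorId: sourceExtractorId };
}

// The new extractor starts with the same config as ex_BMdfq_yWM3sT-ZzvCnA3f.
const cloneBody = buildCloneBody("Invoice Extractor (copy)", "ex_BMdfq_yWM3sT-ZzvCnA3f");
const payload = JSON.stringify(cloneBody);
```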

`config` · object · Optional

The configuration for the extractor. Cannot be provided together with `cloneExtractorId` or `generate`.

`generate` · object · Optional

If provided, an extraction schema is automatically generated from the supplied sample documents and applied to the extractor’s draft. The response includes the extractor with the generated schema already in place.

Cannot be provided together with `config` or `cloneExtractorId`.

Response

Extractor created successfully
`object` · enum

The type of object. Always "extractor".

Allowed values: "extractor"
`id` · string

The ID of the extractor.

Example: "ex_Xj8mK2pL9nR4vT7qY5wZ"

`name` · string

The name of the extractor.

Example: "Invoice Extractor"

`createdAt` · string · format: "date-time"

The time (in UTC) at which the object was created. Will follow the RFC 3339 format.

Example: "2024-03-21T16:45:00Z"

`updatedAt` · string · format: "date-time"

The time (in UTC) at which the object was last updated. Will follow the RFC 3339 format.

Example: "2024-03-21T16:45:00Z"

`draftVersion` · object
The draft version of the extractor. This is the editable version in the Extend dashboard.
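The sketch below reads the documented top-level fields from a response and turns the RFC 3339 timestamps into `Date` values. The interfaces are a hypothetical subset typed by hand for illustration (they are not the SDK's exported types), and `object` is typed as `string` so an unexpected value can be reported rather than narrowed away.

```typescript
// Hand-written subset of the response; other draftVersion fields are omitted.
interface ExtractorVersion {
  object: string; // documented as "extractor_version"
  id: string;
  version: string;
  extractorId: string;
  createdAt: string;
}

interface Extractor {
  object: string; // documented as always "extractor"
  id: string;
  name: string;
  createdAt: string; // RFC 3339, UTC
  updatedAt: string; // RFC 3339, UTC
  draftVersion: ExtractorVersion;
}

function summarizeExtractor(raw: string) {
  const extractor = JSON.parse(raw) as Extractor;
  // Fail fast if the payload is not an extractor object.
  if (extractor.object !== "extractor") {
    throw new Error(`Unexpected object type: ${extractor.object}`);
  }
  return {
    id: extractor.id,
    draftVersionId: extractor.draftVersion.id,
    // Timestamps like "2024-03-21T16:45:00Z" are accepted by the Date parser.
    createdAt: new Date(extractor.createdAt),
    updatedAt: new Date(extractor.updatedAt),
  };
}

// Trimmed copy of the example response on this page.
const sampleResponse = JSON.stringify({
  object: "extractor",
  id: "ex_Xj8mK2pL9nR4vT7qY5wZ",
  name: "Invoice Extractor",
  createdAt: "2024-03-21T16:45:00Z",
  updatedAt: "2024-03-21T16:45:00Z",
  draftVersion: {
    object: "extractor_version",
    id: "exv_xK9mLPqRtN3vS8wF5hB2cQ",
    version: "draft",
    extractorId: "ex_Xj8mK2pL9nR4vT7qY5wZ",
    createdAt: "2024-03-21T16:45:00Z",
  },
});

const summary = summarizeExtractor(sampleResponse);
```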

Errors

| Status | Error |
| --- | --- |
| 400 | Bad Request Error |
| 401 | Unauthorized Error |
| 402 | Payment Required Error |
| 403 | Forbidden Error |
| 404 | Not Found Error |
| 422 | Unprocessable Entity Error |
| 429 | Too Many Requests Error |
| 500 | Internal Server Error |
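One way to act on these statuses is to separate errors worth retrying from errors that need a fix to the request or account. This page does not prescribe a retry policy: treating 429 and 500 as retryable with capped exponential backoff is a common convention assumed by this hypothetical sketch, and the delay values are illustrative.

```typescript
// Status labels as listed in the table above.
const ERROR_NAMES: Record<number, string> = {
  400: "Bad Request",
  401: "Unauthorized",
  402: "Payment Required",
  403: "Forbidden",
  404: "Not Found",
  422: "Unprocessable Entity",
  429: "Too Many Requests",
  500: "Internal Server Error",
};

// Assumption: only rate limiting and server errors are transient.
function isRetryable(status: number): boolean {
  return status === 429 || status === 500;
}

// Exponential backoff in milliseconds: 500, 1000, 2000, ... capped at 8000.
function backoffMs(attempt: number): number {
  return Math.min(500 * 2 ** attempt, 8000);
}

function describeStatus(status: number): string {
  const label = ERROR_NAMES[status] ?? "Unknown Error";
  return `${status} ${label}${isRetryable(status) ? " (retry later)" : ""}`;
}
```

Errors such as 400 or 422 usually indicate a problem with the request body (for example, combining `config` with `generate`), so retrying them unchanged is unlikely to help.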
