Overview

Classification assigns a document to exactly one of the categories you define and returns the match as structured JSON. You describe the possible document types with classifications, and Extend returns the matched type, a confidence score, and reasoning behind the decision. Use it to route incoming documents (invoices vs. bills of lading vs. purchase orders), gate downstream processing, or branch a workflow based on document type.

Classify runs Parse under the hood, parsing the file first if it hasn’t been parsed already and reusing the existing parsed output if it has.

Quick start

We'll classify a freight invoice into one of several logistics document types. For this quick start, we've uploaded a sample file; its URL appears in the code below.

A freight invoice document

Grab a key from the Developers page and store it as the EXTEND_API_KEY environment variable. If you’re using an SDK, see the installation instructions.

    export EXTEND_API_KEY="your_api_key_here"

The /classify endpoint takes a file and a config with the classifications you want to choose between.

    import os
    from extend_ai import Extend

    client = Extend(token=os.environ["EXTEND_API_KEY"])

    result = client.classify(
        file={
            "url": "https://extend-public-files.s3.us-east-2.amazonaws.com/freight-invoice.pdf",
        },
        config={
            "classifications": [
                {
                    "id": "invoice",
                    "type": "invoice",
                    "description": "An invoice or bill requesting payment for goods or services.",
                },
                {
                    "id": "bill_of_lading",
                    "type": "bill_of_lading",
                    "description": "A bill of lading documenting a shipment of goods.",
                },
                {
                    "id": "purchase_order",
                    "type": "purchase_order",
                    "description": "A purchase order authorizing a purchase.",
                },
                {
                    "id": "other",
                    "type": "other",
                    "description": "Any other document type.",
                },
            ],
        },
    )

    print(result)

Want to classify your own document? Upload it first, then pass the returned file id instead of a url (reusing the same config).

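    # Assumes `client` from the quick start, and `config` holding the same
    # dict that was passed as config= in that call.
    with open("freight_invoice.pdf", "rb") as f:
        uploaded = client.files.upload(file=f)

    result = client.classify(file={"id": uploaded.id}, config=config)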

Example response

After you run the code snippet above, you’ll see a response like this. Extend parses the document, picks the best-matching category, and returns an output with the matched classification, a confidence score, and insights explaining the decision.

    {
      "object": "classify_run",
      "id": "clr_Xj8mK2pL9nR4vT7qY5wZ",
      "status": "PROCESSED",
      "output": {
        "id": "invoice",
        "type": "invoice",
        "confidence": 0.97,
        "insights": [
          {
            "type": "reasoning",
            "content": "The document is titled \"Freight Invoice\" and lists line items, amounts, and payment terms."
          }
        ]
      }
    }

Key fields

| Field | What it contains |
| --- | --- |
| output.id | The id of the matched classification, as defined in your classifications. Branch your logic on this; it's stable. |
| output.type | The type of the matched classification. |
| output.confidence | The model's confidence in the match, from 0 to 1. |
| output.insights | Reasoning behind the classification decision. |

For full request/response details, see the Create Classify Run API reference.
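As a rough illustration of how these fields fit together, the sketch below pulls them out of a response that has already been loaded into a Python dict (for example with json.loads). The helper name key_fields is hypothetical and not part of the SDK; the payload mirrors the example response on this page.

```python
def key_fields(run: dict) -> dict:
    """Collect the fields from the table above out of a classify run payload."""
    output = run["output"]
    # Join every insight of type "reasoning" into one readable string.
    reasoning = " ".join(
        insight["content"]
        for insight in output.get("insights", [])
        if insight.get("type") == "reasoning"
    )
    return {
        "id": output["id"],
        "type": output["type"],
        "confidence": output["confidence"],
        "reasoning": reasoning,
    }


# The example response from this page, written as a Python dict.
example_run = {
    "object": "classify_run",
    "id": "clr_Xj8mK2pL9nR4vT7qY5wZ",
    "status": "PROCESSED",
    "output": {
        "id": "invoice",
        "type": "invoice",
        "confidence": 0.97,
        "insights": [
            {
                "type": "reasoning",
                "content": 'The document is titled "Freight Invoice" and lists '
                           "line items, amounts, and payment terms.",
            }
        ],
    },
}

fields = key_fields(example_run)
print(fields["id"], fields["confidence"])  # invoice 0.97
```

When you use the SDK, the same values are available as attributes (result.output.id, result.output.confidence), as shown in the next section.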

Use the output

Branch your logic on the returned id, and use confidence to decide when to route a document to manual review. Match on id rather than type: the id is a stable identifier you control, while type and description are part of the prompt that steers the classification decision and may change as you tune accuracy.

    output = result.output

    if output.confidence < 0.8:
        print(f"Low confidence ({output.confidence}) — route to review")
    elif output.id == "invoice":
        print("Send to the invoice extractor")

For the full output shape and shared types, see Response Format.
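When there are more than a couple of categories, the if/elif chain can be replaced with a lookup table keyed by id. The following is a minimal sketch under a few assumptions: the destination names in ROUTES are hypothetical, the Output dataclass is only a stand-in for result.output so the example runs without the SDK, and the 0.8 threshold is the same illustrative cutoff used above rather than a recommended value.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.8  # illustrative cutoff; tune it against your own documents

# Hypothetical destinations, keyed by classification id (not by type).
ROUTES = {
    "invoice": "invoice_extractor",
    "bill_of_lading": "bol_extractor",
    "purchase_order": "po_extractor",
}


@dataclass
class Output:
    """Stand-in for result.output so this sketch runs without the SDK."""
    id: str
    type: str
    confidence: float


def route(output, threshold: float = REVIEW_THRESHOLD) -> str:
    """Return the destination for a classified document."""
    # Low-confidence matches go to a person regardless of the matched id.
    if output.confidence < threshold:
        return "manual_review"
    # Ids without a dedicated route (such as "other") also fall back to review.
    return ROUTES.get(output.id, "manual_review")


print(route(Output("invoice", "invoice", 0.97)))  # invoice_extractor
print(route(Output("invoice", "invoice", 0.55)))  # manual_review
```

Keeping the table keyed by id means you can reword a type or description while tuning accuracy without touching the routing code.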

Sync vs async

The example above calls the synchronous /classify endpoint. For large files and high-volume use cases, use the asynchronous /classify_runs endpoint instead.

See Async Processing for the full comparison, polling options, and webhook setup.
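If you poll rather than use webhooks, the loop usually looks something like the sketch below. It is deliberately independent of the SDK: fetch_run is a hypothetical callable that you would implement with whatever call retrieves a classify run, and only the "PROCESSED" status is taken from the example response on this page. The "PROCESSING" and "FAILED" status names are assumptions, so check the API reference for the real values.

```python
import time
from typing import Callable

# "PROCESSED" appears in the example response on this page.
# "FAILED" is an assumed name for a terminal error status.
TERMINAL_STATUSES = {"PROCESSED", "FAILED"}


def wait_for_run(
    fetch_run: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_run() until the run reaches a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        run = fetch_run()
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {run.get('status')!r} after {timeout}s")
        sleep(interval)


# Usage with a fake fetcher standing in for a real "get classify run" call.
_states = iter([{"status": "PROCESSING"}, {"status": "PROCESSED"}])
final = wait_for_run(lambda: next(_states), interval=0, sleep=lambda _: None)
print(final["status"])  # PROCESSED
```

Passing sleep as a parameter keeps the helper easy to test; in production code you would leave it at the default.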

Save it as a processor

The quick start runs with an inline config, which is perfect for getting started. To reuse a configuration across runs — and to version it, measure its accuracy, and optimize it — save it as a classifier, a kind of processor. Processors are the saved entities you iterate on in the dashboard, run evaluation sets against, and improve with Composer.

Configuration

The quick start sends just file and config.classifications. To control how classification runs, pass more options inside config. Here are the most commonly used ones; for the full reference, see Configuration.

Classifications

The classifications array is the heart of every classifier — the set of categories the model chooses from. Each entry needs a unique id, a type returned in the output, and a description (your biggest lever on accuracy). At least one classification must have the type "other" as a catch-all.

    {
      "config": {
        "classifications": [
          { "id": "invoice", "type": "invoice", "description": "An invoice or bill requesting payment." },
          { "id": "other", "type": "other", "description": "Any other document type." }
        ]
      }
    }
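The requirements above (a unique id, a type, a description, and an "other" catch-all) are easy to check locally before sending a request. The helper below is a hypothetical client-side sketch, not part of the SDK, and the API may enforce additional rules that it does not cover.

```python
REQUIRED_KEYS = ("id", "type", "description")


def validate_classifications(classifications: list) -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    seen_ids = set()
    for index, entry in enumerate(classifications):
        # Every entry needs a non-empty id, type, and description.
        missing = [key for key in REQUIRED_KEYS if not entry.get(key)]
        if missing:
            problems.append(f"entry {index} is missing: {', '.join(missing)}")
        entry_id = entry.get("id")
        if entry_id in seen_ids:
            problems.append(f"duplicate id: {entry_id}")
        elif entry_id:
            seen_ids.add(entry_id)
    # At least one classification must act as the catch-all.
    if not any(entry.get("type") == "other" for entry in classifications):
        problems.append('no classification has type "other"')
    return problems


classifications = [
    {"id": "invoice", "type": "invoice", "description": "An invoice or bill requesting payment."},
    {"id": "other", "type": "other", "description": "Any other document type."},
]
print(validate_classifications(classifications))  # []
```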

Base processor

Choose the processor based on your accuracy and latency needs.

    { "config": { "baseProcessor": "classification_performance" } }

| Processor | When to use |
| --- | --- |
| classification_performance | Highest accuracy (default). |
| classification_light | Faster, cheaper classification. |

Classification rules

Steer the model with plain-language rules — useful for disambiguating categories that look similar.

    {
      "config": {
        "classificationRules": "When differentiating invoices from purchase orders, the most important signal is whether payment is being requested."
      }
    }

Parse config

Because Classify runs Parse under the hood, you can tune how the document is parsed before classification with parseConfig.

    {
      "config": {
        "parseConfig": {
          "target": "markdown",
          "chunkingStrategy": { "type": "page" }
        }
      }
    }

For every option, including advanced options like context, multimodal processing, and memory, see the Configuration reference.
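To show how the options in this section might sit together in one request, here is a small sketch that assembles a config dict in Python. The build_config helper is hypothetical; the key names (classifications, baseProcessor, classificationRules, parseConfig) and the two processor names come from this page, and the sketch assumes these options can be combined freely in a single config.

```python
BASE_PROCESSORS = {"classification_performance", "classification_light"}


def build_config(classifications, base_processor="classification_performance",
                 rules=None, parse_config=None) -> dict:
    """Assemble a classify config, adding optional keys only when they are set."""
    if base_processor not in BASE_PROCESSORS:
        raise ValueError(f"unknown base processor: {base_processor}")
    config = {
        "classifications": list(classifications),
        "baseProcessor": base_processor,
    }
    if rules:
        config["classificationRules"] = rules
    if parse_config:
        config["parseConfig"] = dict(parse_config)
    return config


config = build_config(
    [
        {"id": "invoice", "type": "invoice", "description": "An invoice or bill requesting payment."},
        {"id": "other", "type": "other", "description": "Any other document type."},
    ],
    base_processor="classification_light",
    rules="When differentiating invoices from purchase orders, the most important "
          "signal is whether payment is being requested.",
    parse_config={"target": "markdown", "chunkingStrategy": {"type": "page"}},
)
print(sorted(config))
# ['baseProcessor', 'classificationRules', 'classifications', 'parseConfig']
```

The resulting dict can then be passed as config= to client.classify, in the same way as the inline config in the quick start.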


Next steps