
Overview


Classification assigns a document to exactly one of the categories you define and returns the match as structured JSON. You describe the possible document types with classifications, and Extend returns the matched type, a confidence score, and reasoning behind the decision. Use it to route incoming documents (invoices vs. bills of lading vs. purchase orders), gate downstream processing, or branch a workflow based on document type.

Quick start

We’ll classify a freight invoice into one of several logistics document types. For this quick start we’ve uploaded the file here.

A freight invoice document

Grab a key from the Developers page and store it as the EXTEND_API_KEY environment variable. If you’re using an SDK, see the installation instructions.

$ export EXTEND_API_KEY="your_api_key_here"

The /classify endpoint takes a file and a config with the classifications you want to choose between.

Python

import os
from extend_ai import Extend

client = Extend(token=os.environ["EXTEND_API_KEY"])

# The categories to choose between. Kept in a variable so later calls can reuse it.
config = {
    "classifications": [
        {
            "id": "invoice",
            "type": "invoice",
            "description": "An invoice or bill requesting payment for goods or services.",
        },
        {
            "id": "bill_of_lading",
            "type": "bill_of_lading",
            "description": "A bill of lading documenting a shipment of goods.",
        },
        {
            "id": "purchase_order",
            "type": "purchase_order",
            "description": "A purchase order authorizing a purchase.",
        },
        {
            "id": "other",
            "type": "other",
            "description": "Any other document type.",
        },
    ],
}

# Classify the sample freight invoice by URL.
result = client.classify(
    file={
        "url": "https://extend-public-files.s3.us-east-2.amazonaws.com/freight-invoice.pdf",
    },
    config=config,
)

print(result)

Want to classify your own document? Upload it first, then pass the returned file id instead of a url (reusing the same config).

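Python

# `client` is the quick-start client; `config` is the same classifications config passed there.
with open("freight_invoice.pdf", "rb") as f:
    uploaded = client.files.upload(file=f)

# Reference the uploaded file by id instead of by url.
result = client.classify(file={"id": uploaded.id}, config=config)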

Example response

After you run the code snippet above, you’ll see a response like this. Extend parses the document, picks the best-matching category, and returns an output with the matched classification, a confidence score, and insights explaining the decision.

{
  "object": "classify_run",
  "id": "clr_Xj8mK2pL9nR4vT7qY5wZ",
  "status": "PROCESSED",
  "output": {
    "id": "invoice",
    "type": "invoice",
    "confidence": 0.97,
    "insights": [
      {
        "type": "reasoning",
        "content": "The document is titled \"Freight Invoice\" and lists line items, amounts, and payment terms."
      }
    ]
  }
}

Key fields

Field | What it contains
--- | ---
output.id | The id of the matched classification, as defined in your classifications. Branch your logic on this; it is stable.
output.type | The type of the matched classification.
output.confidence | The model’s confidence in the match, from 0 to 1.
output.insights | Reasoning behind the classification decision.

For full request/response details, see the Create Classify Run API reference.
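
As a rough illustration of reading these fields from raw JSON, the sketch below parses a payload shaped like the example response into a small dataclass. The `ClassifyOutput` class and `parse_classify_run` helper are hypothetical names introduced here, not part of the Extend SDK, and the sketch assumes the response has exactly the shape shown on this page.

```python
import json
from dataclasses import dataclass


@dataclass
class ClassifyOutput:
    """Hypothetical container for the key fields of a classify run."""
    id: str
    type: str
    confidence: float
    reasoning: list[str]


def parse_classify_run(payload: dict) -> ClassifyOutput:
    """Pull the key fields out of a payload shaped like the example response."""
    output = payload["output"]
    return ClassifyOutput(
        id=output["id"],
        type=output["type"],
        confidence=float(output["confidence"]),
        # Keep only the text of each insight; skip entries without content.
        reasoning=[i["content"] for i in output.get("insights", []) if "content" in i],
    )


# The example response from this page, pasted as a raw string.
raw = r"""
{
  "object": "classify_run",
  "id": "clr_Xj8mK2pL9nR4vT7qY5wZ",
  "status": "PROCESSED",
  "output": {
    "id": "invoice",
    "type": "invoice",
    "confidence": 0.97,
    "insights": [
      {
        "type": "reasoning",
        "content": "The document is titled \"Freight Invoice\" and lists line items, amounts, and payment terms."
      }
    ]
  }
}
"""

parsed = parse_classify_run(json.loads(raw))
print(parsed.id, parsed.confidence)
```

If you use the SDK, the returned object already exposes these values as attributes (for example `result.output.id`), so a helper like this is mainly useful when you handle raw HTTP responses or stored payloads.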

Use the output

Branch your logic on the returned id, and use confidence to decide when to route a document to manual review. Match on id rather than type: the id is a stable identifier you control, while type and description are part of the prompt that steers the classification decision and may change as you tune accuracy.

Python

output = result.output

if output.confidence < 0.8:
    print(f"Low confidence ({output.confidence}): route to review")
elif output.id == "invoice":
    print("Send to the invoice extractor")
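
With more than one category, the same branching can be written as a lookup table. The following is a minimal sketch, assuming the classification ids from the quick start; the destination names (`invoice_extractor`, `manual_review`, and so on) and the `route` helper are hypothetical and stand in for whatever your pipeline does next.

```python
# Map each classification id to a downstream destination.
# Destination names are illustrative placeholders.
ROUTES = {
    "invoice": "invoice_extractor",
    "bill_of_lading": "bol_extractor",
    "purchase_order": "po_extractor",
}


def route(classification_id: str, confidence: float, threshold: float = 0.8) -> str:
    """Pick a destination from the matched id, sending doubtful cases to review."""
    if confidence < threshold:
        return "manual_review"
    # The "other" catch-all and any unexpected id also go to review.
    return ROUTES.get(classification_id, "manual_review")


print(route("invoice", 0.97))
print(route("invoice", 0.55))
print(route("other", 0.99))
```

Keying the table on id rather than type means you can reword a type or description while tuning accuracy without touching the routing code.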

For the full output shape and shared types, see Response Format.

Sync vs async

The example above calls the synchronous /classify endpoint. For large files and high-volume workloads, use the asynchronous /classify_runs endpoint instead.

See Async Processing for the full comparison, polling options, and webhook setup.
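
To give a feel for what polling an asynchronous run involves, here is a generic sketch of a poll-with-backoff loop. It deliberately does not call the Extend API: `fetch_run` is a hypothetical callable you would supply, and the in-progress status names (`PENDING`, `PROCESSING`) are assumptions. Only `PROCESSED` appears on this page; check Async Processing for the real status values and the supported polling and webhook options.

```python
import time
from typing import Callable


def wait_for_run(
    fetch_run: Callable[[], dict],
    timeout_s: float = 60.0,
    initial_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_run() until the run leaves the in-progress states or time runs out."""
    pending = {"PENDING", "PROCESSING"}  # assumed names, not confirmed by this page
    delay = initial_delay_s
    deadline = time.monotonic() + timeout_s
    while True:
        run = fetch_run()
        if run.get("status") not in pending:
            return run
        if time.monotonic() + delay > deadline:
            raise TimeoutError("classify run did not finish in time")
        sleep(delay)
        # Double the wait between polls, up to a ceiling.
        delay = min(delay * 2, max_delay_s)


# Stand-in for a real status lookup: two in-progress polls, then the finished run.
_responses = iter([
    {"status": "PROCESSING"},
    {"status": "PROCESSING"},
    {"status": "PROCESSED", "output": {"id": "invoice", "confidence": 0.97}},
])
final = wait_for_run(lambda: next(_responses), sleep=lambda _seconds: None)
print(final["status"])
```

The `sleep` parameter is injectable so the loop can be exercised without real waiting; in production you would leave the default.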

Save it as a processor

The quick start runs with an inline config, which is perfect for getting started. To reuse a configuration across runs — and to version it, measure its accuracy, and optimize it — save it as a classifier, a kind of processor. Processors are the saved entities you iterate on in the dashboard, run evaluation sets against, and improve with Composer.

Configuration

The quick start sends just file and config.classifications. To control how classification runs, pass more options inside config. Here are the most commonly used ones; for the full reference, see Configuration.

Classifications

The classifications array is the heart of every classifier — the set of categories the model chooses from. Each entry needs a unique id, a type returned in the output, and a description (your biggest lever on accuracy). At least one classification must have the type "other" as a catch-all.

{
  "config": {
    "classifications": [
      { "id": "invoice", "type": "invoice", "description": "An invoice or bill requesting payment." },
      { "id": "other", "type": "other", "description": "Any other document type." }
    ]
  }
}
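
The requirements above (unique id, a type, a description, and an "other" catch-all) are easy to check locally before sending a request. This is a minimal sketch of such a check; `validate_classifications` is a hypothetical helper, not an SDK function, and the API may enforce additional rules not covered here.

```python
def validate_classifications(classifications: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    seen_ids = set()
    for index, entry in enumerate(classifications):
        # Each entry needs a non-empty id, type, and description.
        for key in ("id", "type", "description"):
            if not entry.get(key):
                problems.append(f"entry {index}: missing '{key}'")
        entry_id = entry.get("id")
        if entry_id in seen_ids:
            problems.append(f"entry {index}: duplicate id '{entry_id}'")
        elif entry_id:
            seen_ids.add(entry_id)
    # At least one entry must act as the catch-all.
    if not any(entry.get("type") == "other" for entry in classifications):
        problems.append("no classification with type 'other' (catch-all)")
    return problems


good = [
    {"id": "invoice", "type": "invoice", "description": "An invoice or bill requesting payment."},
    {"id": "other", "type": "other", "description": "Any other document type."},
]
bad = [
    {"id": "invoice", "type": "invoice", "description": "An invoice."},
    {"id": "invoice", "type": "receipt", "description": ""},
]

print(validate_classifications(good))
for problem in validate_classifications(bad):
    print(problem)
```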

Base processor

Choose the processor based on your accuracy and latency needs.

{ "config": { "baseProcessor": "classification_performance" } }

Processor | When to use
--- | ---
classification_performance | Highest accuracy (default).
classification_light | Faster, cheaper classification.

Classification rules

Steer the model with plain-language rules — useful for disambiguating categories that look similar.

{
  "config": {
    "classificationRules": "When differentiating invoices from purchase orders, the most important signal is whether payment is being requested."
  }
}

Parse config

Because Classify runs Parse under the hood, you can tune how the document is parsed before classification with parseConfig.

{
  "config": {
    "parseConfig": {
      "target": "markdown",
      "chunkingStrategy": { "type": "page" }
    }
  }
}

For every option, including advanced options like context, multimodal processing, and memory, see the Configuration reference.
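
The options in this section can be combined in a single config. The sketch below assembles one as a plain dictionary using only the keys shown on this page (`classifications`, `baseProcessor`, `classificationRules`, `parseConfig`); the `build_config` helper and its parameters are hypothetical conveniences, and it assumes these keys can all be sent together.

```python
import json


def build_config(classifications, *, fast=False, rules=None, parse_config=None) -> dict:
    """Assemble a classify config from the options described in this section."""
    config = {
        "classifications": classifications,
        # Trade accuracy for speed and cost when fast=True.
        "baseProcessor": "classification_light" if fast else "classification_performance",
    }
    # Only include optional keys when a value was provided.
    if rules:
        config["classificationRules"] = rules
    if parse_config:
        config["parseConfig"] = parse_config
    return config


config = build_config(
    [
        {"id": "invoice", "type": "invoice", "description": "An invoice or bill requesting payment."},
        {"id": "other", "type": "other", "description": "Any other document type."},
    ],
    fast=True,
    rules="When differentiating invoices from purchase orders, the most important signal is whether payment is being requested.",
)

print(json.dumps(config, indent=2))
```

The resulting dictionary has the same shape as the inline config passed in the quick start.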


Next steps

Configuration

Classifications, rules, base processor, and advanced options.

Response Format

The full shape of the classify run and output.

Workflows

Branch a document pipeline based on the classified type.

API Reference

Full request and response schema for the classify endpoint.