Response Format

Extract turns documents into structured, machine-readable data you can trust and trace. Every run gives you the values your schema defined, plus per-field confidence scores and citations that point back to the exact spot on the page. This page explains every field in the response.


Response structure

A completed extract run looks like this (truncated to a few fields for brevity). The extracted data lives in output.value; the per-field details live in output.metadata, keyed by the same field paths.

{
  "object": "extract_run",
  "id": "exr_3f1j6I1gsw5k96xFiCnkM",
  "file": {
    "object": "file",
    "id": "file_GzKUy0VDhHscv7tweODYb",
    "name": "bill_of_lading.pdf"
  },
  "status": "PROCESSED",
  "output": {
    "value": {
      "load_number": "ABC-10025521",
      "shipper_name": "Acme Manufacturing Co.",
      "ship_date": "2026-03-14"
    },
    "metadata": {
      "load_number": {
        "logprobsConfidence": 1,
        "ocrConfidence": 0.99,
        "citations": [
          {
            "page": { "number": 1, "width": 612, "height": 792 },
            "polygon": [
              { "x": 56.8, "y": 35.2 },
              { "x": 162.2, "y": 35.2 },
              { "x": 162.2, "y": 48.1 },
              { "x": 56.8, "y": 48.1 }
            ],
            "referenceText": "Load No. ABC-10025521"
          }
        ]
      },
      "shipper_name": { "logprobsConfidence": 0.98, "ocrConfidence": 0.97, "citations": [ /* ... */ ] },
      "ship_date": { "logprobsConfidence": 0.99, "ocrConfidence": 0.98, "citations": [ /* ... */ ] }
    }
  },
  "reviewed": false,
  "edited": false,
  "usage": { "credits": 2 }
}

Top-level fields

| Field | Type | Description |
| --- | --- | --- |
| object | string | Always "extract_run". |
| id | string | Unique identifier for the run (e.g. exr_...). Use it to fetch results later. |
| file | object | The processed file (id, name). Reusable as input to other endpoints. |
| status | string | PENDING, PROCESSING, PROCESSED, or FAILED. |
| output | object \| null | The extracted data. Present when status is PROCESSED. Contains value and metadata. |
| reviewed / edited | boolean | Whether a human reviewed the run, and whether they changed any values. When reviewed is true, initialOutput and reviewedOutput are also populated. |
| config | object | The full configuration used, including defaults that were applied. |
| parseRunId | string \| null | The ID of the parse run used for this extract run. |
| usage | object | Credits consumed (usage.credits). |
| dashboardUrl | string | Link to view the run in the Extend dashboard. |
| failureReason / failureMessage | string \| null | Machine-readable code and human-readable message. Present when status is FAILED. |
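
As a minimal sketch of how the status, output, and failure fields above fit together, the following helper branches on status before touching output. The function name read_run and the shape of the dictionary it returns are hypothetical, chosen only for illustration; the input is assumed to be the run already parsed into a Python dict.

```python
def read_run(run):
    """Summarize an extract run dict based on its status.

    `read_run` and its return shape are illustrative, not part of the API.
    """
    status = run.get("status")

    if status == "PROCESSED":
        # output is only present once the run has been processed
        output = run.get("output") or {}
        return {
            "done": True,
            "value": output.get("value"),
            "metadata": output.get("metadata"),
        }

    if status == "FAILED":
        # failureReason is machine-readable, failureMessage is human-readable
        return {
            "done": True,
            "error": {
                "reason": run.get("failureReason"),
                "message": run.get("failureMessage"),
            },
        }

    # PENDING or PROCESSING: nothing to read yet
    return {"done": False}
```

Checking status first avoids reading output while it is still null.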

Output: value and metadata

output has two halves that share the same field paths:

  • value — the data extracted from the document, conforming to the JSON Schema you defined in the config.
  • metadata — per-field details: confidence scores, citations, and insights.

The metadata object uses keys that mirror the structure of value with a path-like notation, so you can pinpoint the details for any field — including those nested in objects or arrays. If value.line_items[0].description exists, its metadata lives under the key "line_items[0].description" in metadata.

{
  "value": {
    "invoice_number": "INV-123",
    "line_items": [{ "description": "Item A", "quantity": 2 }]
  },
  "metadata": {
    "invoice_number": { "logprobsConfidence": 1, "ocrConfidence": 0.99 },
    "line_items": { "logprobsConfidence": 0.98, "ocrConfidence": 0.98 },
    "line_items[0]": { "logprobsConfidence": 0.98, "ocrConfidence": 0.98 },
    "line_items[0].description": { "logprobsConfidence": 1, "ocrConfidence": 0.95 },
    "line_items[0].quantity": { "logprobsConfidence": 1, "ocrConfidence": 0.98 }
  }
}
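
The path notation above can be generated mechanically by walking value. The sketch below is one way to do that; iter_field_paths is a hypothetical helper, and it assumes field names contain no dots or brackets, since how such names would be escaped is not covered here.

```python
def iter_field_paths(node, prefix=""):
    """Yield (path, child) for every nested node, using the
    path-like notation of the metadata keys (a.b, a[0], a[0].b)."""
    if isinstance(node, dict):
        for key, child in node.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path, child
            yield from iter_field_paths(child, path)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            path = f"{prefix}[{index}]"
            yield path, child
            yield from iter_field_paths(child, path)


def pair_with_metadata(output):
    """Map each path found in value to its metadata entry (or None)."""
    metadata = output.get("metadata", {})
    return {
        path: metadata.get(path)
        for path, _ in iter_field_paths(output.get("value", {}))
    }
```

Using metadata.get rather than indexing keeps the walk tolerant of paths that have no metadata entry.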

Metadata entry fields

Each entry in metadata describes one field path:

| Field | Type | Description |
| --- | --- | --- |
| logprobsConfidence | number \| null | Model confidence from token probabilities (0–1). Being phased out: null on extraction_light and on extraction_performance 4.6.0+. See Confidence scores. |
| ocrConfidence | number \| null | OCR confidence for the underlying text (0–1), or null when word-level confidence is unavailable. |
| reviewAgentScore | integer \| null | A 1–5 score from the Review Agent when it is enabled (5 = high confidence, 1 = significant problems). null otherwise. |
| citations | array | Bounding-box references back to the source document. Present when citationsEnabled is on. See Citations. |
| insights | array | Model reasoning and review notes. See Insights. |

Accessing fields and metadata

Read your data straight off value, and look up the matching metadata entry by its path-like key.

Python

# `output` is the "output" object of a completed extract run
value = output["value"]
metadata = output["metadata"]

# Root-level field and its metadata
invoice_number = value.get("invoice_number")
invoice_number_meta = metadata.get("invoice_number")

# Loop through an array and read per-item metadata
for i, line_item in enumerate(value.get("line_items", [])):
    item_meta = metadata.get(f"line_items[{i}]")
    description_meta = metadata.get(f"line_items[{i}].description")

Confidence scores

A confidence score tells you how confident the model is in an extracted value, as a number between 0 and 1 (closer to 1 is more confident). Each metadata entry can carry two scores:

| Field | Description |
| --- | --- |
| logprobsConfidence | From the language model, based on token probabilities during generation. |
| ocrConfidence | From the OCR system, about how reliably the underlying text was recognized. |

Each metadata entry holds the scores for its field, keyed by the field’s path:

{
  "metadata": {
    "invoice_number": { "ocrConfidence": 0.99, "logprobsConfidence": 1.0 }
  }
}

For arrays, confidence is reported at every level — the array as a whole (line_items), each item (line_items[0]), and each property within an item (line_items[0].description):

{
  "metadata": {
    "line_items": { "ocrConfidence": 0.98 },
    "line_items[0]": { "ocrConfidence": 0.98 },
    "line_items[0].description": { "ocrConfidence": 0.95 },
    "line_items[1].description": { "ocrConfidence": 0.92 }
  }
}

logprobsConfidence is being phased out: extraction_light has never returned it, and extraction_performance versions 4.6.0 and later return null. Prefer ocrConfidence and the Review Agent scoring system. See Extraction Performance versions.

For routing on confidence in Workflows, threshold and review patterns, and accuracy best practices, see Confidence Scores.
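
A rough sketch of a threshold check built on ocrConfidence and reviewAgentScore follows. The function name fields_needing_review and both default thresholds are assumptions made for illustration, not recommended values; a null score is treated as "no signal" rather than as low confidence.

```python
def fields_needing_review(metadata, min_ocr=0.9, min_review_score=4):
    """Return the field paths whose scores fall below the given thresholds.

    The thresholds are placeholder values; tune them for your documents.
    """
    flagged = []
    for path, entry in metadata.items():
        ocr = entry.get("ocrConfidence")          # 0-1, or None
        score = entry.get("reviewAgentScore")     # 1-5, or None

        low_ocr = ocr is not None and ocr < min_ocr
        low_score = score is not None and score < min_review_score

        if low_ocr or low_score:
            flagged.append(path)
    return flagged
```

Because logprobsConfidence is null on newer versions, the sketch deliberately ignores it.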


Citations

Citations give you a bounding-box reference and the source text for each extracted field, so you can highlight where a value came from. Enable them by setting citationsEnabled: true in the extraction config.

Generating robust citations requires an additional citation-focused model, which moderately increases latency.

Citations are returned inside each field’s metadata entry:

{
  "value": { "invoice_number": "US-001" },
  "metadata": {
    "invoice_number": {
      "citations": [
        {
          "page": { "number": 1, "width": 612, "height": 792 },
          "polygon": [
            { "x": 612, "y": 231 },
            { "x": 706, "y": 231 },
            { "x": 706, "y": 259 },
            { "x": 612, "y": 259 }
          ],
          "referenceText": "US-001"
        }
      ],
      "ocrConfidence": 0.988,
      "logprobsConfidence": 1
    }
  }
}

Citation fields

| Field | Type | Description |
| --- | --- | --- |
| page.number | number | Page the citation was found on (starts at 1). |
| page.width / page.height | number | Page dimensions, in points. Use them to normalize polygon coordinates. |
| polygon | array | Outline of the cited region as { x, y } points. Present only when citationsEnabled is on. |
| referenceText | string \| null | The source text that backs the value. Present only when citationsEnabled is on. |
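
To illustrate how these citation fields can be consumed together, the sketch below groups every citation in a metadata object by page number, which is a convenient shape for drawing highlights one page at a time. The helper name citations_by_page and the layout of its result are hypothetical.

```python
from collections import defaultdict


def citations_by_page(metadata):
    """Group citations from all metadata entries by 1-based page number."""
    pages = defaultdict(list)
    for path, entry in metadata.items():
        # citations may be missing when citationsEnabled is off
        for citation in entry.get("citations") or []:
            pages[citation["page"]["number"]].append(
                {
                    "path": path,
                    "referenceText": citation.get("referenceText"),
                    "polygon": citation.get("polygon"),
                }
            )
    return dict(pages)
```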

Coordinate system

polygon points share the page’s coordinate space, with the origin at the top-left of the page: x increases to the right and y increases downward, both measured in points. Each citation reports the page’s own dimensions at page.width and page.height, so you can divide by them to express any position as a fraction of the page.

(0, 0)                          (pageWidth, 0)
┌────────────────────────────────────────────┐
│   (left, top)                              │
│   ●───────────────────────┐                │
│   │                       │                │
│   │       citation        │                │
│   │                       │                │
│   └───────────────────────●                │
│                    (right, bottom)         │
│                                            │
└────────────────────────────────────────────┘
(0, pageHeight)        (pageWidth, pageHeight)
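
The division described above can be sketched as follows. Both function names are hypothetical, and the pixel conversion assumes the page was rendered to an image without cropping or rotation, so that the image shares the page's top-left origin.

```python
def normalize_polygon(citation):
    """Express each polygon point as a fraction (0-1) of the page size."""
    page = citation["page"]
    return [
        (point["x"] / page["width"], point["y"] / page["height"])
        for point in citation["polygon"]
    ]


def polygon_to_pixels(citation, image_width, image_height):
    """Scale the polygon onto a rendered page image of any resolution."""
    return [
        (fx * image_width, fy * image_height)
        for fx, fy in normalize_polygon(citation)
    ]
```

Working in fractions first keeps the overlay independent of the resolution at which the page is rendered.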

Rendering citations in PDF viewers

Reduce the polygon to its bounding extent and normalize against the page dimensions to produce a highlight in your viewer’s coordinate system. Here’s an example that produces a percentage-based highlight area for react-pdf-viewer or a similar library:

Python

def citation_to_highlight(citation):
    polygon = citation.get("polygon")
    if not polygon:
        raise ValueError("Citation does not have a polygon")

    page = citation["page"]
    xs = [p["x"] for p in polygon]
    ys = [p["y"] for p in polygon]
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)

    return {
        # Normalized to percentages of the page
        "left": (left / page["width"]) * 100,
        "top": (top / page["height"]) * 100,
        "width": ((right - left) / page["width"]) * 100,
        "height": ((bottom - top) / page["height"]) * 100,
        "page_index": page["number"] - 1,  # 1-based to 0-based
    }

Insights

The insights array is a shared channel that explains the model’s decisions. It carries reasoning entries when reasoning insights are enabled, and the Review Agent can add notes when it is enabled.

{
  "metadata": {
    "amount": {
      "insights": [
        {
          "type": "reasoning",
          "content": "The total is shown as '$15,735.10' in the table summary and bottom right. The '$' and US address indicate USD."
        }
      ]
    }
  }
}

Each insight has a type and a content string.
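
Since each insight is a type and content pair, reading them comes down to a filter. The sketch below uses a hypothetical helper, insights_of_type; "reasoning" is the only type value shown on this page, so any other type name passed in is an assumption on the caller's side.

```python
def insights_of_type(metadata, path, insight_type="reasoning"):
    """Return the content strings of one field's insights of a given type."""
    entry = metadata.get(path) or {}
    return [
        insight.get("content")
        for insight in entry.get("insights") or []
        if insight.get("type") == insight_type
    ]
```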


Output Examples

Basic fields

{
  "value": {
    "amount": { "amount": 15735.1, "iso_4217_currency_code": "USD" },
    "invoice_number": "36995"
  },
  "metadata": {
    "amount": {
      "insights": [
        { "type": "reasoning", "content": "The total amount is shown as '$15,735.1' in the table summary and the bottom right. The '$' and the US address indicate USD." }
      ],
      "citations": [
        {
          "page": { "number": 1, "width": 612, "height": 792 },
          "polygon": [
            { "x": 430.164, "y": 722.772 },
            { "x": 467.272, "y": 722.829 },
            { "x": 467.258, "y": 731.635 },
            { "x": 430.149, "y": 731.577 }
          ],
          "referenceText": "TOTAL $15,735.1"
        }
      ],
      "ocrConfidence": 0.992,
      "logprobsConfidence": 1
    },
    "invoice_number": {
      "citations": [
        {
          "page": { "number": 1, "width": 612, "height": 792 },
          "polygon": [
            { "x": 459.8, "y": 88.4 },
            { "x": 545.2, "y": 88.4 },
            { "x": 545.2, "y": 101.6 },
            { "x": 459.8, "y": 101.6 }
          ],
          "referenceText": "Invoice #36995"
        }
      ],
      "ocrConfidence": 0.986,
      "logprobsConfidence": 1
    }
  }
}

Nested structures

{
  "value": {
    "line_items": [
      { "item": "Widget A", "quantity": 5, "price": { "amount": 10.0, "iso_4217_currency_code": "USD" } },
      { "item": "Widget B", "quantity": 2, "price": { "amount": 15.0, "iso_4217_currency_code": "USD" } }
    ],
    "signature_block": { "printed_name": "John Smith", "is_signed": true }
  },
  "metadata": {
    "line_items": { "logprobsConfidence": 0.96 },
    "line_items[0]": {
      "logprobsConfidence": 0.96,
      "citations": [
        {
          "page": { "number": 1, "width": 612, "height": 792 },
          "polygon": [
            { "x": 61.2, "y": 318.7 },
            { "x": 550.8, "y": 318.7 },
            { "x": 550.8, "y": 336.1 },
            { "x": 61.2, "y": 336.1 }
          ],
          "referenceText": "Widget A 5 $10.00"
        }
      ]
    },
    "line_items[0].item": { "logprobsConfidence": 0.99 },
    "line_items[0].price.amount": { "logprobsConfidence": 0.98 },
    "signature_block": {
      "logprobsConfidence": 0.99,
      "citations": [
        {
          "page": { "number": 1, "width": 612, "height": 792 },
          "polygon": [
            { "x": 72.4, "y": 690.5 },
            { "x": 198.6, "y": 690.5 },
            { "x": 198.6, "y": 712.9 },
            { "x": 72.4, "y": 712.9 }
          ],
          "referenceText": "John Smith"
        }
      ]
    }
  }
}