Extraction Response Format | Extend Documentation

Extract turns documents into structured, machine-readable data you can trust and trace. Every run gives you the values your schema defined, plus per-field confidence scores and citations that point back to the exact spot on the page. This page explains every field in the response.

Response structure

A completed extract run looks like this (truncated to a few fields for brevity). The extracted data lives in `output.value`; the per-field details live in `output.metadata`, keyed by the same field paths.

{
  "object": "extract_run",
  "id": "exr_3f1j6I1gsw5k96xFiCnkM",
  "file": {
    "object": "file",
    "id": "file_GzKUy0VDhHscv7tweODYb",
    "name": "bill_of_lading.pdf"
  },
  "status": "PROCESSED",
  "output": {
    "value": {
      "load_number": "ABC-10025521",
      "shipper_name": "Acme Manufacturing Co.",
      "ship_date": "2026-03-14"
    },
    "metadata": {
      "load_number": {
        "logprobsConfidence": 1,
        "ocrConfidence": 0.99,
        "citations": [
          {
            "fileId": "file_GzKUy0VDhHscv7tweODYb",
            "page": { "number": 1, "width": 612, "height": 792 },
            "polygon": [
              { "x": 56.8, "y": 35.2 },
              { "x": 162.2, "y": 35.2 },
              { "x": 162.2, "y": 48.1 },
              { "x": 56.8, "y": 48.1 }
            ],
            "referenceText": "Load No. ABC-10025521"
          }
        ]
      },
      "shipper_name": { "logprobsConfidence": 0.98, "ocrConfidence": 0.97, "citations": [ /* ... */ ] },
      "ship_date": { "logprobsConfidence": 0.99, "ocrConfidence": 0.98, "citations": [ /* ... */ ] }
    }
  },
  "reviewed": false,
  "edited": false,
  "usage": { "credits": 2 }
}

Top-level fields

| Field | Type | Description |
| --- | --- | --- |
| `object` | string | Always `"extract_run"`. |
| `id` | string | Unique identifier for the run (e.g. `exr_...`). Use it to fetch results later. |
| `file` | object | The processed file (`id`, `name`). Reusable as input to other endpoints. |
| `status` | string | `PENDING`, `PROCESSING`, `PROCESSED`, or `FAILED`. |
| `output` | object \| null | The extracted data. Present when `status` is `PROCESSED`. Contains `value` and `metadata`. |
| `reviewed` / `edited` | boolean | Whether a human reviewed the run, and whether they changed any values. When `reviewed` is `true`, `initialOutput` and `reviewedOutput` are also populated. |
| `config` | object | The full configuration used, including defaults that were applied. |
| `parseRunId` | string \| null | The ID of the parse run used for this extract run. |
| `usage` | object | Credits consumed (`usage.credits`). |
| `dashboardUrl` | string | Link to view the run in the Extend dashboard. |
| `failureReason` / `failureMessage` | string \| null | Machine-readable code and human-readable message. Present when `status` is `FAILED`. |
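
As a minimal sketch of how a client might branch on `status`, the helper below returns `output` only for a `PROCESSED` run and surfaces `failureReason` and `failureMessage` for a `FAILED` one. The names `get_output` and `ExtractRunFailed` are hypothetical and not part of any Extend SDK; the run is assumed to be the response JSON already parsed into a `dict`.

```python
from typing import Optional


class ExtractRunFailed(Exception):
    """Hypothetical error type raised when a run reports status FAILED."""


def get_output(run: dict) -> Optional[dict]:
    """Return run["output"] for a PROCESSED run, or None while it is still in flight."""
    status = run.get("status")
    if status == "PROCESSED":
        return run["output"]
    if status == "FAILED":
        # failureReason is the machine-readable code, failureMessage the readable text.
        raise ExtractRunFailed(f'{run.get("failureReason")}: {run.get("failureMessage")}')
    if status in ("PENDING", "PROCESSING"):
        return None  # not finished yet; fetch the run again later using run["id"]
    raise ValueError(f"Unexpected status: {status!r}")
```

Returning `None` for in-flight runs keeps polling logic in the caller, which is one possible design; raising a dedicated "not ready" error would work equally well.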

Output: value and metadata

`output` has two halves that share the same field paths:

- `value`: the data extracted from the document, conforming to the JSON Schema you defined in the config.
- `metadata`: per-field details such as confidence scores, citations, and insights.

The keys of `metadata` mirror the structure of `value` using a path-like notation, so you can pinpoint the details for any field, including fields nested in objects or arrays. For example, if `value.line_items[0].description` exists, its metadata lives under the key `"line_items[0].description"` in `metadata`.

{
  "value": {
    "invoice_number": "INV-123",
    "line_items": [{ "description": "Item A", "quantity": 2 }]
  },
  "metadata": {
    "invoice_number": { "logprobsConfidence": 1, "ocrConfidence": 0.99 },
    "line_items": { "logprobsConfidence": 0.98, "ocrConfidence": 0.98 },
    "line_items[0]": { "logprobsConfidence": 0.98, "ocrConfidence": 0.98 },
    "line_items[0].description": { "logprobsConfidence": 1, "ocrConfidence": 0.95 },
    "line_items[0].quantity": { "logprobsConfidence": 1, "ocrConfidence": 0.98 }
  }
}
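
Going the other way, from a metadata key back to the value it describes, can be sketched as below. `resolve_path` is a hypothetical helper, and it assumes field names contain no dots or square brackets, which the path notation shown above suggests but this page does not state outright.

```python
import re

# Matches either a property name ("line_items") or a bracketed array index ("[0]").
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve_path(value, path):
    """Follow a metadata key such as "line_items[0].description" into `value`."""
    current = value
    for name, index in _TOKEN.findall(path):
        # Exactly one of the two groups is non-empty for each token.
        current = current[int(index)] if index else current[name]
    return current


output = {
    "value": {
        "invoice_number": "INV-123",
        "line_items": [{"description": "Item A", "quantity": 2}],
    },
    "metadata": {
        "invoice_number": {"ocrConfidence": 0.99},
        "line_items[0].description": {"ocrConfidence": 0.95},
    },
}

# Pair every metadata entry with the value it describes.
for path, entry in output["metadata"].items():
    print(path, resolve_path(output["value"], path), entry["ocrConfidence"])
```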

Metadata entry fields

Each entry in metadata describes one field path:

| Field | Type | Description |
| --- | --- | --- |
| `logprobsConfidence` | number \| null | Model confidence from token probabilities (0–1). Being phased out: `null` on `extraction_light` and on `extraction_performance` `4.6.0`+. See Confidence scores. |
| `ocrConfidence` | number \| null | OCR confidence for the underlying text (0–1), or `null` when word-level confidence is unavailable. |
| `reviewAgentScore` | integer \| null | A 1–5 score from the Review Agent when it is enabled (5 = high confidence, 1 = significant problems). `null` otherwise. |
| `citations` | array | Bounding-box references back to the source document. Present when `citationsEnabled` is on. See Citations. |
| `insights` | array | Model reasoning and review notes. See Insights. |

Accessing fields and metadata

Read your data straight off `value`, and look up the matching entry in `metadata` by its path-like key.

Python

# `output` is the "output" object of a PROCESSED extract run
value = output["value"]
metadata = output["metadata"]

# Root-level field and its metadata
invoice_number = value.get("invoice_number")
invoice_number_meta = metadata.get("invoice_number")

# Loop through an array and read per-item metadata
for i, line_item in enumerate(value.get("line_items", [])):
    item_meta = metadata.get(f"line_items[{i}]")
    description_meta = metadata.get(f"line_items[{i}].description")

Confidence scores

A confidence score estimates how reliable an extracted value is, as a number between 0 and 1 (closer to 1 is more confident). Each metadata entry can carry two such scores, one from the language model and one from OCR:

| Field | Description |
| --- | --- |
| `logprobsConfidence` | From the language model, based on token probabilities during generation. |
| `ocrConfidence` | From the OCR system, about how reliably the underlying text was recognized. |

Each metadata entry holds the scores for its field, keyed by the field’s path:

{
  "metadata": {
    "invoice_number": { "ocrConfidence": 0.99, "logprobsConfidence": 1.0 }
  }
}

For arrays, confidence is reported at every level: the array as a whole (`line_items`), each item (`line_items[0]`), and each property within an item (`line_items[0].description`):

{
  "metadata": {
    "line_items": { "ocrConfidence": 0.98 },
    "line_items[0]": { "ocrConfidence": 0.98 },
    "line_items[0].description": { "ocrConfidence": 0.95 },
    "line_items[1].description": { "ocrConfidence": 0.92 }
  }
}
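
Because every level of an array has its own entry, one way to find the weakest spot in an array is to scan all keys at or below its path. The sketch below does that with a hypothetical `weakest_under` helper; it skips entries whose `ocrConfidence` is missing or `null`.

```python
def weakest_under(metadata, prefix):
    """Return (path, ocrConfidence) for the lowest-scoring entry at or below `prefix`."""
    scored = [
        (path, entry["ocrConfidence"])
        for path, entry in metadata.items()
        # Match the prefix itself, its items ("prefix[0]") and its properties ("prefix.x").
        if (path == prefix or path.startswith((prefix + "[", prefix + ".")))
        and entry.get("ocrConfidence") is not None
    ]
    return min(scored, key=lambda pair: pair[1]) if scored else None


metadata = {
    "line_items": {"ocrConfidence": 0.98},
    "line_items[0]": {"ocrConfidence": 0.98},
    "line_items[0].description": {"ocrConfidence": 0.95},
    "line_items[1].description": {"ocrConfidence": 0.92},
}
print(weakest_under(metadata, "line_items"))
```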

`logprobsConfidence` is being phased out: `extraction_light` has never returned it, and `extraction_performance` versions `4.6.0` and later return `null`. Prefer `ocrConfidence` and the Review Agent scoring system. See Extraction Performance versions.

For routing on confidence in Workflows, threshold and review patterns, and accuracy best practices, see Confidence Scores.
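
As a rough client-side illustration of threshold-based review, the sketch below flags entries whose `ocrConfidence` or `reviewAgentScore` falls below a cutoff. The function name and both default thresholds (`0.95` and `4`) are assumptions chosen for the example, not recommended values; `logprobsConfidence` is deliberately ignored because it is being phased out.

```python
def fields_needing_review(metadata, min_ocr=0.95, min_review_score=4):
    """Map each low-scoring field path to the reasons it was flagged."""
    flagged = {}
    for path, entry in metadata.items():
        reasons = []
        ocr = entry.get("ocrConfidence")
        if ocr is not None and ocr < min_ocr:  # null means "unavailable", not "bad"
            reasons.append(f"ocrConfidence {ocr} is below {min_ocr}")
        score = entry.get("reviewAgentScore")
        if score is not None and score < min_review_score:
            reasons.append(f"reviewAgentScore {score} is below {min_review_score}")
        if reasons:
            flagged[path] = reasons
    return flagged
```

Treating a `null` score as "no signal" rather than as a failure is a design choice; a stricter pipeline might route such fields to review as well.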

Citations

Citations give you a bounding-box reference and the source text for each extracted field, so you can highlight where a value came from. Enable them by setting `citationsEnabled: true` in the extraction config.

Generating robust citations uses an additional citation-focused model, which moderately increases latency.

Citations are returned inside each field’s metadata entry:

{
  "value": { "invoice_number": "US-001" },
  "metadata": {
    "invoice_number": {
      "citations": [
        {
          "fileId": "file_GzKUy0VDhHscv7tweODYb",
          "page": { "number": 1, "width": 612, "height": 792 },
          "polygon": [
            { "x": 612, "y": 231 },
            { "x": 706, "y": 231 },
            { "x": 706, "y": 259 },
            { "x": 612, "y": 259 }
          ],
          "referenceText": "US-001"
        }
      ],
      "ocrConfidence": 0.988,
      "logprobsConfidence": 1
    }
  }
}

Citation fields

| Field | Type | Description |
| --- | --- | --- |
| `fileId` | string | ID of the file the citation was found in. On single-file runs it equals the run’s `file.id`; on multifile runs it identifies which entry in the run’s `files` array the citation refers to. |
| `page.number` | number | Page the citation was found on (starts at 1). Page numbers are relative to the cited file (`fileId`), not the corpus. |
| `page.width` / `page.height` | number | Page dimensions, in points. Use them to normalize `polygon` coordinates. |
| `polygon` | array | Outline of the cited region as `{ x, y }` points. Present only when `citationsEnabled` is on. |
| `referenceText` | string \| null | The source text that backs the value. Present only when `citationsEnabled` is on. |
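
Since page numbers are relative to the cited file, a viewer for multifile runs will usually want citations grouped by both `fileId` and `page.number`. A minimal sketch, using a hypothetical `citations_by_page` helper over the `metadata` object:

```python
from collections import defaultdict


def citations_by_page(metadata):
    """Group (field path, referenceText) pairs by (fileId, page number)."""
    grouped = defaultdict(list)
    for path, entry in metadata.items():
        # "citations" may be absent when citationsEnabled is off.
        for citation in entry.get("citations") or []:
            key = (citation["fileId"], citation["page"]["number"])
            grouped[key].append((path, citation.get("referenceText")))
    return dict(grouped)
```

Keying on the pair rather than on the page number alone avoids mixing page 1 of one file with page 1 of another.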

Coordinate system

`polygon` points share the page’s coordinate space, with the origin at the top-left of the page: x increases to the right and y increases downward, both measured in points. Each citation reports the page’s own dimensions in `page.width` and `page.height`, so you can divide by them to express any position as a fraction of the page.

(0, 0)                                   (pageWidth, 0)
  ┌────────────────────────────────────────────┐
  │   (left, top)                              │
  │      ●───────────────────────┐             │
  │      │                       │             │
  │      │       citation        │             │
  │      │                       │             │
  │      └───────────────────────●             │
  │                          (right, bottom)   │
  │                                            │
  └────────────────────────────────────────────┘
(0, pageHeight)                  (pageWidth, pageHeight)
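
If you render pages to images yourself, the same normalization maps a polygon onto pixels. The sketch below assumes the image shows the whole page, uncropped and unrotated, with its own origin at the top-left; `polygon_to_pixels` and its parameters are hypothetical names.

```python
def polygon_to_pixels(citation, image_width, image_height):
    """Scale a citation polygon from page points to pixel coordinates."""
    page = citation["page"]
    # Both spaces have a top-left origin, so only scaling is needed (no y-flip).
    scale_x = image_width / page["width"]
    scale_y = image_height / page["height"]
    return [(p["x"] * scale_x, p["y"] * scale_y) for p in citation["polygon"]]


citation = {
    "page": {"number": 1, "width": 612, "height": 792},
    "polygon": [
        {"x": 56.8, "y": 35.2},
        {"x": 162.2, "y": 35.2},
        {"x": 162.2, "y": 48.1},
        {"x": 56.8, "y": 48.1},
    ],
}
# A page rendered at twice its point size, i.e. 1224 x 1584 pixels.
print(polygon_to_pixels(citation, 1224, 1584))
```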

Rendering citations in PDF viewers

Reduce the polygon to its bounding extent and normalize against the page dimensions to produce a highlight in your viewer’s coordinate system. Here’s an example that produces a percentage-based highlight area for react-pdf-viewer or a similar library:

Python

def citation_to_highlight(citation):
    """Convert a citation into a percentage-based highlight rectangle."""
    polygon = citation.get("polygon")
    if not polygon:
        raise ValueError("Citation does not have a polygon")

    # Reduce the polygon to its axis-aligned bounding box (in points)
    page = citation["page"]
    xs = [p["x"] for p in polygon]
    ys = [p["y"] for p in polygon]
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)

    return {
        # Normalized to percentages of the page
        "left": (left / page["width"]) * 100,
        "top": (top / page["height"]) * 100,
        "width": ((right - left) / page["width"]) * 100,
        "height": ((bottom - top) / page["height"]) * 100,
        "page_index": page["number"] - 1,  # 1-based to 0-based
    }

Insights

The insights array is a shared channel that explains the model’s decisions. It carries reasoning entries when reasoning insights are enabled, and the Review Agent can add notes when it is enabled.

{
  "metadata": {
    "amount": {
      "insights": [
        {
          "type": "reasoning",
          "content": "The total is shown as '$15,735.10' in the table summary and bottom right. The '$' and US address indicate USD."
        }
      ]
    }
  }
}

Each insight has a `type` and a `content` string.
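
Because reasoning entries and Review Agent notes share one array, it can help to filter by `type` before displaying them. The sketch below uses a hypothetical `insights_of_type` helper; `"reasoning"` is the only `type` value shown on this page, so any other value you pass is an assumption about your own data.

```python
def insights_of_type(metadata, insight_type="reasoning"):
    """Map each field path to the content strings of its insights of one type."""
    result = {}
    for path, entry in metadata.items():
        contents = [
            insight["content"]
            for insight in entry.get("insights") or []
            if insight.get("type") == insight_type
        ]
        if contents:  # leave out fields with no matching insights
            result[path] = contents
    return result
```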

Output examples

Basic fields

{
  "value": {
    "amount": { "amount": 15735.1, "iso_4217_currency_code": "USD" },
    "invoice_number": "36995"
  },
  "metadata": {
    "amount": {
      "insights": [
        { "type": "reasoning", "content": "The total amount is shown as '$15,735.1' in the table summary and the bottom right. The '$' and the US address indicate USD." }
      ],
      "citations": [
        {
          "fileId": "file_xK9mLPqRtN3vS8wF5hB2cQ",
          "page": { "number": 1, "width": 612, "height": 792 },
          "polygon": [
            { "x": 430.164, "y": 722.772 },
            { "x": 467.272, "y": 722.829 },
            { "x": 467.258, "y": 731.635 },
            { "x": 430.149, "y": 731.577 }
          ],
          "referenceText": "TOTAL  $15,735.1"
        }
      ],
      "ocrConfidence": 0.992,
      "logprobsConfidence": 1
    },
    "invoice_number": {
      "citations": [
        {
          "fileId": "file_xK9mLPqRtN3vS8wF5hB2cQ",
          "page": { "number": 1, "width": 612, "height": 792 },
          "polygon": [
            { "x": 459.8, "y": 88.4 },
            { "x": 545.2, "y": 88.4 },
            { "x": 545.2, "y": 101.6 },
            { "x": 459.8, "y": 101.6 }
          ],
          "referenceText": "Invoice #36995"
        }
      ],
      "ocrConfidence": 0.986,
      "logprobsConfidence": 1
    }
  }
}

Nested structures

{
  "value": {
    "line_items": [
      { "item": "Widget A", "quantity": 5, "price": { "amount": 10.0, "iso_4217_currency_code": "USD" } },
      { "item": "Widget B", "quantity": 2, "price": { "amount": 15.0, "iso_4217_currency_code": "USD" } }
    ],
    "signature_block": { "printed_name": "John Smith", "is_signed": true }
  },
  "metadata": {
    "line_items": { "logprobsConfidence": 0.96 },
    "line_items[0]": {
      "logprobsConfidence": 0.96,
      "citations": [
        {
          "fileId": "file_xK9mLPqRtN3vS8wF5hB2cQ",
          "page": { "number": 1, "width": 612, "height": 792 },
          "polygon": [
            { "x": 61.2, "y": 318.7 },
            { "x": 550.8, "y": 318.7 },
            { "x": 550.8, "y": 336.1 },
            { "x": 61.2, "y": 336.1 }
          ],
          "referenceText": "Widget A   5   $10.00"
        }
      ]
    },
    "line_items[0].item": { "logprobsConfidence": 0.99 },
    "line_items[0].price.amount": { "logprobsConfidence": 0.98 },
    "signature_block": {
      "logprobsConfidence": 0.99,
      "citations": [
        {
          "fileId": "file_xK9mLPqRtN3vS8wF5hB2cQ",
          "page": { "number": 1, "width": 612, "height": 792 },
          "polygon": [
            { "x": 72.4, "y": 690.5 },
            { "x": 198.6, "y": 690.5 },
            { "x": 198.6, "y": 712.9 },
            { "x": 72.4, "y": 712.9 }
          ],
          "referenceText": "John Smith"
        }
      ]
    }
  }
}
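
In the nested example above, `line_items[0].price.amount` has no citations of its own, while its ancestor `line_items[0]` does. One possible client-side fallback, sketched below with hypothetical helpers, walks up the path until it finds an entry that carries citations. Whether an ancestor's citation is an acceptable stand-in for a child field is an assumption of this sketch, not documented behavior.

```python
import re


def parent_path(path):
    """Drop the last segment: "a[0].b" -> "a[0]" -> "a" -> None."""
    trimmed = re.sub(r"(\.[^.\[\]]+|\[\d+\])$", "", path)
    return trimmed if trimmed and trimmed != path else None


def nearest_citations(metadata, path):
    """Return (path, citations) for the closest entry at or above `path` that has citations."""
    current = path
    while current is not None:
        # Intermediate paths such as "line_items[0].price" may have no entry at all.
        citations = (metadata.get(current) or {}).get("citations")
        if citations:
            return current, citations
        current = parent_path(current)
    return None, []
```

With the metadata from the nested example, `nearest_citations(metadata, "line_items[0].price.amount")` would resolve to the entry for `line_items[0]`.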