Confidence Scores | Extend Documentation

A confidence score is a number between 0 and 1 that indicates how confident the model is in an extracted value; values closer to 1 indicate higher confidence. Use these scores to accept high-confidence values automatically and route the rest to human review.

The two scores

Every entry in the output.metadata object can carry two independent scores:

Field	Description
`logprobsConfidence`	From the language model, based on token probabilities during generation.
`ocrConfidence`	From the OCR system, about how reliably the underlying text was recognized.

logprobsConfidence is being phased out. The extraction_light processor has never returned it, and extraction_performance version 4.6.0 and later return null. Prefer ocrConfidence and the Review Agent scoring system for new integrations. See Extraction Performance versions.

Accessing confidence scores

Scores live in output.metadata, keyed by path-like notation that mirrors output.value. A root field uses its own name (invoice_number); nested and repeated fields use arrayName[index].propertyName.

{
  "value": {
    "invoice_number": "INV-123",
    "line_items": [{ "description": "Item A", "quantity": 2 }]
  },
  "metadata": {
    "invoice_number": { "logprobsConfidence": 1.0, "ocrConfidence": 0.99 },
    "line_items[0].description": { "logprobsConfidence": 0.98, "ocrConfidence": 0.95 }
  }
}
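
The key notation above can be sketched in code. This is a minimal illustration, not part of any SDK: `metadata_key` and `ocr_confidence` are hypothetical helpers, the payload is treated as a plain dict parsed from JSON, and joining nested object properties with a dot is an assumption extrapolated from the `arrayName[index].propertyName` pattern.

```python
from typing import Any, Optional

# Sample payload copied from the example above, as a plain dict parsed from JSON.
output = {
    "value": {
        "invoice_number": "INV-123",
        "line_items": [{"description": "Item A", "quantity": 2}],
    },
    "metadata": {
        "invoice_number": {"logprobsConfidence": 1.0, "ocrConfidence": 0.99},
        "line_items[0].description": {"logprobsConfidence": 0.98, "ocrConfidence": 0.95},
    },
}


def metadata_key(*parts: Any) -> str:
    """Build a metadata key from path parts (hypothetical helper).

    Strings are property names; integers are array indexes.
    """
    key = ""
    for part in parts:
        if isinstance(part, int):
            key += f"[{part}]"
        else:
            key += f".{part}" if key else str(part)
    return key


def ocr_confidence(metadata: dict, *parts: Any) -> Optional[float]:
    """Return ocrConfidence for a path, or None if the entry or score is missing."""
    entry = metadata.get(metadata_key(*parts)) or {}
    return entry.get("ocrConfidence")


print(metadata_key("line_items", 0, "description"))  # line_items[0].description
print(ocr_confidence(output["metadata"], "invoice_number"))  # 0.99
```

Returning `None` for a missing entry, rather than raising, lets callers decide how to treat fields that carry no score.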

Working with arrays

Confidence is reported at every level of an array — the array as a whole, each item, and each property within an item:

{
  "metadata": {
    "line_items": { "logprobsConfidence": 0.98 },
    "line_items[0]": { "logprobsConfidence": 0.99 },
    "line_items[0].description": { "logprobsConfidence": 1.0, "ocrConfidence": 0.95 },
    "line_items[1].description": { "logprobsConfidence": 0.97, "ocrConfidence": 0.92 }
  }
}
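
Because array scores arrive as flat keys, it can help to regroup them per item. The sketch below assumes the metadata is a plain dict shaped like the example above; `ocr_by_item` and the regular expression are illustrative and cover only property-level keys of the form `arrayName[index].propertyName`.

```python
import re
from collections import defaultdict

# Metadata copied from the example above.
metadata = {
    "line_items": {"logprobsConfidence": 0.98},
    "line_items[0]": {"logprobsConfidence": 0.99},
    "line_items[0].description": {"logprobsConfidence": 1.0, "ocrConfidence": 0.95},
    "line_items[1].description": {"logprobsConfidence": 0.97, "ocrConfidence": 0.92},
}

# Matches property-level keys such as "line_items[0].description".
PROPERTY_KEY = re.compile(r"^(?P<array>\w+)\[(?P<index>\d+)\]\.(?P<prop>.+)$")


def ocr_by_item(metadata: dict, array_name: str) -> dict:
    """Map item index -> {property: ocrConfidence} for one array (hypothetical helper)."""
    items = defaultdict(dict)
    for key, entry in metadata.items():
        match = PROPERTY_KEY.match(key)
        if match and match["array"] == array_name:
            items[int(match["index"])][match["prop"]] = entry.get("ocrConfidence")
    return dict(items)


print(ocr_by_item(metadata, "line_items"))
# {0: {'description': 0.95}, 1: {'description': 0.92}}
```

The array-level and item-level entries (`line_items`, `line_items[0]`) are skipped on purpose, since in the example they carry no `ocrConfidence`.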

Reading scores programmatically

Look up a field’s entry by its path key, then read logprobsConfidence / ocrConfidence.

Python

# `result` is assumed to be the response from an extraction run
metadata = result.output.metadata

# A specific field's scores
invoice = metadata["invoice_number"]
ocr = invoice.ocr_confidence

# Flag low-confidence fields for review (a missing score counts as 0)
for field, meta in metadata.items():
    if (meta.ocr_confidence or 0) < 0.9:
        print(f"Low confidence on {field}: route to review")
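
Note that the loop above treats a missing OCR score as 0, so entries that carry no `ocrConfidence` (such as the array-level entries in the earlier example) are always flagged. If that is not what you want, one option is to report unscored entries separately. The sketch below assumes the metadata has been parsed into plain dicts with the camelCase keys shown in the JSON examples; `split_by_ocr_confidence` is a hypothetical helper and the sample values are illustrative only.

```python
from typing import List, Tuple


def split_by_ocr_confidence(metadata: dict, threshold: float = 0.9) -> Tuple[List[str], List[str]]:
    """Return (low, unscored): keys below the threshold, and keys with no OCR score."""
    low, unscored = [], []
    for key, entry in metadata.items():
        score = entry.get("ocrConfidence")
        if score is None:
            unscored.append(key)
        elif score < threshold:
            low.append(key)
    return low, unscored


# Illustrative values; logprobsConfidence is null, as on newer processors.
metadata = {
    "invoice_number": {"logprobsConfidence": None, "ocrConfidence": 0.99},
    "line_items": {"logprobsConfidence": None},
    "line_items[0].description": {"logprobsConfidence": None, "ocrConfidence": 0.85},
}

low, unscored = split_by_ocr_confidence(metadata)
print(low)       # ['line_items[0].description']
print(unscored)  # ['line_items']
```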

Routing on confidence in Workflows

In Workflows, conditional steps can branch on an extraction step’s confidence — for example, send low-confidence documents to manual review and auto-process the rest.

Aggregate access:

{{extractionStepName.avgConfidence}} — average across all fields.
{{extractionStepName.minConfidence}} — minimum across all fields.

These aggregates summarize the per-field confidence scores described above. On processors where logprobsConfidence is null (such as extraction_performance 4.6.0+ and all of extraction_light), build routing on ocrConfidence and the Review Agent.
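
If you need comparable aggregates outside Workflows, they can be approximated from the per-field metadata. This is a sketch under assumptions: it averages only `ocrConfidence`, ignores entries without one, and may not match how Workflows computes `avgConfidence` and `minConfidence` internally. `ocr_aggregates` is a hypothetical helper.

```python
from statistics import mean
from typing import Optional, Tuple


def ocr_aggregates(metadata: dict) -> Tuple[Optional[float], Optional[float]]:
    """Return (average, minimum) of the available ocrConfidence scores, or (None, None)."""
    scores = [
        entry["ocrConfidence"]
        for entry in metadata.values()
        if entry.get("ocrConfidence") is not None
    ]
    if not scores:
        return None, None
    return mean(scores), min(scores)


metadata = {
    "invoice_number": {"logprobsConfidence": None, "ocrConfidence": 0.99},
    "line_items": {"logprobsConfidence": None},  # no OCR score: ignored
    "line_items[0].description": {"logprobsConfidence": None, "ocrConfidence": 0.95},
}

avg, lowest = ocr_aggregates(metadata)
print(round(avg, 2), lowest)  # 0.97 0.95
```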

[Image: Routing a workflow on average confidence]

Specific-field access:

{{extractionStepName.output.metadata.field_name.ocrConfidence}} — a field’s OCR confidence. Prefer this for new integrations.
{{extractionStepName.output.metadata.field_name.logprobsConfidence}} — a field’s model confidence. Legacy: this is null on extraction_performance 4.6.0+ and is never set on extraction_light (see the phase-out note above), so a condition built on it will silently never fire on current processors.

Example conditional logic:

Goal	Condition
Route to manual review if any field is low-confidence	`{{extractionStepName.minConfidence}} < 0.9`
Route to manual review if the invoice number has low OCR confidence	`{{extractionStepName.output.metadata.invoice_number.ocrConfidence}} < 0.95`
Auto-process if average confidence is high	`{{extractionStepName.avgConfidence}} >= 0.98`
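
The conditions in this table are independent examples. As a sketch of how they might combine into one decision, the function below applies them in order and falls back to manual review when none of them clearly applies. The function name, the return labels, and the conservative default are assumptions for illustration, not Workflows behavior.

```python
from typing import Optional


def route(min_confidence: float, avg_confidence: float,
          invoice_number_ocr: Optional[float]) -> str:
    """Combine the example conditions into one routing decision (hypothetical)."""
    # Any field is low-confidence.
    if min_confidence < 0.9:
        return "manual_review"
    # The invoice number has low (or no) OCR confidence.
    if invoice_number_ocr is None or invoice_number_ocr < 0.95:
        return "manual_review"
    # Average confidence is high.
    if avg_confidence >= 0.98:
        return "auto_process"
    # Assumption: anything in between goes to a human.
    return "manual_review"


print(route(0.96, 0.99, 0.99))  # auto_process
print(route(0.85, 0.99, 0.99))  # manual_review
```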

Array and nested-object confidence access (for example line_items[0].description) is not currently supported in conditional steps.

Limitations and best practices

Confidence scores help you assess the reliability of extracted data, but they have limitations you should account for.

Not a guarantee of accuracy

A high confidence score means a value is probably correct, not that it is certainly correct. Errors can still occur at high scores, so cross-verify critical data.

Context matters

Confidence scores reflect the model’s understanding of the data. They may miss nuances or context that a human reviewer who understands your business would recognize. Give the model enough context about the value you want extracted so that its confidence is meaningful.

For example, if you name a field company_name on an extractor designed for invoices, the model may not be confident which name you’re asking for, because an invoice often contains several company names, such as the vendor and the customer.

Best practices

Provide clear field descriptions. Ambiguous field names lead to lower confidence scores — this is your biggest lever on both accuracy and confidence. See Field Names and Prompt Crafting.
Test and iterate. Monitor confidence patterns in your specific use case and adjust your review thresholds accordingly.
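
One way to approach threshold tuning is to replay a human-reviewed sample at several candidate thresholds and compare how much would go to review against how many errors would slip through. This is a minimal sketch: `evaluate_threshold` is a hypothetical helper, and the sample data is invented for illustration, not measured results.

```python
from typing import List, Tuple


def evaluate_threshold(samples: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """Return (review_rate, error_rate_among_accepted) for one threshold.

    samples: (ocr_confidence, was_correct) pairs from human-reviewed extractions.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    accepted = [ok for score, ok in samples if score >= threshold]
    review_rate = 1 - len(accepted) / len(samples)
    error_rate = accepted.count(False) / len(accepted) if accepted else 0.0
    return review_rate, error_rate


# Invented sample for illustration only.
samples = [(0.99, True), (0.97, True), (0.93, False), (0.91, True), (0.80, False)]

for t in (0.90, 0.95):
    review_rate, error_rate = evaluate_threshold(samples, t)
    print(f"threshold={t:.2f} review={review_rate:.0%} errors_among_accepted={error_rate:.0%}")
# threshold=0.90 review=20% errors_among_accepted=25%
# threshold=0.95 review=60% errors_among_accepted=0%
```

A higher threshold sends more values to review but lets fewer errors through; the right balance depends on your documents and on the cost of an error.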