Review Agent
The Review Agent is a specialized system that analyzes extraction results to identify potential issues and produce an intelligent confidence score. It is designed to be a comprehensive replacement for both logprobs and OCR confidence scores, serving as a centralized metric to judge extraction confidence.
The Review Agent is designed to:
- Analyze extraction results for potential issues
- Produce an intelligent confidence score that is superior to standard logprobs and OCR-based scores
- Raise specific issues to help drive review and assist in making corrections
Activating the Review Agent
You can activate the Review Agent in the Advanced settings for your extractor.

Once activated, the Review Agent will run on all subsequent extractions for this processor, providing review agent scores and issue reports in the extraction results.
Review Agent UI
The Review Agent provides a dedicated interface for reviewing extraction confidence and issues.
Extraction Confidence Gauge
The extraction confidence gauge gives you a high-level view of the overall confidence for a field. Hovering over the gauge reveals a detailed breakdown of the score and any specific issues found.

Scalar Fields
For scalar fields (non-arrays), the interface shows a score and a review indicator. The tooltip expands to show the specific issues raised for the field, along with a general review overview.

Array Fields
The array interface provides detailed insights into list-based data. It displays common issues that appear across multiple items and includes a heatmap to help visualize where problems were detected in the extraction.

Item-level Scoring: Each item in an array has its own score and issues. When the Review Agent is enabled, a new confidence column is generated for arrays.
Hovering over the confidence cell for a specific item displays item-level issues, along with the score for that item. Individual properties within an item also display their own scores when you hover over them. Cells with review scores ≤ 3 display a review indicator bar, helping you quickly identify where problems occur when scrolling through the array results.

Scoring
The Review Agent produces a score that serves as a general indicator of extraction quality.
- Score ≤ 3: Usually warrants review. The agent has likely found something of concern.
- Score ≤ 4: Consider reviewing if you are extremely sensitive to accuracy.
- Lower scores: Scores below 3 typically indicate more major issues, or issues of greater severity.
OCR confidence is factored into the Review Agent score. Even if the agent does not identify semantic issues, a low OCR confidence can result in a lower overall score.
Issues
Issues raised by the Review Agent are categorized by severity and relevance.
- Severity Colors: Issues are colored based on their severity.
  - Red: Severe problems that likely require attention.
  - Yellow: Potentially important problems that still warrant review.
  - Grey: Less severe or informational issues.
Minor Issues

Sometimes the agent produces issues that are determined to be less relevant to the current extraction. These are classified as “Minor Issues” and are:
- Down-weighted in scoring
- Hidden by default in the UI
You can view these by activating the minor issues toggle.
Minor issues are usually precautionary in nature and may not reflect actual errors in the extraction.
Issues for Arrays
Arrays have a hierarchical issue structure:
- Common Issues: Displayed in the top-level tooltip for the array. These are issues that apply to multiple items in the array.
- Item-Specific Issues: Issues targeted at a single item in the array, listed under the item-level review tooltip.
  - Item-level issues are treated as property-level issues when they identify the specific properties affected.
Steering the Review Agent
You can guide the Review Agent’s behavior to better suit your specific needs.
Review Agent Rules
You can add specific rules to the Review Agent to teach it what to look for or what to ignore. This is useful when the agent consistently flags correct data as an issue, or misses a specific type of error you care about.
Schema Improvements
The Review Agent picks up on ambiguities and issues that often trace back to the extraction schema. Sometimes, clarifying field descriptions or tightening the schema definition drives correct Review Agent behavior more effectively than adding explicit rules.
Review Agent API
Enabling Review Agent
Review Agent can be enabled for JSON Schema extractors via advancedOptions.reviewAgent.enabled.
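As a minimal sketch, assuming the extractor is configured via a JSON payload: only the advancedOptions.reviewAgent.enabled path comes from this doc, and the surrounding fields are illustrative placeholders.

```python
# Illustrative extractor configuration. Only the
# advancedOptions.reviewAgent.enabled path is documented here;
# the surrounding fields are placeholder assumptions.
extractor_config = {
    "name": "invoice-extractor",
    "schema": {"type": "object", "properties": {}},  # your JSON Schema
    "advancedOptions": {
        "reviewAgent": {
            "enabled": True,
        },
    },
}
```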
Response Structure
The Review Agent adds two main fields to the metadata for each field, array item, and property:
- reviewAgentScore: An integer score from 1-5 indicating confidence.
- insights: An array of insights. Review Agent contributes issue and review_summary insights, and these may coexist alongside reasoning insights (if model reasoning insights are enabled).
Metadata Object
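As a sketch of the shape (assembled from the field descriptions below, not an exact API response), a single metadata entry looks roughly like this:

```python
# Rough shape of one metadata entry when Review Agent is enabled.
# Field names follow the descriptions below; the exact envelope is assumed.
metadata_entry = {
    "reviewAgentScore": 3,  # integer 1-5
    "insights": [
        {"type": "issue", "content": "Total does not match the sum of line items."},
        {"type": "review_summary", "content": "Value is legible but conflicts with related fields."},
    ],
}
```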
Field Descriptions
reviewAgentScore
A general confidence score ranging from 1 to 5.
- 5: High confidence. No issues detected.
- 4: Good confidence. Minor observations or precautionary notes.
- 3: Moderate confidence. Some uncertainty or potential issues found.
- 2: Low confidence. Likely contains errors or significant ambiguities.
- 1: Very low confidence. Critical issues detected.
insights
A list of insights. Each insight has a type and content.
type: "issue": A specific problem identified by Review Agent.type: "review_summary": A general summary of the review findings from Review Agent for that field.type: "reasoning": Model reasoning about the extraction decision (controlled by model reasoning insights, not Review Agent).
Accessing Review Data
You can access Review Agent data programmatically by traversing the metadata object using standard path notation.
Scalar Fields
For a non-array field like invoice_number, access the metadata directly by key.
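For example, a sketch assuming response.metadata behaves like a plain dict keyed by field path, as in the paths shown below:

```python
# Read the Review Agent score and issues for a scalar field.
# Assumes `response` is an extraction result with dict-like metadata.
field_meta = response.metadata["invoice_number"]

score = field_meta["reviewAgentScore"]
issues = [i["content"] for i in field_meta.get("insights", []) if i["type"] == "issue"]

if score <= 3:  # the docs' "usually warrants a review" threshold
    print(f"invoice_number (score={score}) flagged:")
    for issue in issues:
        print(" -", issue)
```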
Arrays
Arrays have metadata at up to three levels. Note that issues that affect specific items/properties will be distributed to the corresponding item/property keys.
- Array Level: response.metadata.line_items
  - Contains the overall reviewAgentScore for the array. insights may contain the review_summary for the array, and may rarely include issues if an issue applies to the array as a whole.
- Item Level: response.metadata['line_items[0]']
  - Contains the score for the specific item.
  - Includes item-specific issues and any common issues that do not affect a specific property in the item. Issues at this level apply to the entire item.
- Property Level: response.metadata['line_items[0].amount']
  - Contains the score for the specific property.
  - Includes item-specific issues and common issues that apply to this specific property.
Code Examples
Collecting All Issues for an Array
To gather all issues across an entire array (including item-level and property-level issues), iterate through all metadata keys that match the array pattern.
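A sketch in Python, assuming response.metadata is a dict whose keys follow the line_items / line_items[0] / line_items[0].amount pattern described above:

```python
import re

def collect_array_issues(metadata: dict, array_field: str) -> list[dict]:
    """Gather all issues for an array: array-, item-, and property-level."""
    # Matches "line_items", "line_items[0]", and "line_items[0].amount".
    pattern = re.compile(rf"^{re.escape(array_field)}(\[\d+\](\.\w+)?)?$")
    found = []
    for key, entry in metadata.items():
        if not pattern.match(key):
            continue
        for insight in entry.get("insights", []):
            if insight["type"] == "issue":
                found.append({"key": key, "issue": insight["content"]})
    return found

all_issues = collect_array_issues(response.metadata, "line_items")
```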
Collecting Issues for a Specific Array Item
To collect all issues for a single array item (including its nested properties), filter metadata keys by the specific item index.
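Again as a sketch under the same dict-shaped metadata assumption, filter keys by the item prefix:

```python
def collect_item_issues(metadata: dict, array_field: str, index: int) -> list[dict]:
    """Gather issues for one array item, including its property-level keys."""
    prefix = f"{array_field}[{index}]"
    found = []
    for key, entry in metadata.items():
        # Matches "line_items[0]" exactly, plus "line_items[0].amount" etc.
        if key == prefix or key.startswith(prefix + "."):
            for insight in entry.get("insights", []):
                if insight["type"] == "issue":
                    found.append({"key": key, "issue": insight["content"]})
    return found

first_item_issues = collect_item_issues(response.metadata, "line_items", 0)
```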
Filtering Items by Score Threshold
A common pattern is to identify array items that need human review based on their score.
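For example (same metadata assumption), collect the indices of items scoring at or below a threshold:

```python
import re

def items_needing_review(metadata: dict, array_field: str, threshold: int = 3) -> list[int]:
    """Return indices of array items with reviewAgentScore <= threshold."""
    item_key = re.compile(rf"^{re.escape(array_field)}\[(\d+)\]$")
    flagged = []
    for key, entry in metadata.items():
        match = item_key.match(key)
        if match and entry.get("reviewAgentScore", 5) <= threshold:
            flagged.append(int(match.group(1)))
    return sorted(flagged)

# Score <= 3 "usually warrants a review" per the Scoring section above.
to_review = items_needing_review(response.metadata, "line_items", threshold=3)
```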
Using Scores in Workflows
You can use reviewAgentScore in conditional steps to route documents based on quality.
In extend-web, workflow conditions resolve {{ ... }} bindings using a simple dot-path lookup (e.g. {{myStep.output.value.field}}). This works well for scalar fields and array-level metadata keys, but does not support bracket/quoted access for metadata keys like line_items[0] or line_items[0].amount.
If you need coarse routing based on overall extraction quality, the extraction step execution context also exposes summary variables like minReviewAgentScore, avgReviewAgentScore, and numFieldsFlaggedForReview.
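For example, a condition along these lines (the step name and exact binding path are hypothetical; only the minReviewAgentScore variable is documented above) could auto-approve clean documents and route the rest to manual review:

```
{{ extractionStep.minReviewAgentScore }} >= 4
```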

