Extractor output type
Document processor outputs follow standardized formats based on the processor type. Understanding these formats is essential when working with evaluation sets, webhooks, and API responses.
Extraction output type
The output structure for JSON Schema processors is composed of two properties: value and metadata.
The value object is the actual data extracted from the document, which conforms to the JSON Schema defined in the processor config.
The metadata object holds details such as confidence scores and citations for the extracted data. Its keys mirror the structure of the value object using a path-like notation (e.g., line_items[0].description), so you can pinpoint the metadata for any specific field, including fields nested within objects or arrays. For instance, if your data has value.line_items[0].name, the metadata for that name field is found under the key 'line_items[0].name' in the metadata object.
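To make the mirroring concrete, the following minimal Python sketch walks a value object and produces the path-like key for each leaf field. The helper name metadata_paths and the sample data are hypothetical, and the sketch assumes that object properties are joined with a dot and array items use a bracketed index, as in the examples above. It does not claim that every leaf necessarily has a metadata entry.

```python
from typing import Any, Iterator


def metadata_paths(value: Any, prefix: str = "") -> Iterator[str]:
    """Yield the path-like key for every leaf field under `value`."""
    if isinstance(value, dict):
        for key, child in value.items():
            # Object properties are joined with a dot: parent.child
            child_prefix = f"{prefix}.{key}" if prefix else key
            yield from metadata_paths(child, child_prefix)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            # Array items use a bracketed index: parent[0]
            yield from metadata_paths(child, f"{prefix}[{index}]")
    else:
        # A scalar (string, number, boolean, null) is a leaf field
        yield prefix


# Hypothetical extracted value, for illustration only.
value = {
    "invoice_number": "INV-001",
    "line_items": [
        {"name": "Widget", "description": "Blue widget"},
    ],
}

paths = list(metadata_paths(value))
print(paths)
# ['invoice_number', 'line_items[0].name', 'line_items[0].description']
```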
Type definition
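The shape described above could be sketched in Python roughly as follows. This is an approximation inferred from the prose, not the authoritative definition: the names ExtractionOutput and FieldMetadata, and the exact field names and types inside a metadata entry (confidence, citations), are assumptions.

```python
from typing import Any, Dict, List, TypedDict


class FieldMetadata(TypedDict, total=False):
    # Field names and types here are assumptions based on the prose.
    confidence: float                 # assumed: score for the extracted field
    citations: List[Dict[str, Any]]   # assumed: where in the document the value came from


class ExtractionOutput(TypedDict):
    # Data conforming to the JSON Schema in the processor config
    value: Dict[str, Any]
    # Keyed by path, e.g. "line_items[0].description"
    metadata: Dict[str, FieldMetadata]


# Hypothetical instance, for illustration only.
example: ExtractionOutput = {
    "value": {"line_items": [{"description": "Blue widget"}]},
    "metadata": {
        "line_items[0].description": {"confidence": 0.97, "citations": []},
    },
}
```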
Accessing Metadata
To access the metadata for a specific field, especially nested ones like items in an array, you use a path-like key string. For example, to get the metadata for the description of the first item in a line_items array, the key would be line_items[0].description.
Here are examples in Python and TypeScript:
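As a minimal Python sketch of this lookup, assuming the output has already been parsed into a dictionary: the helper get_field_metadata, the sample output, and the confidence field name are hypothetical and used only for illustration.

```python
from typing import Any, Dict, Optional


def get_field_metadata(output: Dict[str, Any], path: str) -> Optional[Dict[str, Any]]:
    """Return the metadata entry stored under `path`, or None if absent."""
    return output.get("metadata", {}).get(path)


# Hypothetical output; the `confidence` field name is an assumption.
output = {
    "value": {"line_items": [{"description": "Blue widget", "quantity": 2}]},
    "metadata": {
        "line_items[0].description": {"confidence": 0.91},
    },
}

# The key is the full path string, not a chain of nested lookups.
description_meta = get_field_metadata(output, "line_items[0].description")

# A field without a metadata entry yields None rather than raising.
quantity_meta = get_field_metadata(output, "line_items[0].quantity")
```

Returning None for a missing key keeps calling code simple when not every field carries metadata.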
Examples
Basic Field Types
Nested Structures
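For nested objects and arrays, it can help to resolve a metadata key back to the value it describes. The sketch below parses a path-like key into steps and follows them through the value object. The helpers parse_path and resolve and the sample data are hypothetical, and the parser assumes property names contain no dots or brackets.

```python
import re
from typing import Any, Dict, List, Union

# Matches either a property name or a bracketed array index such as [0].
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def parse_path(path: str) -> List[Union[str, int]]:
    """Split 'line_items[0].description' into ['line_items', 0, 'description']."""
    tokens: List[Union[str, int]] = []
    for name, index in _TOKEN.findall(path):
        tokens.append(int(index) if index else name)
    return tokens


def resolve(value: Any, path: str) -> Any:
    """Follow a path-like key through nested dicts and lists."""
    current = value
    for token in parse_path(path):
        current = current[token]
    return current


# Hypothetical output; the `confidence` field name is an assumption.
output: Dict[str, Any] = {
    "value": {
        "vendor": {"name": "Acme"},
        "line_items": [
            {"description": "Blue widget"},
            {"description": "Red widget"},
        ],
    },
    "metadata": {
        "vendor.name": {"confidence": 0.99},
        "line_items[1].description": {"confidence": 0.62},
    },
}

# Pair each metadata entry with the extracted value it refers to.
paired = {
    path: (resolve(output["value"], path), meta)
    for path, meta in output["metadata"].items()
}
```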
Shared Types
Certain types are shared across different processor outputs. These provide additional context and information about the processor’s decisions.
Type Definition
Example
Insights can appear in both Extraction and Classification outputs to provide transparency into the model’s decision-making process. They are particularly useful when debugging or validating processor results.