The BatchProcessorRun object

The BatchProcessorRun object is returned by the Get Batch Processor Run endpoint.

The object represents a run of a processor over a batch of files and contains all the information about the run, including metrics, the processor that was run, and the status of the run.

properties

object

string

The type of response. Always “batch_processor_run”.

id

string

The unique identifier for this batch processor run.

processorId

string

The ID of the processor used for this run.

processorVersionId

string

The ID of the specific processor version used.

processorName

string

The name of the processor.

metrics

object

The metrics for the batch processor run.

properties

numFiles

number

The total number of files that were processed.

numPages

number

The total number of pages that were processed.

type

string

The type of batch processor run. Possible values are EXTRACT, CLASSIFY, and SPLITTER.

The sections below show the fields in this object that are present for each type of run.

EXTRACT

fieldMetrics

object

Record mapping field names to their respective metrics.

Field Metrics Structure

meanConfidence

number

The mean confidence score for this field across all documents.

recallPerc

number

The recall percentage for this field, representing how many of the expected values were correctly extracted.

precisionPerc

number

The precision percentage for this field, representing how many of the extracted values were correct.

fieldMetrics

object

For nested object fields, this contains metrics for the child fields. Has the same structure as the parent fieldMetrics.

arrayCardinalityMetrics

object

Record mapping each root-level array field name to the number of times that array was extracted with the correct number of rows.
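Because fieldMetrics can nest for object fields, it is often easier to work with a flat view. The following is a minimal sketch, assuming the response has already been parsed into Python dictionaries; the function name `flatten_field_metrics` and the dotted-path naming are illustrative choices, not part of the API.

```python
from typing import Any, Dict


def flatten_field_metrics(
    field_metrics: Dict[str, Any], prefix: str = ""
) -> Dict[str, Dict[str, float]]:
    """Flatten a nested fieldMetrics record into dotted paths (hypothetical helper)."""
    flat: Dict[str, Dict[str, float]] = {}
    for name, entry in field_metrics.items():
        path = f"{prefix}.{name}" if prefix else name
        # Keep the scalar metrics for this field, leaving the nested record aside.
        flat[path] = {k: v for k, v in entry.items() if k != "fieldMetrics"}
        children = entry.get("fieldMetrics")
        if children:
            # Child fields share the parent's structure, so recurse.
            flat.update(flatten_field_metrics(children, path))
    return flat
```

Calling this on a record with a nested `vendor` object would yield keys such as `vendor` and `vendor.name`, each mapped to its meanConfidence, recallPerc, and precisionPerc.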

CLASSIFY

accuracyPerc

number

The overall accuracy percentage.

meanConfidence

number

The mean confidence score.

distribution

object

Record mapping classification values to their counts.

accuracyPercByClassification

object

Mapping from classification to accuracy percentage as calculated from the confusion matrix.

confusionMatrix

object

Mapping from actual class to predicted class to count. Only present when accuracy percentage is present.
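To make the relationship between confusionMatrix and the accuracy fields concrete, here is a rough sketch of one way such percentages can be derived. It assumes per-classification accuracy means the share of documents of an actual class that were predicted as that class, on a 0 to 100 scale; the reference above does not spell out the exact formula, so treat this as an illustration and not as the service's implementation.

```python
from typing import Dict, Optional

ConfusionMatrix = Dict[str, Dict[str, int]]  # actual class -> predicted class -> count


def accuracy_by_classification(confusion: ConfusionMatrix) -> Dict[str, float]:
    """Per-class accuracy percentage (assumed definition: correct / total per actual class)."""
    result: Dict[str, float] = {}
    for actual, predicted_counts in confusion.items():
        total = sum(predicted_counts.values())
        if total == 0:
            continue  # no documents of this class, so no percentage to report
        result[actual] = 100.0 * predicted_counts.get(actual, 0) / total
    return result


def overall_accuracy(confusion: ConfusionMatrix) -> Optional[float]:
    """Overall accuracy percentage: diagonal count divided by all documents."""
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(row.get(actual, 0) for actual, row in confusion.items())
    return 100.0 * correct / total if total else None
```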

SPLITTER

precisionPerc

number

Number of predicted subdocuments that are in the expected set of subdocuments divided by total number of predicted subdocuments.

recallPerc

number

Number of expected subdocuments that are in the predicted set of subdocuments divided by total number of expected subdocuments.

numExpectedDocs

number

The number of expected documents.

numPredictedDocs

number

The number of predicted documents.

numCorrectDocs

number

The number of correctly predicted documents.

meanRunTimeMs

number

The mean runtime in milliseconds per document.
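The SPLITTER precision and recall definitions above can be sketched from the three document counts. This assumes that numCorrectDocs counts the subdocuments present in both the predicted and expected sets, and that the percentages are reported on a 0 to 100 scale (as the EXTRACT example values suggest); both are assumptions, and the helper name is hypothetical.

```python
from typing import Optional, Tuple


def splitter_precision_recall(
    num_expected: int, num_predicted: int, num_correct: int
) -> Tuple[Optional[float], Optional[float]]:
    """Return (precisionPerc, recallPerc) under the assumptions stated above."""
    # Precision: correct predictions over everything that was predicted.
    precision = 100.0 * num_correct / num_predicted if num_predicted else None
    # Recall: correct predictions over everything that was expected.
    recall = 100.0 * num_correct / num_expected if num_expected else None
    return precision, recall
```

For instance, 6 correct subdocuments out of 8 predicted and 10 expected would give a precision of 75 and a recall of 60 under these assumptions.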

status

string

The current status of the batch processor run. Possible values are PENDING, PROCESSING, PROCESSED, and FAILED.
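A client will typically poll until the run leaves PENDING or PROCESSING. The sketch below assumes that PROCESSED and FAILED are the terminal statuses, which the list above implies but does not state outright. `fetch_run` is a hypothetical caller-supplied function that calls the Get Batch Processor Run endpoint and returns the parsed object; no HTTP client or URL is assumed here.

```python
import time
from typing import Any, Callable, Dict

# Assumption: these two statuses mean the run will not change any further.
TERMINAL_STATUSES = {"PROCESSED", "FAILED"}


def wait_for_run(
    fetch_run: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll fetch_run() until the run reaches a terminal status or attempts run out."""
    for _ in range(max_attempts):
        run = fetch_run()
        if run["status"] in TERMINAL_STATUSES:
            return run
        sleep(interval_s)
    raise TimeoutError(f"run did not finish after {max_attempts} attempts")
```

The `sleep` parameter is injectable only so the loop can be exercised in tests without waiting.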

source

string

The source of the batch processor run.

possible values

EVAL_SET

The batch processor run was made from an evaluation set. In this case, the sourceId will be the ID of the evaluation set, such as ev_1234.

PLAYGROUND

The batch processor run was made from the playground. The sourceId will not be set for this value.

STUDIO

The batch processor run was made for a processor in Studio. The sourceId will be the ID of the processor, such as dp_1234.

sourceId

string

The ID of the source of the batch processor run. See the source field for more details.
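Since the meaning of sourceId depends on source, code that consumes this object usually branches on source first. A minimal sketch, assuming the object has been parsed into a dictionary; `describe_source` is a hypothetical helper name.

```python
from typing import Any, Dict


def describe_source(run: Dict[str, Any]) -> str:
    """Return a human-readable description of where a batch processor run came from."""
    source = run["source"]
    source_id = run.get("sourceId")  # not set for PLAYGROUND runs
    if source == "EVAL_SET":
        return f"evaluation set {source_id}"
    if source == "STUDIO":
        return f"Studio processor {source_id}"
    if source == "PLAYGROUND":
        return "playground"
    raise ValueError(f"unexpected source: {source!r}")
```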

runCount

number

The number of runs that were made.

options

object

The options for the batch processor run.

properties

fuzzyMatchFields

array

The fields that were fuzzy matched. Optional.

excludeFields

array

The fields that were excluded from the run. Optional.

clearPreProcessingCache

boolean

Whether the preprocessing cache was cleared. Optional.

createdAt

string

The date and time the batch processor run was created.

updatedAt

string

The date and time the batch processor run was last updated.

{
  "object": "batch_processor_run",
  "id": "bpr_1234",
  "processorId": "dp_5678",
  "processorVersionId": "dpv_91011",
  "processorName": "Invoice Extractor",
  "metrics": {
    "numFiles": 10,
    "numPages": 25,
    "meanRunTimeMs": 1500,
    "type": "EXTRACT",
    "fieldMetrics": {
      "invoice_number": {
        "meanConfidence": 0.95,
        "recallPerc": 98.5,
        "precisionPerc": 99.2
      },
      "invoice_date": {
        "meanConfidence": 0.92,
        "recallPerc": 95.1,
        "precisionPerc": 97.3
      }
    }
  },
  "status": "PROCESSED",
  "source": "STUDIO",
  "sourceId": "dp_5678",
  "runCount": 1,
  "options": {
    "fuzzyMatchFields": ["invoice_number"],
    "excludeFields": ["internal_notes"],
    "clearPreProcessingCache": false
  },
  "createdAt": "2023-05-15T10:30:45Z",
  "updatedAt": "2023-05-15T10:35:22Z"
}
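As a usage illustration, the sketch below parses a trimmed copy of the example response and produces a one-line summary, including the EXTRACT field with the lowest recall. The `summarize_run` helper and its output format are hypothetical; only the field names come from the object described on this page.

```python
import json

# Trimmed copy of the example response, keeping only the fields used below.
EXAMPLE = """
{
  "object": "batch_processor_run",
  "id": "bpr_1234",
  "metrics": {
    "numFiles": 10,
    "numPages": 25,
    "type": "EXTRACT",
    "fieldMetrics": {
      "invoice_number": {"meanConfidence": 0.95, "recallPerc": 98.5, "precisionPerc": 99.2},
      "invoice_date": {"meanConfidence": 0.92, "recallPerc": 95.1, "precisionPerc": 97.3}
    }
  },
  "status": "PROCESSED"
}
"""


def summarize_run(run: dict) -> str:
    """Build a short summary line for a parsed BatchProcessorRun object."""
    metrics = run["metrics"]
    line = (
        f'{run["id"]} [{run["status"]}] {metrics["type"]}: '
        f'{metrics["numFiles"]} files, {metrics["numPages"]} pages'
    )
    # fieldMetrics is only present for EXTRACT runs.
    field_metrics = metrics.get("fieldMetrics")
    if field_metrics:
        weakest = min(field_metrics, key=lambda name: field_metrics[name]["recallPerc"])
        line += f', lowest recall: {weakest} ({field_metrics[weakest]["recallPerc"]}%)'
    return line


print(summarize_run(json.loads(EXAMPLE)))
```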