Extraction Performance

Our highest-performance base extraction processor in terms of accuracy and reliability. It always stays up to date with the foundation models that perform best across all of our benchmarks.

Versions

Starting at version 4.6.0, the performance processor does not support logprobsConfidence-based confidence scores. Prefer ocrConfidence and the new Review Agent-based scoring system instead. Reach out to the Extend team if you have any questions.
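
As a rough illustration of the cutoff described above, the sketch below checks whether a given processor version can still be expected to return logprobsConfidence scores. Only the 4.6.0 cutoff and the name logprobsConfidence come from this page. The helper names, the client-side check itself, and the assumption that every earlier version supports logprobsConfidence are hypothetical, not part of the Extend API.

```python
import re
from typing import Tuple

# Cutoff taken from the note above: from 4.6.0 onward,
# logprobsConfidence-based scores are not supported.
LOGPROBS_REMOVED_IN = (4, 6, 0)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH', ignoring any suffix such as '-beta'."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch))


def supports_logprobs_confidence(version: str) -> bool:
    """Hypothetical helper: True for versions strictly before 4.6.0."""
    # Tuples compare element by element, so (4, 10, 0) > (4, 6, 0),
    # which a plain string comparison would get wrong.
    return parse_version(version) < LOGPROBS_REMOVED_IN
```

Comparing parsed tuples rather than raw strings matters once minor versions reach two digits, as with 3.10.0 and 3.11.0 in the table below.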

Version | Date | Changelog
4.6.0 | 2025-12-08 | Model upgrade for the core reasoning and extraction step in the pipeline. This is a major performance improvement for all extraction tasks based on our internal benchmarks, but especially for the hardest extraction tasks.
4.5.0 | 2025-11-16 | Adds support for two new large array strategies to more declaratively handle extracting very large arrays across very large documents: large_array_max_context and large_array_overlap_context.
4.4.0 | 2025-11-03 | Enables array property-level citation support for JSON schema extractors; previously only array item-level citations were supported (by default). To use it, set arrayCitationStrategy to property in the advancedOptions section of the extraction config.
4.2.1 | 2025-10-26 | Adds support for array strategy controls and updates to default chunking behavior. Disables some advanced parsing options by default to improve latency: figure captioning and signature detection.
4.3.0-beta | 2025-09-14 | Rolls out a new foundation model in beta that outperforms prior versions across all our internal benchmarks for extraction. While in beta, additional changes might be made to this processor version.
4.2.0 | 2025-09-06 | Add support for arrays of scalars (primitive types like strings, numbers, booleans, integers).
4.1.1 | 2025-07-10 | Small patch to improve handling of some signature and currency fields.
4.1.0 | 2025-05-15 | Add support for array citations and overall improvements to the consistency and reliability of citations.
4.0.0 | 2025-02-10 | New foundation model in the core extraction pipeline that leads to a big jump in performance, especially on complex array extraction.
3.11.0 | 2025-04-02 | More robust figure parsing with better captions and improved handling of signature fields to reduce the false positive rate in most key use cases.
3.10.1 | 2025-03-21 | Small patch to improve handling of some deeply nested extraction fields.
3.10.0 | 2025-02-09 | Significantly improved figure parsing in our pre-processing to better segment out and transform sub-classes of figures (e.g. charts, logos, etc.) in markdown.
3.9.0 | 2024-11-15 | Add support for nested arrays and objects. See here for an example schema.
3.8.0 | 2024-11-04 | Add support for nested enums in array and object fields. Make advanced multimodal enabled by default.
3.7.0 | 2024-10-10 | Add support for a new, optimized enum field type. Updates to document pre-processing for handwritten text and dense/large tables.
3.6.0 | 2024-09-16 | Add base support for advanced multimodal features, which can be enabled in the Processor Settings in Studio.
3.5.0 | 2024-08-27 | Several changes to improve extraction performance, including a minor model version upgrade, a new bounding box system, and model insights.
3.4.0 | 2024-08-16 | Updates to our document pre-processing to better handle more complex document/table layouts.
3.3.0 | 2024-07-30 | Improvements to signature extraction accuracy.
3.2.0 | 2024-07-14 | Updates to our document pre-processing to better handle checkboxes.
3.1.0 | 2024-06-01 | Updates to our document pre-processing to better handle more complex document/table layouts.
3.0.0 | 2024-05-14 | Promoting a new foundation model to default - it’s faster and more accurate across all of our internal benchmarks for extraction.
2.0.0 | 2024-04-12 | Promoting a new foundation model to default. Very minor increases in accuracy and speed.
1.0.0 | 2023-08-01 | Initial (and now legacy) extraction model.
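
Several entries above describe configuration rather than model changes, notably arrays of scalars (4.2.0) and property-level array citations (4.4.0). The sketch below assembles such a config as a plain Python dict. Only advancedOptions, arrayCitationStrategy, and the value property come from this page. The top-level schema key, the example field names, and the helper function are assumptions for illustration, so check the config reference before relying on them.

```python
import json


def build_extraction_config(array_citation_strategy: str = "property") -> dict:
    """Return a hypothetical extraction config as a plain dict."""
    # Standard JSON Schema: "tags" is an array of scalars (strings) and
    # "line_items" is an array of objects. Field names are made up.
    schema = {
        "type": "object",
        "properties": {
            "tags": {"type": "array", "items": {"type": "string"}},
            "line_items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "amount": {"type": "number"},
                    },
                },
            },
        },
    }
    return {
        # Assumed key name; this page does not say where the schema lives.
        "schema": schema,
        # From the 4.4.0 entry: arrayCitationStrategy under advancedOptions.
        "advancedOptions": {"arrayCitationStrategy": array_citation_strategy},
    }


config = build_extraction_config()
print(json.dumps(config, indent=2))
```

Going by the 4.4.0 entry, leaving arrayCitationStrategy unset would keep the default item-level citations.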