When processing high-volume documents or building real-time applications, latency becomes a critical factor. This guide provides the most impactful settings to reduce latency.
Many of the settings that reduce latency also reduce accuracy on complex documents. See Configuration for detailed explanations of each setting.
Use this checklist when optimizing for latency:
Advanced Options
- Use extraction_light for simple document types (verify accuracy with evaluation sets)
- Set modelReasoningInsightsEnabled: false (only needed for debugging)
- Set advancedMultimodalEnabled: false, unless processing scans or handwritten content
- Set citationsEnabled: false (removes spatial location references)
Extraction Chunking Options
- Increase pageChunkSize for non-array extraction so the document fits in fewer chunks (a single chunk skips merging)
- Use confidence or take_first merging instead of intelligent
- Use the large_array_heuristics array strategy if processing large arrays
Parser Configuration
Workflow
The biggest change you can make to reduce latency is selecting Extraction Light instead of the default Extraction Performance.

Extraction Light is faster and cheaper, but removes support for advanced visual features like figure parsing and signature detection. See the Extraction Light versioning page for details.
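Before switching, it helps to check how much accuracy is lost on an evaluation set. The following is a minimal sketch of that comparison, not part of the product: the function names, the exact-match scoring, and the max_drop threshold are all assumptions made for illustration.

```python
from typing import Any, Dict, List


def field_accuracy(predictions: List[Dict[str, Any]],
                   expected: List[Dict[str, Any]]) -> float:
    """Fraction of expected fields whose predicted value matches exactly.

    Hypothetical helper: exact-match scoring is an assumption; a real
    evaluation set may use looser, per-field comparison rules.
    """
    if len(predictions) != len(expected):
        raise ValueError("predictions and expected must have the same length")
    total = 0
    correct = 0
    for pred, truth in zip(predictions, expected):
        for field, value in truth.items():
            total += 1
            if pred.get(field) == value:
                correct += 1
    return correct / total if total else 0.0


def light_is_acceptable(light_accuracy: float,
                        performance_accuracy: float,
                        max_drop: float = 0.01) -> bool:
    """True if the accuracy drop stays within max_drop (an assumed threshold)."""
    return performance_accuracy - light_accuracy <= max_drop


# Example: two evaluation documents, one wrong field in the light output.
expected = [{"total": "10.00", "vendor": "Acme"},
            {"total": "5.00", "vendor": "Beta"}]
light_output = [{"total": "10.00", "vendor": "Acme"},
                {"total": "5.00", "vendor": "Bta"}]
light_accuracy = field_accuracy(light_output, expected)
```

If the drop is within tolerance for a given document type, that type is a reasonable candidate for Extraction Light.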
Each of these options adds processing overhead. Disable what you don’t need:
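As a rough illustration, the three flags from the checklist can be derived from what a document type actually needs. The flag names come from this guide; the helper function, its parameters, and the flat dictionary shape are assumptions, so check Configuration for where these keys sit in a real processor config.

```python
from typing import Dict


def latency_advanced_options(needs_debugging: bool = False,
                             has_scans_or_handwriting: bool = False,
                             needs_citations: bool = False) -> Dict[str, bool]:
    """Build an advanced-options block with everything off by default.

    Hypothetical helper: only the three key names are taken from the guide.
    """
    return {
        # Reasoning insights are only needed for debugging.
        "modelReasoningInsightsEnabled": needs_debugging,
        # Keep multimodal processing on for scans or handwritten content.
        "advancedMultimodalEnabled": has_scans_or_handwriting,
        # Citations add spatial location references to the output.
        "citationsEnabled": needs_citations,
    }


fastest = latency_advanced_options()
scanned_forms = latency_advanced_options(has_scans_or_handwriting=True)
```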

For non-array extraction: Increase pageChunkSize so the document fits in a single chunk, which skips intelligent merging entirely—the fastest option.
For large array extraction: Use large_array_heuristics array strategy with smaller chunk sizes.
Merging strategy: Switch from intelligent to confidence, take_first, or take_last to avoid extra processing overhead.
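The three rules above can be sketched as a small decision function. This is an illustration under stated assumptions: pageChunkSize and the values large_array_heuristics and confidence come from this guide, while the key names arrayStrategy and mergeStrategy, the upper limit of 50 pages per chunk, and the array chunk size of 5 are hypothetical placeholders.

```python
import math
from typing import Any, Dict


def chunking_options(page_count: int,
                     has_large_arrays: bool,
                     max_page_chunk_size: int = 50,   # assumed limit
                     array_page_chunk_size: int = 5   # assumed "smaller" size
                     ) -> Dict[str, Any]:
    """Pick latency-oriented chunking settings for one document."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")

    if has_large_arrays:
        # Large arrays: heuristic array strategy with smaller chunks.
        return {
            "arrayStrategy": "large_array_heuristics",  # hypothetical key name
            "pageChunkSize": array_page_chunk_size,
        }

    # Non-array extraction: make the chunk as large as allowed.
    chunk_size = min(page_count, max_page_chunk_size)
    options: Dict[str, Any] = {"pageChunkSize": chunk_size}

    # Merging only happens when more than one chunk remains; in that case
    # avoid the slower intelligent strategy.
    if math.ceil(page_count / chunk_size) > 1:
        options["mergeStrategy"] = "confidence"  # hypothetical key name
    return options


short_doc = chunking_options(page_count=12, has_large_arrays=False)
long_doc = chunking_options(page_count=120, has_large_arrays=False)
line_items = chunking_options(page_count=120, has_large_arrays=True)
```

For the 12-page document the whole file fits in one chunk, so no merging strategy is needed at all.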


For large extractions or schemas, consider breaking a single extractor into multiple extractors that run in parallel within a Workflow. This is particularly effective when you have both simple top-level fields and complex array extractions.

The workflow above splits a financial document into two parallel extractors — one for high-level fields and one for the line-item details — which run at the same time and are recombined in the workflow output.
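The latency benefit comes from the two extractors running concurrently, so total time is close to the slower of the two and not their sum. The sketch below illustrates that idea in plain Python with threads. It is not how a Workflow is defined in the product, and both extractor functions, their sleep-based delays, and their return values are stand-ins invented for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Extractor = Callable[[str], Dict[str, Any]]


def extract_header_fields(document: str) -> Dict[str, Any]:
    """Stand-in for an extractor with simple top-level fields."""
    time.sleep(0.05)  # simulated processing time
    return {"invoice_number": "INV-001", "total": "150.00"}


def extract_line_items(document: str) -> Dict[str, Any]:
    """Stand-in for an extractor with a complex array schema."""
    time.sleep(0.05)  # simulated processing time
    return {"line_items": [{"description": "Widget", "amount": "150.00"}]}


def run_parallel(document: str, extractors: List[Extractor]) -> Dict[str, Any]:
    """Run every extractor on the same document at once and merge the outputs."""
    with ThreadPoolExecutor(max_workers=len(extractors)) as pool:
        futures = [pool.submit(extractor, document) for extractor in extractors]
        merged: Dict[str, Any] = {}
        for future in futures:
            # Later extractors overwrite earlier ones on key collisions,
            # so the schemas should not share field names.
            merged.update(future.result())
    return merged


result = run_parallel("financial_document.pdf",
                      [extract_header_fields, extract_line_items])
```

Splitting works best when the two schemas do not overlap, since the recombination step here is a plain dictionary merge.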
Use when: