Latency Optimization | Extend Documentation

When you process documents at high volume or build real-time applications, latency becomes a critical factor. This guide covers the settings that have the largest effect on latency.

Many of these settings trade accuracy for speed, especially on complex documents. See Configuration for detailed explanations of each setting.

Quick Reference

Use this checklist when optimizing for latency:

Advanced Options

Use extraction_light for simple document types (verify accuracy with evaluation sets)
Turn off model reasoning insights (modelReasoningInsightsEnabled: false) - only needed for debugging
Disable advanced multimodal (advancedMultimodalEnabled: false) - unless processing scans/handwritten content
Turn off bounding box citations (citationsEnabled: false) - removes spatial location references

Extraction Chunking Options

Limit page ranges if data is on specific pages
Increase pageChunkSize for non-array extraction so the document fits in a single chunk and skips merging
Use confidence, take_first, or take_last merging instead of intelligent
Use the large_array_heuristics array strategy when extracting large arrays

Parser Configuration

Disable figure parsing - unless documents contain important charts/diagrams
Disable formula parsing - unless documents contain mathematical equations
Disable signature detection - unless signature verification is needed
Disable agentic OCR - unless processing handwritten/poor quality scans

Workflow

Split into parallel extractors if you have both simple fields and complex arrays

Light Extraction

The biggest change you can make to reduce latency is selecting Extraction Light instead of the default Extraction Performance.

Core performance settings in Extend Studio

{
  "config": {
    "baseProcessor": "extraction_light"
  }
}

Extraction Light is faster and cheaper, but removes support for advanced visual features like figure parsing and signature detection. See the Extraction Light versioning page for details.

Disabling Advanced Options

Each of these options adds processing overhead. Disable what you don’t need:

Option	What disabling does	Config
Bounding Box Citations	Removes spatial location references for extracted values. See Citations.	`citationsEnabled: false`
Advanced Multimodal	Skips vision-language model processing. Keep enabled for scans/handwriting.	`advancedMultimodalEnabled: false`
Model Reasoning Insights	Removes decision-making explanations. Only needed for debugging.	`modelReasoningInsightsEnabled: false`
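
The flags in the table can be switched off together while keeping the ones a document type still needs. The following is a minimal Python sketch rather than an official client: the flag names come from the table, but the helper `disable_latency_flags` and the idea of passing the options around as a plain dictionary are assumptions for illustration. Where these flags sit inside the processor config is not shown here, so check Configuration before relying on the shape.

```python
from copy import deepcopy

# Flag names are taken from the table above. Where they live inside the
# processor config is an assumption; see Configuration for the real shape.
LATENCY_FLAGS = (
    "citationsEnabled",
    "advancedMultimodalEnabled",
    "modelReasoningInsightsEnabled",
)


def disable_latency_flags(advanced_options, keep=()):
    """Return a copy of advanced_options with the overhead flags set to False.

    Flags listed in `keep` are left untouched, for example to keep
    advanced multimodal enabled for scans or handwriting.
    """
    updated = deepcopy(advanced_options)
    for flag in LATENCY_FLAGS:
        if flag not in keep:
            updated[flag] = False
    return updated


# Hypothetical starting options for a scanned-document processor.
options = {"citationsEnabled": True, "advancedMultimodalEnabled": True}
fast = disable_latency_flags(options, keep=("advancedMultimodalEnabled",))
print(fast)
```

Returning a copy instead of mutating the input makes it easy to keep the original settings around for an accuracy comparison against an evaluation set.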

Chunking Optimizations

Chunking options in Extend Studio

For non-array extraction: Increase pageChunkSize so the document fits in a single chunk, which skips intelligent merging entirely—the fastest option.
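
Whether a given pageChunkSize avoids merging can be estimated with simple arithmetic. This sketch assumes chunks are fixed-size runs of consecutive pages, which is an assumption about the chunking behavior and not something this guide states. The function names are hypothetical.

```python
import math


def chunk_count(page_count, page_chunk_size):
    """Estimate the number of chunks for a document.

    Assumes fixed-size chunks of consecutive pages (an assumption;
    see Configuration for the actual chunking behavior).
    """
    if page_count < 1 or page_chunk_size < 1:
        raise ValueError("page_count and page_chunk_size must be >= 1")
    return math.ceil(page_count / page_chunk_size)


def needs_merging(page_count, page_chunk_size):
    """Merging only applies when there is more than one chunk."""
    return chunk_count(page_count, page_chunk_size) > 1


# A 12-page document with a chunk size of 5 is split into 3 chunks,
# so results from the chunks still have to be merged.
print(chunk_count(12, 5), needs_merging(12, 5))

# Raising the chunk size to cover all 12 pages leaves a single chunk.
print(chunk_count(12, 12), needs_merging(12, 12))
```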

For large array extraction: Use large_array_heuristics array strategy with smaller chunk sizes.

Merging strategy: Switch from intelligent to confidence, take_first, or take_last to avoid extra processing overhead.

Merging strategy settings in Extend Studio

Merging Strategy	Speed	Use when
`intelligent`	Slowest	Accuracy is critical (default)
`confidence`	Fast	General purpose, good default for latency
`take_first`	Fastest	Authoritative values appear at document start
`take_last`	Fastest	Authoritative values appear at document end
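
The table can be read as a small decision rule. The sketch below encodes one possible reading of it in Python. The strategy names are the ones listed in the table, while the function `choose_merging_strategy` and its parameters are hypothetical and exist only for illustration.

```python
def choose_merging_strategy(accuracy_critical, authoritative_position=None):
    """Pick a merging strategy name following the table above.

    accuracy_critical: True when accuracy matters more than latency.
    authoritative_position: "start", "end", or None when the
        authoritative values are not tied to one end of the document.
    """
    if accuracy_critical:
        return "intelligent"  # slowest, and the default
    if authoritative_position == "start":
        return "take_first"
    if authoritative_position == "end":
        return "take_last"
    return "confidence"  # general-purpose choice for latency


print(choose_merging_strategy(False, "start"))
print(choose_merging_strategy(False))
```

Validate whichever strategy you pick against an evaluation set, since the faster strategies can change which value wins when chunks disagree.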

Disable Advanced Parsing Options

Parser block options in Extend Studio

Figure parsing - Disable unless documents contain important charts/diagrams
Formula parsing - Disable unless documents contain mathematical equations
Signature detection - Disable unless signature verification is needed
Agentic OCR - Disable unless processing handwritten or poor-quality scans

Parallel Extractors

For large extractions or schemas, consider breaking a single extractor into multiple extractors that run in parallel within a Workflow. This is particularly effective when you have both simple top-level fields and complex array extractions.

Workflow showing a financial document split into two parallel extractors

The workflow above splits a financial document into two parallel extractors — one for high-level fields and one for the line-item details — which run at the same time and are recombined in the workflow output.

Use when:

Documents have both simple fields and complex arrays
Array extraction is significantly slower than the other fields
Total latency is critical to your use case
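
The latency benefit comes from running the extractors concurrently, so total time is closer to the slowest extractor than to the sum of all of them. In Extend this is configured in a Workflow. The Python sketch below only illustrates the idea with stand-in functions: `extract_summary_fields`, `extract_line_items`, and `run_parallel` are hypothetical and do not call any Extend API.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_summary_fields(document):
    # Hypothetical stand-in for an extractor handling simple top-level fields.
    return {"total": document["total"]}


def extract_line_items(document):
    # Hypothetical stand-in for the slower array extractor.
    return {"line_items": list(document["line_items"])}


def run_parallel(document, extractors):
    """Run every extractor on the same document concurrently and merge results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=len(extractors)) as pool:
        futures = [pool.submit(extractor, document) for extractor in extractors]
        for future in futures:
            # result() blocks until that extractor finishes, so the overall
            # wait is roughly the duration of the slowest extractor.
            merged.update(future.result())
    return merged


# Made-up sample data standing in for a financial document.
document = {"total": 1250.0, "line_items": [{"sku": "A1", "amount": 1250.0}]}
result = run_parallel(document, [extract_summary_fields, extract_line_items])
print(result)
```

The merge step assumes the extractors return disjoint field names. If two extractors can produce the same field, decide explicitly which one wins.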

See Configuration for detailed explanations of each setting
Understand Field Names and Prompt Crafting for schema optimization
Explore Evaluation Sets to validate accuracy when optimizing for speed