Latency Optimization

When you process documents at high volume or build real-time applications, latency becomes a critical factor. This guide covers the settings that have the largest effect on latency.

Many of these settings trade accuracy on complex documents for lower latency. See Configuration for detailed explanations of each setting.

Quick Reference

Use this checklist when optimizing for latency:

Advanced Options

  • Use extraction_light for simple document types (verify accuracy with evaluation sets)
  • Turn off model reasoning insights (modelReasoningInsightsEnabled: false) - only needed for debugging
  • Disable advanced multimodal (advancedMultimodalEnabled: false) - unless processing scans/handwritten content
  • Turn off bounding box citations (citationsEnabled: false) - removes spatial location references

Extraction Chunking Options

  • Limit page ranges if data is on specific pages
  • Increase pageChunkSize for non-array extraction so the document fits in fewer chunks and skips merging
  • Use confidence or take_first merging instead of intelligent
  • Use the large_array_heuristics array strategy when processing large arrays

Parser Configuration

  • Disable figure parsing - unless documents contain important charts/diagrams
  • Disable formula parsing - unless documents contain mathematical equations
  • Disable signature detection - unless signature verification is needed
  • Disable agentic OCR - unless processing handwritten/poor quality scans

Workflow

  • Split into parallel extractors if you have both simple fields and complex arrays

Light Extraction

The biggest change you can make to reduce latency is selecting Extraction Light instead of the default Extraction Performance.

[Screenshot: Core performance settings in Extend Studio]

{
  "config": {
    "baseProcessor": "extraction_light"
  }
}

Extraction Light is faster and cheaper, but removes support for advanced visual features like figure parsing and signature detection. See the Extraction Light versioning page for details.


Disabling Advanced Options

Each of these options adds processing overhead. Disable what you don’t need:

| Option | What disabling does | Config |
| --- | --- | --- |
| Bounding Box Citations | Removes spatial location references for extracted values. See Citations. | citationsEnabled: false |
| Advanced Multimodal | Skips vision-language model processing. Keep enabled for scans/handwriting. | advancedMultimodalEnabled: false |
| Model Reasoning Insights | Removes decision-making explanations. Only needed for debugging. | modelReasoningInsightsEnabled: false |
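
As a rough sketch, the three flags in the table can be applied to an existing configuration with a small helper. The flag names come from the table; the "advancedOptions" wrapper key and the helper name apply_latency_flags are assumptions made for illustration, so check the Configuration page for the actual nesting before using this.

```python
import copy
import json

# Flag names are taken from the table above. Where they live inside the
# config is NOT confirmed by this page: "advancedOptions" is an assumed key.
LATENCY_FLAGS = {
    "citationsEnabled": False,
    "advancedMultimodalEnabled": False,
    "modelReasoningInsightsEnabled": False,
}


def apply_latency_flags(config, keep=()):
    """Return a copy of `config` with the latency-related flags disabled.

    `keep` lists flag names to leave untouched, for example
    "advancedMultimodalEnabled" when processing scans or handwriting.
    """
    result = copy.deepcopy(config)
    options = result.setdefault("advancedOptions", {})  # hypothetical key
    for flag, value in LATENCY_FLAGS.items():
        if flag not in keep:
            options[flag] = value
    return result


base = {"baseProcessor": "extraction_light"}
tuned = apply_latency_flags(base, keep=("advancedMultimodalEnabled",))
print(json.dumps({"config": tuned}, indent=2))
```

The helper copies the input instead of mutating it, so the same base configuration can be reused for a latency-tuned variant and an accuracy-tuned variant when comparing them against an evaluation set.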

Chunking Optimizations

[Screenshot: Chunking options in Extend Studio]

For non-array extraction: Increase pageChunkSize so the document fits in a single chunk. With only one chunk there is nothing to merge, so intelligent merging is skipped entirely, which makes this the fastest option.
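
The arithmetic behind this can be sketched as follows. It assumes a document is split into consecutive, non-overlapping runs of pageChunkSize pages; that splitting rule and the function names are assumptions for illustration, not a description of the actual chunker.

```python
import math


def chunk_count(num_pages, page_chunk_size):
    """Number of chunks, assuming consecutive non-overlapping page runs."""
    if num_pages < 0 or page_chunk_size < 1:
        raise ValueError("num_pages must be >= 0 and page_chunk_size >= 1")
    return math.ceil(num_pages / page_chunk_size)


def needs_merging(num_pages, page_chunk_size):
    """Merging only applies when the document spans more than one chunk."""
    return chunk_count(num_pages, page_chunk_size) > 1


# A 12-page document with a chunk size of 5 yields 3 chunks that must be merged.
print(chunk_count(12, 5), needs_merging(12, 5))    # 3 True
# Raising the chunk size to 12 leaves a single chunk and no merge step.
print(chunk_count(12, 12), needs_merging(12, 12))  # 1 False
```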

For large array extraction: Use the large_array_heuristics array strategy with smaller chunk sizes.

Merging strategy: Switch from intelligent to confidence, take_first, or take_last to avoid extra processing overhead.

[Screenshot: Merging strategy settings in Extend Studio]

| Merging Strategy | Speed | Use when |
| --- | --- | --- |
| intelligent | Slowest | Accuracy is critical (default) |
| confidence | Fast | General purpose, good default for latency |
| take_first | Fastest | Authoritative values appear at document start |
| take_last | Fastest | Authoritative values appear at document end |
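
To make the difference between the three fast strategies concrete, here is a toy model of how one field's per-chunk values might be combined. It is an illustration of the idea only: the ChunkValue type, the merge function, and the exact selection rules are assumptions, and intelligent merging is not modeled.

```python
from typing import NamedTuple, Optional


class ChunkValue(NamedTuple):
    """One chunk's candidate value for a field (hypothetical structure)."""
    value: Optional[str]
    confidence: float


def merge(candidates, strategy):
    """Pick a single value from per-chunk candidates, listed in page order."""
    found = [c for c in candidates if c.value is not None]
    if not found:
        return None
    if strategy == "take_first":
        return found[0].value   # earliest chunk that produced a value
    if strategy == "take_last":
        return found[-1].value  # latest chunk that produced a value
    if strategy == "confidence":
        return max(found, key=lambda c: c.confidence).value
    raise ValueError(f"strategy not modeled here: {strategy}")


per_chunk = [
    ChunkValue("INV-001", 0.70),
    ChunkValue("INV-007", 0.95),
    ChunkValue("INV-003", 0.60),
]
for name in ("take_first", "take_last", "confidence"):
    print(name, merge(per_chunk, name))
```

In this model each strategy is a single pass over values that already exist, which is consistent with the table ranking them as faster than intelligent.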

Disable Advanced Parsing Options

[Screenshot: Parser block options in Extend Studio]

  • Figure parsing - Disable unless documents contain important charts/diagrams
  • Formula parsing - Disable unless documents contain mathematical equations
  • Signature detection - Disable unless signature verification is needed
  • Agentic OCR - Disable unless processing handwritten or poor-quality scans

Parallel Extractors

For large extractions or schemas, consider breaking a single extractor into multiple extractors that run in parallel within a Workflow. This is particularly effective when you have both simple top-level fields and complex array extractions.

[Screenshot: Workflow showing a financial document split into two parallel extractors]

The workflow above splits a financial document into two parallel extractors, one for high-level fields and one for the line-item details. The two run at the same time, and their results are recombined in the workflow output.

Use when:

  • Documents have both simple fields and complex arrays
  • Array extraction is significantly slower than the other fields
  • Total latency is critical to your use case
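
The timing benefit can be illustrated with a small simulation. In Extend the fan-out is configured in a Workflow, not in client code; the sketch below only models the idea with stand-in coroutines that sleep instead of calling any API. The function names, durations, and field values are all made up for illustration.

```python
import asyncio
import time


async def run_extractor(seconds, result):
    """Stand-in for one extractor run: waits, then returns canned output."""
    await asyncio.sleep(seconds)
    return result


async def run_parallel():
    # Both branches start together, so the wall-clock time is roughly that
    # of the slower branch (the array extractor), not the sum of the two.
    header, line_items = await asyncio.gather(
        run_extractor(0.05, {"invoice_number": "INV-001"}),
        run_extractor(0.20, {"line_items": [{"sku": "A1", "qty": 2}]}),
    )
    # Recombine the two partial results into one output.
    return {**header, **line_items}


start = time.perf_counter()
combined = asyncio.run(run_parallel())
elapsed = time.perf_counter() - start
print(combined)
print(f"elapsed: {elapsed:.2f}s")
```

The merge step here is a plain dictionary union, which works because the two extractors own disjoint sets of fields. Keeping the schemas disjoint avoids having to reconcile conflicting values when the branches are recombined.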

Related Topics

  • See Configuration for detailed explanations of each setting
  • Understand Field Names and Prompt Crafting for schema optimization
  • Explore Evaluation Sets to validate accuracy when optimizing for speed