Best Practices

This guide covers practical tips for getting the best results from Parse: tuning for retrieval quality, accuracy, speed, and cost.


Use section-based chunking for RAG

By default, Parse returns one chunk per page. For retrieval, you usually want chunks that are complete semantic units — so each one can be embedded and retrieved on its own without splitting a table or cutting off a paragraph mid-thought.

Setting chunkingStrategy.type to section splits the document at logical boundaries (headings, tables, figures) instead of arbitrary page breaks. Use minCharacters and maxCharacters to keep chunks within your embedding model’s ideal range.

{
  "config": {
    "target": "markdown",
    "chunkingStrategy": {
      "type": "section",
      "options": { "minCharacters": 500, "maxCharacters": 2000 }
    }
  }
}

Section chunking requires target: "markdown". See Chunking strategy for the full options.
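As a minimal sketch, the same config can be built in code with a couple of sanity checks. The helper name section_chunking_config is hypothetical, and the bound checks are assumptions about sensible input rather than documented API validation; only the field names come from the example above.

```python
def section_chunking_config(min_chars: int = 500, max_chars: int = 2000) -> dict:
    """Build a Parse config that chunks by section (hypothetical helper)."""
    # Assumed sanity checks; the API may enforce different or additional rules.
    if min_chars < 0 or max_chars <= 0:
        raise ValueError("minCharacters must be >= 0 and maxCharacters must be > 0")
    if min_chars > max_chars:
        raise ValueError("minCharacters must not exceed maxCharacters")
    return {
        "config": {
            # Section chunking requires the markdown target.
            "target": "markdown",
            "chunkingStrategy": {
                "type": "section",
                "options": {"minCharacters": min_chars, "maxCharacters": max_chars},
            },
        }
    }
```

Calling section_chunking_config(300, 1200) returns a dict you can serialize with json.dumps and send as the request body; pick the bounds from your embedding model's input limits.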


Use HTML for complex tables

Markdown tables can’t represent merged cells, nested headers, or multi-row cells — so complex tables often come out misaligned. Set blockOptions.tables.targetFormat to html to preserve the original table structure.

{
  "config": {
    "blockOptions": { "tables": { "targetFormat": "html" } }
  }
}

If your tables look broken, switching to html usually fixes it. For tables that span multiple pages, also enable tableHeaderContinuationEnabled so headers repeat on each page. See Tables.
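To illustrate why HTML keeps structure that Markdown loses, here is a rough sketch that reads a table with a merged header cell using only Python's standard library. The sample markup is invented for illustration and is not actual Parse output; the TableGrid class is hypothetical and handles colspan only, not rowspan.

```python
from html.parser import HTMLParser


class TableGrid(HTMLParser):
    """Collect table cells into rows, repeating a cell's text across its colspan."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._span = 1
        self._text = None  # None while outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            # A missing colspan attribute means the cell covers one column.
            self._span = int(dict(attrs).get("colspan") or 1)
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._text is not None:
            cell = "".join(self._text).strip()
            self.rows[-1].extend([cell] * self._span)
            self._text = None


# Invented sample: "Revenue" spans two columns, which a Markdown table cannot express.
sample = (
    "<table>"
    "<tr><th colspan='2'>Revenue</th><th>Costs</th></tr>"
    "<tr><td>Q1</td><td>Q2</td><td>Q1</td></tr>"
    "</table>"
)
parser = TableGrid()
parser.feed(sample)
print(parser.rows)  # [['Revenue', 'Revenue', 'Costs'], ['Q1', 'Q2', 'Q1']]
```

Because the colspan attribute survives in the HTML, downstream code can rebuild the column alignment instead of guessing it.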


Enable agentic processing only when needed

Agentic processing uses a vision language model to review and correct parsing output. It meaningfully improves accuracy on hard documents, but it adds latency and consumes more credits — so it’s a deliberate accuracy-vs-speed-and-cost tradeoff, not an always-on setting. It’s off by default; enable it only where it helps:

  • text.agentic — corrects low-confidence OCR. Enable for handwriting, faded or skewed scans, unusual fonts, or when you see garbled characters in the output.
  • tables.agentic — reviews and fixes table structure. Enable for tables with misaligned columns, merged cells, or values landing in the wrong column.

Clean, simple PDFs may parse accurately without it — test your document types with it off first, then enable it selectively.

{
  "config": {
    "blockOptions": {
      "text": { "agentic": { "enabled": true } },
      "tables": { "agentic": { "enabled": true } }
    }
  }
}

See Text and Tables.
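One way to apply the "enable it selectively" advice is to decide per document type. The sketch below is an assumption about how you might organize that choice: the profile labels and the agentic_config helper are hypothetical, and only the blockOptions field names come from the config above.

```python
# Hypothetical document-type labels mapped to (text_agentic, tables_agentic).
# The labels are illustrative; derive yours from testing with agentic off first.
AGENTIC_PROFILES = {
    "clean_pdf": (False, False),
    "handwritten_form": (True, False),
    "merged_cell_tables": (False, True),
    "poor_scan_with_tables": (True, True),
}


def agentic_config(doc_type: str) -> dict:
    """Return a config that enables agentic review only where the profile asks for it."""
    # Unknown types fall back to the default behavior: agentic processing off.
    fix_text, fix_tables = AGENTIC_PROFILES.get(doc_type, (False, False))
    return {
        "config": {
            "blockOptions": {
                "text": {"agentic": {"enabled": fix_text}},
                "tables": {"agentic": {"enabled": fix_tables}},
            }
        }
    }
```

Keeping the mapping in one place makes the accuracy-versus-cost decision explicit and easy to revisit as you test more document types.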


Performance optimization

Parse defaults favor accuracy. Adjust these settings to trade some accuracy for speed and cost, or the reverse.

For fastest, lowest-cost processing:

| Setting | Why it helps |
| --- | --- |
| blockOptions.text.agentic.enabled: false | Skips VLM-based OCR correction, the single biggest latency saver. |
| blockOptions.figures.enabled: false | Skips figure classification and summarization. |
| advancedOptions.pageRotationEnabled: false | Skips rotation detection when pages are already upright. |
| chunkingStrategy.type: "page" or "document" | Avoids the extra work of computing semantic sections. |
| engine: "parse_light" | Faster, cheaper engine for simple, clean documents. |

For highest accuracy:

| Setting | Why it helps |
| --- | --- |
| engine: "parse_performance" | Best handling of complex layouts, tables, and scans (default). |
| target: "markdown" + chunkingStrategy.type: "section" | Clean reading order with complete semantic chunks. |
| blockOptions.text.agentic.enabled: true | Corrects low-confidence OCR and handwriting. |
| blockOptions.tables.tableHeaderContinuationEnabled: true | Repeats headers across multi-page tables. |
| blockOptions.text.signatureDetectionEnabled: true | Detects signatures in legal documents. |

Troubleshooting

| Symptom | What to try |
| --- | --- |
| Poor OCR or garbled text | Enable blockOptions.text.agentic.enabled; try target: "spatial" for very messy or skewed scans. |
| Chunks are too large or too small | Tune minCharacters / maxCharacters on section chunking, or switch the chunking type (page / section / document). |
| Tables look broken | Set blockOptions.tables.targetFormat: "html"; enable tableHeaderContinuationEnabled for multi-page tables. |
| Processing is too slow | See Performance optimization. |
| Large document times out | Move off the synchronous endpoint; see Move to production with async processing. |

Recipes

Config blocks for common scenarios. The // comments explain each setting; standard JSON does not allow comments, so strip them before sending a block as a request body. For field-level options and defaults, see Configuration.
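Because the recipes carry // comments, a strict JSON parser will reject them as written. The following is a minimal sketch of one way to strip those comments before parsing; strip_line_comments is a hypothetical helper, not part of any SDK, and it handles only // line comments outside string literals.

```python
import json


def strip_line_comments(text: str) -> str:
    """Remove // comments that sit outside JSON string literals."""
    cleaned = []
    for line in text.splitlines():
        in_string = False
        escaped = False
        cut = len(line)
        for i, ch in enumerate(line):
            if escaped:
                escaped = False  # this character was escaped by a backslash
            elif ch == "\\" and in_string:
                escaped = True
            elif ch == '"':
                in_string = not in_string
            elif not in_string and line.startswith("//", i):
                cut = i  # comment starts here; drop the rest of the line
                break
        cleaned.append(line[:cut].rstrip())
    return "\n".join(cleaned)


recipe = """{
  "config": {
    "target": "spatial", // preserve layout when reading order is unreliable
    "chunkingStrategy": { "type": "page" }
  }
}"""
config = json.loads(strip_line_comments(recipe))
print(config["config"]["target"])  # spatial
```

Tracking whether the scanner is inside a string keeps values such as URLs containing // intact.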

Optimized for a RAG pipeline

{
  "config": {
    "target": "markdown",
    "chunkingStrategy": {
      "type": "section", // complete semantic units to embed and retrieve
      "options": { "minCharacters": 500, "maxCharacters": 10000 }
    },
    "blockOptions": {
      "figures": { "enabled": true, "figureImageClippingEnabled": false }, // summarize charts/diagrams, skip image exports
      "tables": { "targetFormat": "html" } // preserve complex table structure
    }
  }
}

Low-latency, cost-optimized

{
  "config": {
    "target": "markdown",
    "chunkingStrategy": { "type": "page" }, // skip semantic section computation
    "blockOptions": {
      "figures": { "enabled": false }, // skip figure analysis (latency + credits)
      "tables": { "targetFormat": "markdown" }, // lighter than html
      "text": { "signatureDetectionEnabled": false, "agentic": { "enabled": false } } // skip VLM OCR correction
    },
    "advancedOptions": { "pageRotationEnabled": false } // skip rotation detection when pages are upright
  }
}

Complex legal documents

{
  "config": {
    "target": "markdown",
    "chunkingStrategy": { "type": "section" },
    "blockOptions": {
      "figures": { "enabled": true, "figureImageClippingEnabled": true },
      "tables": { "targetFormat": "html", "tableHeaderContinuationEnabled": true }, // keep headers on multi-page tables
      "text": { "signatureDetectionEnabled": true, "agentic": { "enabled": true } } // catch signatures + fix tricky text
    }
  }
}

Handwritten forms

{
  "config": {
    "target": "spatial", // preserve layout when reading order is unreliable
    "chunkingStrategy": { "type": "page" },
    "blockOptions": {
      "figures": { "enabled": false },
      "text": { "signatureDetectionEnabled": true, "agentic": { "enabled": true } } // VLM correction for handwriting
    }
  }
}

Move to production with async processing

The quick start and the examples above use the synchronous /parse endpoint — the fastest way to try Parse and iterate on config. When you’re ready to run Parse in production, switch to the asynchronous /parse_runs endpoint.

Why the sync /parse endpoint isn’t built for production:

  • It has a 5-minute timeout — large or complex documents can exceed it and fail.
  • It holds a connection open for the entire parse, which is brittle for big files and bursty traffic.
  • There’s no delivery mechanism — you can’t receive a webhook when the run finishes; you only get the result if the request stays open.
  • It’s intended for onboarding and testing, not sustained workloads.

What async (/parse_runs) gives you:

  • Reliable handling of large documents without timeouts.
  • Results via polling (GET /parse_runs/{id}) or webhooks, so you’re not holding connections open.
  • A better fit for batch and high-volume pipelines.

See Async Processing for the full comparison, polling options, and webhook setup.
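The polling option can be sketched as a small loop with a deadline. This is an illustration under stated assumptions, not the documented client: poll_until_done is a hypothetical helper, fetch_run stands in for your own wrapper around GET /parse_runs/{id}, and the "status" field and its values in the usage stub are placeholders rather than the real response shape.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_run: Callable[[], dict],
    is_done: Callable[[dict], bool],
    *,
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_run repeatedly until is_done(run) is true or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        run = fetch_run()  # e.g. your wrapper around GET /parse_runs/{id}
        if is_done(run):
            return run
        # Stop if another wait would carry us past the deadline.
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError("parse run did not finish before the timeout")
        sleep(interval_s)


# Usage with a stubbed fetcher. The "status" field and its values are
# placeholders for illustration, not the documented response shape.
fake_responses = iter([{"status": "PENDING"}, {"status": "PENDING"}, {"status": "DONE"}])
final = poll_until_done(
    lambda: next(fake_responses),
    lambda run: run["status"] == "DONE",
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(final)  # {'status': 'DONE'}
```

Passing the fetch and completion checks as callables keeps the loop independent of the response schema, so you can adapt it once you have confirmed the actual fields in Async Processing. For high-volume pipelines, webhooks avoid polling altogether.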