Best Practices

This guide covers best practices for the Parse API: choosing between synchronous and asynchronous processing, tuning for speed or accuracy, and troubleshooting common problems.


Async vs. sync processing

  • Use POST /parse for synchronous parsing when you need results immediately and the file is small enough to be processed within a single request.
  • Use POST /parse/async for asynchronous parsing, which returns a parser run ID you can poll with GET /parser_runs/{id}.

For high-volume production workloads, prefer async parsing.
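
The async flow above amounts to "submit, then poll until the run finishes". The following is a minimal sketch of the polling half only. It assumes the run object returned by GET /parser_runs/{id} carries a "status" field with terminal values such as "PROCESSED" and "FAILED"; those names, the backoff intervals, and the helper wait_for_parser_run are illustrative and are not confirmed by this guide. The HTTP call itself is injected as get_run so the loop stays independent of any particular client library.

```python
import time
from typing import Any, Callable, Dict

# Assumption: the run object exposes a "status" field with these terminal values.
TERMINAL_STATUSES = {"PROCESSED", "FAILED"}


def wait_for_parser_run(
    run_id: str,
    get_run: Callable[[str], Dict[str, Any]],
    interval_s: float = 2.0,
    max_interval_s: float = 30.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call get_run (a wrapper around GET /parser_runs/{id}) until the run is terminal."""
    waited = 0.0
    while True:
        run = get_run(run_id)
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if waited >= timeout_s:
            raise TimeoutError(f"parser run {run_id} still pending after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
        # Double the wait each time so long-running documents are not polled aggressively.
        interval_s = min(interval_s * 2, max_interval_s)
```

In practice get_run would issue the GET request with your HTTP client and return the decoded JSON body; the ID passed in is the parser run ID returned by POST /parse/async.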


Performance optimization

For fastest processing:

  • Set blockOptions.text.agentic.enabled to false (the most significant speedup; avoids AI-based OCR corrections)
  • Set blockOptions.figures.enabled to false (avoids AI-based figure analysis)
  • Set pageRotationEnabled to false if all pages are correctly oriented
  • Set chunkingStrategy.type to "document" (fastest) or "page" instead of "section"; section chunking adds CPU overhead during parsing to generate semantic sections
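
As a rough illustration, the speed-oriented settings above can be collected into one config object. This sketch assumes the dotted names map directly onto nested JSON keys and that the chunking type lives at chunkingStrategy.type. The guide does not say where pageRotationEnabled nests, so it is placed at the top level here purely as an assumption. set_path and fast_config are hypothetical helpers, not part of the API.

```python
import json
from typing import Any, Dict


def set_path(config: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set a nested key such as 'blockOptions.text.agentic.enabled' in place."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def fast_config(pages_upright: bool = False) -> Dict[str, Any]:
    """Build a config that trades accuracy features for speed."""
    config: Dict[str, Any] = {}
    set_path(config, "blockOptions.text.agentic.enabled", False)
    set_path(config, "blockOptions.figures.enabled", False)
    set_path(config, "chunkingStrategy.type", "document")
    if pages_upright:
        # Assumption: top-level placement; only disable when every page is upright.
        set_path(config, "pageRotationEnabled", False)
    return config


print(json.dumps(fast_config(pages_upright=True), indent=2))
```

Rotation handling is disabled only on request, since turning it off for documents with rotated pages would hurt OCR quality.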

For highest accuracy:

  • Set target: "markdown" with chunkingStrategy.type: "section"
  • Set blockOptions.text.agentic.enabled to true for handwritten/degraded documents
  • Set tableHeaderContinuationEnabled to true for multi-page tables
  • Set signatureDetectionEnabled to true for legal documents
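
Several of the accuracy options above only apply to certain kinds of documents, so it can help to derive them from what you know about the input. The sketch below returns a flat mapping from dotted setting names to values. It deliberately does not decide where each setting nests in the request config, because this guide does not specify that for every option. The function accuracy_settings and its parameters are hypothetical.

```python
from typing import Dict, Union

Setting = Union[bool, str]


def accuracy_settings(
    handwritten_or_degraded: bool = False,
    multi_page_tables: bool = False,
    legal: bool = False,
) -> Dict[str, Setting]:
    """Map document traits to the accuracy-oriented settings from this guide."""
    # Baseline for every accuracy-focused run.
    settings: Dict[str, Setting] = {
        "target": "markdown",
        "chunkingStrategy.type": "section",
    }
    if handwritten_or_degraded:
        settings["blockOptions.text.agentic.enabled"] = True
    if multi_page_tables:
        settings["tableHeaderContinuationEnabled"] = True
    if legal:
        settings["signatureDetectionEnabled"] = True
    return settings


for name, value in accuracy_settings(handwritten_or_degraded=True, legal=True).items():
    print(f"{name} = {value!r}")
```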

Troubleshooting

Poor quality OCR results

  1. Set blockOptions.text.agentic.enabled to true for handwritten/degraded documents
  2. Set pageRotationEnabled to true for rotated pages
  3. Try target: "spatial" for very messy or skewed documents

Chunks are too large or too small

  1. Adjust minCharacters and maxCharacters in chunkingStrategy.options if you are using type: "section" with target: "markdown".
  2. Try a different chunking type ("page", "section", or "document")
  3. Consider using pageRanges to process fewer pages per request
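
Before changing minCharacters and maxCharacters, it is worth measuring how far the current chunks fall outside the range you want. The sketch below assumes you have already extracted the chunk text into a list of strings. The helper chunk_size_report and the bounds 500 and 3000 are illustrative only, not recommended values.

```python
from typing import Dict, List


def chunk_size_report(chunks: List[str], min_chars: int, max_chars: int) -> Dict[str, int]:
    """Count chunks that fall below or above the desired character bounds."""
    lengths = [len(chunk) for chunk in chunks]
    return {
        "total": len(lengths),
        "too_small": sum(n < min_chars for n in lengths),
        "too_large": sum(n > max_chars for n in lengths),
        "shortest": min(lengths, default=0),
        "longest": max(lengths, default=0),
    }


# Section chunking with explicit bounds, following the option names in step 1.
chunking_strategy = {
    "type": "section",
    "options": {"minCharacters": 500, "maxCharacters": 3000},
}

sample_chunks = ["a" * 120, "b" * 900, "c" * 4200]
options = chunking_strategy["options"]
report = chunk_size_report(sample_chunks, options["minCharacters"], options["maxCharacters"])
print(report)
```

If most chunks land in too_small or too_large, adjusting the bounds is a reasonable first step; if the distribution is erratic, switching chunking type may be the better fix.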

Tables not parsing correctly

  1. Set tableHeaderContinuationEnabled to true for multi-page tables
  2. Set targetFormat: "html" for better table structure
  3. Set blockOptions.tables.agentic.enabled to true
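
When applying the table fixes above to a config you already use, merging them in is safer than overwriting blockOptions wholesale, since that would drop unrelated settings. The sketch below uses a small recursive merge. It assumes all three table settings sit under blockOptions.tables, but the guide only gives a full path for agentic.enabled, so the placement of tableHeaderContinuationEnabled and targetFormat is a guess. deep_merge and TABLE_FIXES are hypothetical names.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with overrides merged into base, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Assumption: every table setting nests under blockOptions.tables.
TABLE_FIXES = {
    "blockOptions": {
        "tables": {
            "tableHeaderContinuationEnabled": True,
            "targetFormat": "html",
            "agentic": {"enabled": True},
        }
    }
}

existing = {"target": "markdown", "blockOptions": {"figures": {"enabled": False}}}
config = deep_merge(existing, TABLE_FIXES)
print(config)
```

The original config is left untouched, which makes it easy to compare parse results with and without the table overrides.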