Response Format

The Parse API returns document content in a structured format that provides both high-level formatted content and detailed block-level information. Start with the response structure, then decide whether you need formatted content or block-level detail.


Response structure

The parse run response contains the parsed content in parseRun.output.chunks. Each chunk contains two key properties:

  1. content: A fully formatted representation of the entire chunk in the target format (e.g., markdown). This is ready to use as-is if you need the complete formatted content of a page.

  2. blocks: An array of individual content blocks that make up the chunk, each with its own formatting, position information, and metadata.


Choose content vs. blocks

  • Use chunk.content when:

    • You need the complete, properly formatted content of a page, already doing the logical placement of blocks (e.g. grouping markdown sections and placing spatially, etc)
    • You want to display or process the document content as a whole (and can just combine all chunk.content values)
    • You’re integrating with systems that expect formatted text (e.g., markdown processors)
  • Use chunk.blocks when:

    • You need to work with specific elements of the document (e.g., only tables or figures)
    • You need spatial information about where content appears on the page, perhaps to build citation systems
    • You’re building a UI that shows or highlights specific document elements

Examples

Extract specific content types

1// Extract all tables from a document
2function extractTables(parseRun) {
3 const tables = [];
4
5 parseRun.output.chunks.forEach(chunk => {
6 chunk.blocks.forEach(block => {
7 if (block.type === 'table') {
8 tables.push({
9 content: block.content,
10 pageNumber: block.metadata.page.number,
11 position: block.boundingBox
12 });
13 }
14 });
15 });
16
17 return tables;
18}
19
20// Extract all figures with their images
21function extractFigures(parseRun) {
22 const figures = [];
23
24 parseRun.output.chunks.forEach(chunk => {
25 chunk.blocks.forEach(block => {
26 if (block.type === 'figure' && block.details.imageUrl) {
27 figures.push({
28 caption: block.content,
29 imageUrl: block.details.imageUrl,
30 figureType: block.details.figureType,
31 pageNumber: block.metadata.page.number
32 });
33 }
34 });
35 });
36
37 return figures;
38}

Reconstruct content with custom formatting

1// Extract headings and their content to create a table of contents
2function createTableOfContents(parseRun) {
3 const toc = [];
4
5 parseRun.output.chunks.forEach(chunk => {
6 chunk.blocks.forEach(block => {
7 if (block.type === 'heading' || block.type === 'section_heading') {
8 toc.push({
9 title: block.content,
10 pageNumber: block.metadata.page.number
11 });
12 }
13 });
14 });
15
16 return toc;
17}

Spatial information

Each block contains spatial information in the form of a polygon (precise outline) and a simplified boundingBox. Use this when you need position-aware output:

  • Highlight specific content in a document viewer
  • Create visual overlays on top of the original document
  • Understand the reading order and layout of the document
1// Create highlight coordinates for a document viewer
2function createHighlights(parseRun, searchTerm) {
3 const highlights = [];
4
5 parseRun.output.chunks.forEach(chunk => {
6 chunk.blocks.forEach(block => {
7 if (block.type === 'text' && block.content.includes(searchTerm)) {
8 highlights.push({
9 pageNumber: block.metadata.page.number,
10 boundingBox: block.boundingBox
11 });
12 }
13 });
14 });
15
16 return highlights;
17}

By leveraging both the formatted content and the structured block information, you can build powerful document processing workflows that combine the convenience of formatted text with the precision of block-level access.