For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
GuidesAPI ReferenceChangelogModel Versioning
GuidesAPI ReferenceChangelogModel Versioning
    • Getting Started
        • Quick Start
        • Configuration Options
        • Recipes
        • Response Format
        • Best Practices
        • Error Codes
      • Batch Processing
LogoLogo
On this page
  • Response structure
  • Choose content vs. blocks
  • Examples
  • Extract specific content types
  • Reconstruct content with custom formatting
  • Spatial information
Core Document ProcessingParsing

Response Format

Was this page helpful?
Previous

Best Practices

Next
Built with

The Parse API returns document content in a structured format that provides both high-level formatted content and detailed block-level information. Start with the response structure, then decide whether you need formatted content or block-level detail.


Response structure

The parse run response contains the parsed content in parseRun.output.chunks. Each chunk contains two key properties:

  1. content: A fully formatted representation of the entire chunk in the target format (e.g., markdown). This is ready to use as-is if you need the complete formatted content of a page.

  2. blocks: An array of individual content blocks that make up the chunk, each with its own formatting, position information, and metadata.


Choose content vs. blocks

  • Use chunk.content when:

    • You need the complete, properly formatted content of a page, already doing the logical placement of blocks (e.g. grouping markdown sections and placing spatially, etc)
    • You want to display or process the document content as a whole (and can just combine all chunk.content values)
    • You’re integrating with systems that expect formatted text (e.g., markdown processors)
  • Use chunk.blocks when:

    • You need to work with specific elements of the document (e.g., only tables or figures)
    • You need spatial information about where content appears on the page, perhaps to build citation systems
    • You’re building a UI that shows or highlights specific document elements

Examples

Extract specific content types

1// Extract all tables from a document
2function extractTables(parseRun) {
3 const tables = [];
4
5 parseRun.output.chunks.forEach(chunk => {
6 chunk.blocks.forEach(block => {
7 if (block.type === 'table') {
8 tables.push({
9 content: block.content,
10 pageNumber: block.metadata.page.number,
11 position: block.boundingBox
12 });
13 }
14 });
15 });
16
17 return tables;
18}
19
20// Extract all formulas with their LaTeX representations
21function extractFormulas(parseRun) {
22 const formulas = [];
23
24 parseRun.output.chunks.forEach(chunk => {
25 chunk.blocks.forEach(block => {
26 if (block.type === 'formula') {
27 formulas.push({
28 content: block.content,
29 latex: block.details.latex,
30 pageNumber: block.metadata.page.number,
31 position: block.boundingBox
32 });
33 }
34 });
35 });
36
37 return formulas;
38}
39
40// Extract all figures with their images
41function extractFigures(parseRun) {
42 const figures = [];
43
44 parseRun.output.chunks.forEach(chunk => {
45 chunk.blocks.forEach(block => {
46 if (block.type === 'figure' && block.details.imageUrl) {
47 figures.push({
48 caption: block.content,
49 imageUrl: block.details.imageUrl,
50 figureType: block.details.figureType,
51 pageNumber: block.metadata.page.number
52 });
53 }
54 });
55 });
56
57 return figures;
58}

Reconstruct content with custom formatting

1// Extract headings and their content to create a table of contents
2function createTableOfContents(parseRun) {
3 const toc = [];
4
5 parseRun.output.chunks.forEach(chunk => {
6 chunk.blocks.forEach(block => {
7 if (block.type === 'heading' || block.type === 'section_heading') {
8 toc.push({
9 title: block.content,
10 pageNumber: block.metadata.page.number
11 });
12 }
13 });
14 });
15
16 return toc;
17}

Spatial information

Each block contains spatial information in the form of a polygon (precise outline) and a simplified boundingBox. Use this when you need position-aware output:

  • Highlight specific content in a document viewer
  • Create visual overlays on top of the original document
  • Understand the reading order and layout of the document
1// Create highlight coordinates for a document viewer
2function createHighlights(parseRun, searchTerm) {
3 const highlights = [];
4
5 parseRun.output.chunks.forEach(chunk => {
6 chunk.blocks.forEach(block => {
7 if (block.type === 'text' && block.content.includes(searchTerm)) {
8 highlights.push({
9 pageNumber: block.metadata.page.number,
10 boundingBox: block.boundingBox
11 });
12 }
13 });
14 });
15
16 return highlights;
17}

By leveraging both the formatted content and the structured block information, you can build powerful document processing workflows that combine the convenience of formatted text with the precision of block-level access.