For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
Book a demoLog in
DocumentationAPI ReferenceModel VersioningChangelog
DocumentationAPI ReferenceModel VersioningChangelog
    • Studio
    • Support
    • Benchmarks
    • Status
  • Getting Started
    • Overview
    • API Quickstart
    • Dashboard Quickstart
    • Agent Quickstart
  • Dev Tools
    • SDKs
    • CLI
  • Capabilities
      • Overview
      • Configuration
      • Response Format
      • Best Practices
      • Error Handling
LogoLogo
Book a demoLog in
On this page
  • Response structure
  • Top-level fields
  • Inline output vs. URL
  • Understanding chunks
  • content vs. blocks
  • Understanding blocks
  • Block fields
  • Block types
  • Block details
  • Confidence
  • Bounding box coordinates
  • Coordinate system
  • Build highlights and overlays
CapabilitiesParsing

Response Format

Was this page helpful?
Previous

Best Practices

Next
Built with

Parse returns document content in a structure optimized for RAG, LLM context, and citation. Every run gives you both high-level formatted content and detailed block-level data with spatial coordinates. This page explains every field in the response.


Response structure

A completed parse run looks like this (truncated to a single chunk and block for brevity). The parsed content lives in output.chunks. Each chunk has a formatted content string and a blocks array of typed, layout-aware elements.

1{
2 "object": "parse_run",
3 "id": "pr_3f1j6I1gsw5k96xFiCnkM",
4 "file": {
5 "object": "file",
6 "id": "file_GzKUy0VDhHscv7tweODYb",
7 "name": "bank_statement.pdf"
8 },
9 "status": "PROCESSED",
10 "output": {
11 "chunks": [
12 {
13 "object": "chunk",
14 "type": "page",
15 "content": "CHASE JPMorgan Chase Bank, N.A. P O Box 659754...",
16 "metadata": { "pageRange": { "start": 1, "end": 1 } },
17 "blocks": [
18 {
19 "object": "block",
20 "id": "block_WNoJ0WbMj4pRW9MpMpUox",
21 "type": "text",
22 "content": "CHASE JPMorgan Chase Bank, N.A. P O Box 659754 San Antonio, TX 78265 - 9754",
23 "details": {},
24 "metadata": { "page": { "number": 1, "width": 612, "height": 792 } },
25 "polygon": [
26 { "x": 56.873, "y": 35.374 },
27 { "x": 162.173, "y": 35.215 },
28 { "x": 162.245, "y": 81.158 },
29 { "x": 56.938, "y": 81.317 }
30 ],
31 "boundingBox": { "left": 56.873, "top": 35.215, "right": 162.245, "bottom": 81.317 }
32 }
33 ]
34 }
35 ]
36 },
37 "metrics": { "pageCount": 7, "processingTimeMs": 8293 },
38 "usage": { "credits": 14 }
39}

Top-level fields

FieldTypeDescription
objectstringAlways "parse_run".
idstringUnique identifier for the run (e.g. pr_...). Use it to fetch results later.
fileobject | nullThe parsed file (id, name). Reusable as input to other endpoints.
statusstringPENDING, PROCESSING, PROCESSED, or FAILED.
outputobject | nullParsed content. Present when status is PROCESSED and you did not request a URL response. Contains chunks.
outputUrlstring | nullPresigned URL to download the same output shape as JSON. Present only with responseType=url. See Inline output vs. URL.
metricsobjectpageCount and processingTimeMs for the run.
usageobjectCredits consumed (usage.credits).
configobjectThe full configuration used, including defaults that were applied.
failureReason / failureMessagestring | nullMachine-readable code and human-readable message. Present when status is FAILED.

Inline output vs. URL

By default, parsed content is returned inline in output. For large documents, the response can get big, so you can ask for a presigned download URL instead by adding the responseType=url query parameter to your parse request.

Inline (default)
URL (responseType=url)

The content is embedded directly in the response body.

1{
2 "status": "PROCESSED",
3 "output": { "chunks": [ /* ... */ ] },
4 "outputUrl": null
5}

How to use it: read output.chunks directly.

Handle both shapes by checking which field is populated:

Python
TypeScript
Java
Go
1import requests
2
3if response.output_url:
4 output = requests.get(response.output_url).json()
5else:
6 output = response.output
7
8for chunk in output["chunks"]:
9 print(chunk["content"])

Understanding chunks

Chunks are the top-level units in output.chunks. Depending on your chunking strategy, a chunk represents a page (default), a logical section, or the whole document.

1{
2 "object": "chunk",
3 "type": "page",
4 "content": "# Account Summary\n\n| Date | Description | Amount |\n| --- | --- | --- |\n...",
5 "metadata": { "pageRange": { "start": 1, "end": 1 } },
6 "blocks": [ /* ... */ ]
7}
FieldDescription
objectAlways "chunk".
typepage, section, or document, based on your chunking strategy.
contentFully formatted content for the chunk (markdown by default), ready to drop into an LLM prompt.
metadata.pageRangeThe start and end page numbers this chunk covers (equal when the chunk is within a single page).
metadata.minOcrConfidence / avgOcrConfidenceLowest and average per-word OCR confidence for the chunk, or null when word-level confidence is unavailable.
blocksArray of block objects making up the chunk. See Understanding blocks.

content vs. blocks

Each chunk gives you two views of the same content. Pick based on what you’re building:

  • Use chunk.content when you want ready-to-use formatted text: feeding an LLM, building a RAG index, or displaying a page. Concatenate every chunk’s content to reconstruct the whole document.
  • Use chunk.blocks when you need structure or position: pulling out only tables or figures, building citations and highlights, or rendering overlays on the original document.
Python
TypeScript
Java
Go
1def extract_tables(response):
2 tables = []
3 for chunk in response.output.chunks:
4 for block in chunk.blocks:
5 if block.type == "table":
6 tables.append({
7 "content": block.content, # markdown / HTML table
8 "rows": block.details.row_count,
9 "columns": block.details.column_count,
10 "page_number": block.metadata.page.number,
11 "position": block.bounding_box,
12 })
13 return tables

Understanding blocks

Blocks are the atomic elements within a chunk: each paragraph, heading, table, figure, and key-value region is its own block, with type-specific details and spatial coordinates.

1{
2 "object": "block",
3 "id": "block_WNoJ0WbMj4pRW9MpMpUox",
4 "type": "table",
5 "content": "| Date | Description | Amount |\n| --- | --- | --- |\n| 01/05 | Direct Deposit | $2,500.00 |",
6 "details": { "type": "table_details", "rowCount": 2, "columnCount": 3 },
7 "metadata": { "page": { "number": 1, "width": 612, "height": 792 } },
8 "polygon": [
9 { "x": 61.2, "y": 158.4 },
10 { "x": 550.8, "y": 158.4 },
11 { "x": 550.8, "y": 356.4 },
12 { "x": 61.2, "y": 356.4 }
13 ],
14 "boundingBox": { "left": 61.2, "top": 158.4, "right": 550.8, "bottom": 356.4 }
15}

Block fields

FieldTypeDescription
objectstringAlways "block".
idstringStable identifier, derived from the block content.
parentBlockIdstringSet only on child blocks (e.g. a table cell points to its table).
typestringThe kind of element (see Block types).
contentstringThe block’s content in the target format.
detailsobjectType-specific extra data (see Block details).
metadata.pageobjectnumber, plus page width and height.
metadata.textDirectionstringltr or rtl.
metadata.minOcrConfidence / avgOcrConfidencenumber | nullPer-block OCR confidence, or null when unavailable.
polygonarrayPrecise outline as { x, y } points. See Bounding box coordinates.
boundingBoxobjectSimplified rectangle: left, top, right, bottom.
childrenarrayNested blocks (e.g. table cells when cellBlocksEnabled is set).

Block types

TypeDescription
textBody paragraphs and regular text.
headingDocument or section headings.
section_headingSubsection headings.
tableTabular data (markdown or HTML based on config).
table_head / table_cellHeader and body cells, available as child blocks of a table.
figureImages, charts, diagrams, or logos.
key_valueKey-value regions such as form fields.
page_numberPage number indicators.
barcodeBarcodes and QR codes.
formulaMathematical formulas and equations.
header / footerPage headers and footers.

Block details

The details object varies by block type. It is an empty object for plain blocks (like text).

Block typedetails fields
tablerowCount, columnCount.
table_cellrowIndex, columnIndex.
figureimageUrl (clipped figure image), figureType (chart, image, diagram, logo, other) when classification is enabled.
formulalatex representation of the formula.
barcodedecodedValue, format (e.g. QRCode, Code128) when barcode reading is enabled.
key_valueNo extra fields.

Confidence

Parse reports OCR confidence so you can flag regions that may need review. Both chunks and blocks expose two aggregated scores in their metadata:

FieldDescription
minOcrConfidenceThe lowest per-word OCR confidence in the chunk or block (0–1). Use it to catch the weakest text in a region.
avgOcrConfidenceThe average per-word OCR confidence across the chunk or block (0–1). A read on overall quality.

These reflect OCR confidence (how reliably the characters were recognized). Both are null when word-level confidence is unavailable.

1{
2 "object": "chunk",
3 "type": "page",
4 "metadata": {
5 "pageRange": { "start": 1, "end": 1 },
6 "minOcrConfidence": 0.71,
7 "avgOcrConfidence": 0.98
8 }
9}

A low minOcrConfidence could be used as a trigger for routing a page to manual review.


Bounding box coordinates

Every block carries two spatial representations:

  • polygon — the precise outline of the block, as an array of { x, y } points.
  • boundingBox — a simplified, axis-aligned rectangle around the block (left, top, right, bottom).

Coordinate system

Coordinates share the page’s coordinate space, with the origin at the top-left of the page: x increases to the right and y increases downward. Every block reports the page’s own dimensions at metadata.page.width and metadata.page.height in the same units, so you can divide by them to express any position as a fraction of the page when you need normalized, resolution-independent values.

(0, 0) (pageWidth, 0)
┌────────────────────────────────────────────┐
│ (left, top) │
│ ●───────────────────────┐ │
│ │ │ │
│ │ block │ │
│ │ │ │
│ └───────────────────────● │
│ (right, bottom) │
│ │
└────────────────────────────────────────────┘
(0, pageHeight) (pageWidth, pageHeight)
FieldDescription
boundingBox.leftThe block’s left edge, measured from the left of the page.
boundingBox.topThe block’s top edge, measured from the top of the page.
boundingBox.rightThe block’s right edge, measured from the left of the page.
boundingBox.bottomThe block’s bottom edge, measured from the top of the page.
metadata.page.widthPage width, in the same units as the coordinates.
metadata.page.heightPage height, in the same units as the coordinates.

To express a position as a fraction of the page (0–1), divide each coordinate by the page dimensions: left / metadata.page.width and top / metadata.page.height.

Build highlights and overlays

Because coordinates are tied to known page dimensions, you can normalize them to whatever your viewer expects (CSS percentages, image pixels, etc.):

Python
TypeScript
Java
Go
1def find_highlights(response, search_term):
2 highlights = []
3 for chunk in response.output.chunks:
4 for block in chunk.blocks:
5 if search_term not in block.content:
6 continue
7 page = block.metadata.page
8 box = block.bounding_box
9 highlights.append({
10 "page_number": page.number,
11 # Normalized to 0–1 for resolution-independent rendering
12 "left": box.left / page.width,
13 "top": box.top / page.height,
14 "width": (box.right - box.left) / page.width,
15 "height": (box.bottom - box.top) / page.height,
16 })
17 return highlights