Parse returns document content in a structure optimized for RAG, LLM context, and citation. Every run gives you both high-level formatted content and detailed block-level data with spatial coordinates. This page explains every field in the response.
A completed parse run looks like this (truncated to a single chunk and block for brevity). The parsed content lives in output.chunks. Each chunk has a formatted content string and a blocks array of typed, layout-aware elements.
By default, parsed content is returned inline in output: the content is embedded directly in the response body, and you read output.chunks directly. For large documents the response can become very large, so you can request a presigned download URL instead by adding the responseType=url query parameter to your parse request.
Handle both shapes by checking which field is populated:
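A minimal sketch of that check in Python. This page does not name the field that carries the presigned URL, so `output["url"]` below is a hypothetical placeholder. The helper names and the assumption that the downloaded file is JSON with a top-level `chunks` array are also illustrative; confirm the real names against an actual response.

```python
import json
import urllib.request
from typing import Any, Callable


def _download_json(url: str) -> dict[str, Any]:
    """Fetch a presigned URL and decode the body as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def load_chunks(
    run: dict[str, Any],
    fetch: Callable[[str], dict[str, Any]] = _download_json,
) -> list[dict[str, Any]]:
    """Return the chunk list whether the run is inline or URL-based."""
    output = run.get("output") or {}

    # Inline shape: the chunks are embedded directly in the response body.
    chunks = output.get("chunks")
    if chunks is not None:
        return chunks

    # URL shape: "url" is an ASSUMED field name, not confirmed by the docs.
    url = output.get("url")
    if url:
        return fetch(url).get("chunks", [])

    raise ValueError("response has neither inline chunks nor a download URL")
```

Passing `fetch` as a parameter keeps the download step swappable, for example to add retries or to stub it out in tests.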
Chunks are the top-level units in output.chunks. Depending on your chunking strategy, a chunk represents a page (default), a logical section, or the whole document.
Each chunk gives you two views of the same content. Pick based on what you’re building:
Use chunk.content when you want ready-to-use formatted text: feeding an LLM, building a RAG index, or displaying a page. Concatenate every chunk’s content to reconstruct the whole document.
Use chunk.blocks when you need structure or position: pulling out only tables or figures, building citations and highlights, or rendering overlays on the original document.
Blocks are the atomic elements within a chunk: each paragraph, heading, table, figure, and key-value region is its own block, with type-specific details and spatial coordinates.
The details object varies by block type. It is an empty object for plain blocks (like text).
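As a rough illustration of the two views, the sketch below rebuilds the document text from each chunk's content and collects blocks of a single kind from each chunk's blocks. It assumes a block exposes its kind under a `type` key and that tables use the value `"table"`. Neither name is confirmed on this page, and the helper names are invented for the example.

```python
from typing import Any


def full_text(chunks: list[dict[str, Any]], separator: str = "\n\n") -> str:
    """Concatenate every chunk's formatted content into one string."""
    return separator.join(chunk.get("content", "") for chunk in chunks)


def blocks_of_type(
    chunks: list[dict[str, Any]], block_type: str
) -> list[dict[str, Any]]:
    """Collect blocks of one kind across all chunks.

    ASSUMPTION: each block stores its kind under the key "type".
    """
    return [
        block
        for chunk in chunks
        for block in chunk.get("blocks", [])
        if block.get("type") == block_type
    ]
```

For instance, `blocks_of_type(chunks, "table")` would return only the table blocks, each still carrying its own details and coordinates.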
Parse reports OCR confidence so you can flag regions that may need review. Both chunks and blocks expose two aggregated scores in their metadata:
These reflect OCR confidence (how reliably the characters were recognized). Both are null when word-level confidence is unavailable.
You can use a low minOcrConfidence as a trigger for routing a page to manual review.
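One way to sketch that routing in Python follows. It assumes the score sits at `metadata.minOcrConfidence` on each chunk and is expressed on a 0 to 1 scale. The 0.8 threshold is an arbitrary placeholder rather than a recommended value, and the function name is hypothetical.

```python
from typing import Any


def chunks_needing_review(
    chunks: list[dict[str, Any]], threshold: float = 0.8
) -> list[int]:
    """Return the indexes of chunks whose minimum OCR confidence is low.

    Chunks with a null score are skipped here, because null means
    word-level confidence was unavailable, not that the OCR was poor.
    """
    flagged = []
    for index, chunk in enumerate(chunks):
        score = (chunk.get("metadata") or {}).get("minOcrConfidence")
        if score is not None and score < threshold:
            flagged.append(index)
    return flagged
```

Whether a null score should skip review or force it is a policy decision. A stricter pipeline might treat missing confidence as a reason to review.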
Every block carries two spatial representations:
polygon: the precise outline of the block, as an array of { x, y } points.
boundingBox: a simplified, axis-aligned rectangle around the block (left, top, right, bottom).
Coordinates share the page’s coordinate space, with the origin at the top-left of the page: x increases to the right and y increases downward. Every block reports the page’s own dimensions at metadata.page.width and metadata.page.height in the same units, so you can divide by them to express any position as a fraction of the page when you need normalized, resolution-independent values.
To express a position as a fraction of the page (0–1), divide each coordinate by the page dimensions: left / metadata.page.width and top / metadata.page.height.
Because coordinates are tied to known page dimensions, you can normalize them to whatever your viewer expects (CSS percentages, image pixels, etc.):
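A small Python sketch of that normalization follows. The functions take a block's boundingBox and its metadata.page object as plain dictionaries, because this page does not show exactly where boundingBox sits inside a block. The function names are illustrative only.

```python
from typing import Mapping


def to_fractions(
    bounding_box: Mapping[str, float], page: Mapping[str, float]
) -> dict[str, float]:
    """Express each edge of the box as a 0-1 fraction of the page."""
    width, height = page["width"], page["height"]
    return {
        "left": bounding_box["left"] / width,
        "top": bounding_box["top"] / height,
        "right": bounding_box["right"] / width,
        "bottom": bounding_box["bottom"] / height,
    }


def to_css_percent(
    bounding_box: Mapping[str, float], page: Mapping[str, float]
) -> dict[str, str]:
    """Build CSS left/top/width/height percentages for an overlay."""
    width, height = page["width"], page["height"]
    box_width = bounding_box["right"] - bounding_box["left"]
    box_height = bounding_box["bottom"] - bounding_box["top"]
    return {
        "left": f"{100 * bounding_box['left'] / width:.2f}%",
        "top": f"{100 * bounding_box['top'] / height:.2f}%",
        "width": f"{100 * box_width / width:.2f}%",
        "height": f"{100 * box_height / height:.2f}%",
    }
```

Because the origin is the top-left corner and y grows downward, these values map directly onto an absolutely positioned element inside a container that holds the page image. To get image pixels instead, multiply the fractions by the rendered image's width and height.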