
Changelog


Stay up to date on what’s shipping in the Extend platform.

May 26, 2026

May 20, 2026

May 14, 2026

Extraction Performance 4.8.0

Extraction Performance 4.8.0 is now the latest base processor version. It upgrades the base models used for core extraction and large array strategies.


May 14, 2026

May 14, 2026

May 8, 2026

April 9, 2026

April 6, 2026

April 3, 2026

Formula handling in Parse

The new parse engine can now parse formula blocks as LaTeX. Enable it as an advanced option to extract math content directly from documents.


April 3, 2026

Strikethrough detection in Parse

The new parse engine now detects and annotates strikethrough text via a specialized model. Enable it as an advanced option in the parser config.


OCR word confidence on parse chunk and block metadata

Parse runs now include minOcrConfidence and avgOcrConfidence on chunk and block metadata — the minimum and average per-word OCR confidence across the words in that region.

Both fields are returned on every chunk and block, and are null when word-level confidence isn’t produced for that region. Values are in the range 0–1.

For word-level confidence scores, set returnOcr.words to true.
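
As a minimal sketch of how these fields might be used, the helper below flags regions whose minimum word confidence falls under a threshold. The field names minOcrConfidence and avgOcrConfidence come from this entry; the surrounding shape (a list of dicts with id and metadata keys), the function name, and the 0.8 threshold are assumptions for illustration, not the documented response format.

```python
from typing import Any, Optional


def low_confidence_regions(
    regions: list[dict[str, Any]], threshold: float = 0.8
) -> list[str]:
    """Return ids of regions whose minOcrConfidence is below `threshold`.

    Regions where the field is null (None) are skipped, because no
    word-level confidence was produced for them.
    """
    flagged = []
    for region in regions:
        metadata = region.get("metadata") or {}
        min_conf: Optional[float] = metadata.get("minOcrConfidence")
        if min_conf is None:
            continue  # null: confidence not available for this region
        if min_conf < threshold:
            flagged.append(region["id"])
    return flagged


# Hypothetical sample data, not real API output.
sample = [
    {"id": "chunk_1", "metadata": {"minOcrConfidence": 0.42, "avgOcrConfidence": 0.91}},
    {"id": "chunk_2", "metadata": {"minOcrConfidence": 0.95, "avgOcrConfidence": 0.98}},
    {"id": "chunk_3", "metadata": {"minOcrConfidence": None, "avgOcrConfidence": None}},
]
print(low_confidence_regions(sample))  # ['chunk_1']
```

Filtering on the minimum rather than the average surfaces regions where a single badly recognized word could affect a downstream value, even when the rest of the region was read cleanly.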

Generate extractor JSON schemas via the REST API when you create an extractor

For API version 2026-02-09, you can optionally pass generate on POST /extractors instead of supplying config. Provide one to five sample inputs as Extend file IDs or file URLs, and Extend will generate a JSON extraction schema from those examples and return the extractor with the schema applied. Optionally add generate.instructions (up to 2,500 characters) to give additional context about the document type or requirements for how values should be extracted. You cannot combine generate with cloneExtractorId or config.

  • generate.files: 1–5 entries, each a file { id } or { url }.
  • generate.instructions: optional free-text guidance to steer schema generation.
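
The constraints above can be checked client-side before calling POST /extractors. The sketch below builds a request body and enforces the limits stated in this entry (1–5 files, 2,500-character instructions, no config or cloneExtractorId). The function name, the extra parameter, and the sample file ID are hypothetical; any other fields the endpoint requires are not covered here.

```python
from typing import Any, Optional

MAX_FILES = 5
MAX_INSTRUCTIONS_CHARS = 2500


def build_generate_body(
    files: list[dict[str, str]],
    instructions: Optional[str] = None,
    extra: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a body that uses `generate`, enforcing the limits above."""
    if not 1 <= len(files) <= MAX_FILES:
        raise ValueError("generate.files must contain 1 to 5 entries")
    for entry in files:
        # Each entry references a file by id or by url (exactly one key, assumed).
        if set(entry) not in ({"id"}, {"url"}):
            raise ValueError("each file must be {'id': ...} or {'url': ...}")
    if instructions is not None and len(instructions) > MAX_INSTRUCTIONS_CHARS:
        raise ValueError("generate.instructions is limited to 2,500 characters")
    extra = dict(extra or {})
    if "config" in extra or "cloneExtractorId" in extra:
        raise ValueError("generate cannot be combined with config or cloneExtractorId")

    generate: dict[str, Any] = {"files": files}
    if instructions is not None:
        generate["instructions"] = instructions
    return {**extra, "generate": generate}


# Hypothetical file id and URL, for illustration only.
body = build_generate_body(
    [{"id": "file_abc"}, {"url": "https://example.com/invoice.pdf"}],
    instructions="Return dates in ISO 8601 format.",
)
print(body)
```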

Include a reference date in extraction prompts

You can opt in to an advanced extraction setting that adds a fixed “current date” line to the system prompt. The model can use it when a field depends on today's date, on relative phrases like “30 days from now”, or on ambiguous short dates on the page (for example, interpreting 02/03/26). The value is taken from the time the run was created, in UTC.

The option is off by default. For API version 2026-02-09, set advancedOptions.currentDateEnabled to true. See the JSON Schema extraction guide for how advanced options fit into your setup.
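
To illustrate one way a short date like 02/03/26 is ambiguous, the sketch below parses it under two common conventions. The flag name advancedOptions.currentDateEnabled comes from this entry; where advancedOptions sits inside the full extractor config is an assumption, and the parsing code only demonstrates the ambiguity, not how the model resolves it.

```python
from datetime import datetime

# Opt-in flag named in this entry; its position inside the full
# extractor config is assumed here.
extractor_config = {"advancedOptions": {"currentDateEnabled": True}}

# The same short date reads differently depending on the convention.
raw = "02/03/26"
month_first = datetime.strptime(raw, "%m/%d/%y").date()
day_first = datetime.strptime(raw, "%d/%m/%y").date()

print(month_first.isoformat())  # 2026-02-03
print(day_first.isoformat())    # 2026-03-02
```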

Detailed credit breakdown on run responses

Run objects now include richer usage metadata so you can see how credits relate to underlying work alongside the billed amount (credits). The existing usage.credits value is unchanged. For full run payloads and webhook events, responses can also include totalCredits, representing all contributing charges for that logical run (for example extraction plus parsing when parsing was billed for that run), and a breakdown array listing each contributing resource type, id, and credit amount.

On list endpoints, summaries include credits and totalCredits but omit breakdown to keep payloads small. Runs written before totalCredits and breakdown were stored may expose only credits; treat totalCredits and breakdown as optional. For background on billing units, see How credits work. For the full object shape, see usage on the extract run response.
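
Because totalCredits and breakdown may be absent, client code should read them defensively. The following is a minimal sketch, assuming each breakdown item is a dict with type, id, and credits keys; this entry describes those three values but does not confirm the key names, and the sample values are made up.

```python
from typing import Any


def summarize_usage(usage: dict[str, Any]) -> dict[str, Any]:
    """Normalize a run's usage object, tolerating older runs and list summaries."""
    by_type: dict[str, float] = {}
    # breakdown is optional: missing on list endpoints and on older runs.
    for item in usage.get("breakdown") or []:
        # Key names "type" and "credits" are assumed, not confirmed.
        by_type[item["type"]] = by_type.get(item["type"], 0) + item["credits"]
    return {
        "credits": usage.get("credits"),
        "totalCredits": usage.get("totalCredits"),  # None when not stored
        "byType": by_type,
    }


# Hypothetical usage objects, for illustration only.
new_run = {
    "credits": 2,
    "totalCredits": 3,
    "breakdown": [
        {"type": "extract", "id": "run_1", "credits": 2},
        {"type": "parse", "id": "run_2", "credits": 1},
    ],
}
old_run = {"credits": 2}

print(summarize_usage(new_run))
print(summarize_usage(old_run))
```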

Extraction citation mode control (line, word, block)

When bounding box citations are enabled on an extractor, you can set citationMode in advancedOptions to line, word, or block so citation polygons match the granularity you want. If you leave it unset, behavior matches what you have today (line-based citation processing plus block overlap handling across supported parse engines).

Extraction pipelines that use the parse 2.0.0-beta engine can now return bounding box citations for extracted values, so you are not limited to older parse versions when you need spatial references.

See Citations for how citations appear on extracted fields.

  • citationMode — optional; configure in extractor advanced options in Extend Studio or on extractors (line, word, or block)
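
As a small sketch, the helper below produces an advancedOptions fragment with a validated citationMode, and leaves the key out entirely when you want the existing default behavior. The function name is hypothetical, and how the fragment is merged into a full extractor config is not shown.

```python
from typing import Optional

CITATION_MODES = ("line", "word", "block")


def citation_options(mode: Optional[str] = None) -> dict[str, str]:
    """Return an advancedOptions fragment; omit citationMode to keep the default."""
    if mode is None:
        return {}
    if mode not in CITATION_MODES:
        raise ValueError(f"citationMode must be one of {CITATION_MODES}")
    return {"citationMode": mode}


print(citation_options("word"))  # {'citationMode': 'word'}
print(citation_options())        # {}
```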

Batch Extract and Batch Parse APIs

Two new endpoints make bulk background processing easier:

  • /extract/batch — queue thousands of files for extraction in the background without running into rate limits.
  • /parse/batch — bulk background parse operations.
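
A rough sketch of preparing a call to one of these endpoints is shown below. Only the two paths come from this entry. The host is a placeholder, and the body shape ({"inputs": [...]}), the bearer-token header, and the function name are assumptions for illustration; check the API reference for the real request schema. The request is built but not sent.

```python
import json
from urllib.request import Request

BASE_URL = "https://api.example.invalid"  # placeholder, not the real host
BATCH_PATHS = ("/extract/batch", "/parse/batch")


def build_batch_request(path: str, items: list[dict], token: str) -> Request:
    """Build (but do not send) a POST to a batch endpoint.

    The {"inputs": [...]} body and the Authorization header are assumed.
    """
    if path not in BATCH_PATHS:
        raise ValueError(f"path must be one of {BATCH_PATHS}")
    body = json.dumps({"inputs": items}).encode("utf-8")
    return Request(
        BASE_URL + path,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical input item and token.
req = build_batch_request(
    "/parse/batch", [{"file": {"url": "https://example.com/a.pdf"}}], "TOKEN"
)
print(req.get_method(), req.full_url)
```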

Splitter Composer

Composer is now available for splitters, so you can optimize them automatically from eval sets. Previously, Composer was only available for extractors and classifiers.

If you’re working on splitting, check out our Splitter Benchmark to see how we evaluate models.