Schema

JSON Schema Structure (schema)

We use JSON Schema to define the structure of the data we extract. Before you get started, we recommend familiarizing yourself with the JSON Schema documentation to understand how to define your schema.

The standard JSON Schema is extremely flexible. We’ve implemented a subset of the standard to support the needs of document extraction. Your schema must follow these rules:

  • The root must be an object type
  • Allowed types are string, number, integer, boolean, object, and array
  • All primitive fields (string, number, integer, boolean) must be nullable: use an array type that includes "null", e.g. "type": ["string", "null"]. Non-nullable primitive fields are rejected with a 400 error. Primitive array items are the exception (see Array Schema).
  • Maximum nesting level is 5 (each non-root object counts as 1 level)
  • Property keys and names can contain letters, numbers, underscores, and hyphens
  • Array items can be objects or primitive types (string, number, integer, boolean)
  • Enums must only contain strings and must contain a null option. Enums without null will be rejected with a 400 error.
  • Custom types are supported by adding an "extend:type" property with the value "currency", "signature", or "date" to a field of the appropriate type, along with the properties that type requires. See the examples below.
  • Use the "extend:name" property to give a property an alternative name. If supplied, it overrides the name the model sees, but not the key in the output returned to you. This is useful for providing more descriptive names or instructions to the model without altering the keys in your output data structure.
  • You can add descriptions to individual enum values using the "extend:descriptions" property.
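
The rules above can be checked locally before a schema is submitted. The following is a minimal sketch, not the service's own validator: the function names, the error strings, and the exact key pattern are assumptions made for illustration, and it covers only the root type, property key characters, nullable primitives, and enum rules.

```python
import re

# Assumed reading of "letters, numbers, underscores, and hyphens".
KEY_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def check_schema(schema):
    """Return a list of rule violations; an empty list means none were found."""
    errors = []
    if schema.get("type") != "object":
        errors.append("root: type must be 'object'")
    _check_node(schema, "root", errors)
    return errors


def _check_node(node, path, errors, is_array_item=False):
    node_type = node.get("type")
    if "enum" in node:
        values = node["enum"]
        if None not in values:
            errors.append(f"{path}: enum must include null")
        if any(v is not None and not isinstance(v, str) for v in values):
            errors.append(f"{path}: enum values must be strings")
    elif node_type == "object":
        for key, child in node.get("properties", {}).items():
            if not KEY_PATTERN.fullmatch(key):
                errors.append(f"{path}.{key}: invalid characters in property key")
            _check_node(child, f"{path}.{key}", errors)
    elif node_type == "array":
        _check_node(node.get("items", {}), f"{path}[]", errors, is_array_item=True)
    elif not is_array_item:
        # Primitive field: the type must be a list that includes "null".
        types = node_type if isinstance(node_type, list) else [node_type]
        if "null" not in types:
            errors.append(f"{path}: primitive type must be nullable")
```

Running such a check first turns a 400 response into a readable list of paths to fix.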

Unsupported Features

We support the core JSON Schema structure, but not many of its additional features, including:

  • Schema composition like anyOf, oneOf, allOf, schema definitions, or recursive schemas
  • Regular expressions and other type-specific validation keywords
  • Conditional schema validation
  • Constant values
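
A schema written for another tool may already contain some of these features. The sketch below scans for them, under the assumption that the categories above map to the standard JSON Schema keywords anyOf, oneOf, allOf, $ref, $defs, definitions, pattern, if, then, else, and const. That mapping is an interpretation, not a list published on this page, and it is not exhaustive (other type-specific validation keywords are not included).

```python
# Assumed mapping from the unsupported categories to standard JSON Schema keywords.
UNSUPPORTED_KEYWORDS = {
    "anyOf", "oneOf", "allOf",        # schema composition
    "$ref", "$defs", "definitions",   # schema definitions and recursion
    "pattern",                        # regular expressions
    "if", "then", "else",             # conditional validation
    "const",                          # constant values
}


def find_unsupported(node, path="root"):
    """Return 'path: keyword' strings for each unsupported keyword found."""
    found = [f"{path}: {key}" for key in sorted(node) if key in UNSUPPORTED_KEYWORDS]
    # Recurse into property schemas only, so that a property that happens to
    # be *named* "pattern" or "const" is not reported as a keyword.
    for name, child in node.get("properties", {}).items():
        found += find_unsupported(child, f"{path}.{name}")
    if isinstance(node.get("items"), dict):
        found += find_unsupported(node["items"], f"{path}[]")
    return found
```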

Schema Examples

Primitive Schema

All primitive types must be nullable.

{
  "field_name": {
    "type": ["string", "null"],
    "description": "Field description"
  },
  "numeric_field": {
    "type": ["number", "null"],
    "description": "A numeric field"
  },
  "integer_field": {
    "type": ["integer", "null"],
    "description": "An integer field"
  },
  "boolean_field": {
    "type": ["boolean", "null"],
    "description": "A boolean field"
  }
}
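
When schemas are built programmatically, a small helper makes it hard to forget the "null" option. This is a sketch only; the helper name nullable is hypothetical and not part of any SDK.

```python
PRIMITIVES = {"string", "number", "integer", "boolean"}


def nullable(primitive, description):
    """Build a nullable primitive field definition."""
    if primitive not in PRIMITIVES:
        raise ValueError(f"not a primitive type: {primitive!r}")
    return {"type": [primitive, "null"], "description": description}


fields = {
    "field_name": nullable("string", "Field description"),
    "numeric_field": nullable("number", "A numeric field"),
}
```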

Object Schema

Objects must have properties. If you supply a required array, we respect its order when extracting. If you omit the required array, we generate one and enforce its order.

{
  "address": {
    "type": "object",
    "properties": {
      "street": {
        "type": ["string", "null"],
        "description": "Street address"
      },
      "city": {
        "type": ["string", "null"],
        "description": "City name"
      }
    },
    "required": ["street", "city"]
  }
}
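
Setting required explicitly keeps the extraction order under your control. The sketch below fills it in from the order in which properties are declared. The helper name is hypothetical, and defaulting to declaration order is an assumption made here; this page does not state which order the service uses when it generates the array itself.

```python
def object_field(properties, required=None):
    """Build an object field, defaulting 'required' to declaration order."""
    if not properties:
        raise ValueError("objects must have properties")
    if required is None:
        # Assumption: use the order in which the properties were declared.
        required = list(properties)
    return {"type": "object", "properties": properties, "required": required}


address = object_field({
    "street": {"type": ["string", "null"], "description": "Street address"},
    "city": {"type": ["string", "null"], "description": "City name"},
})
```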

Array Schema

Arrays can contain either objects or primitive types (string, number, integer, boolean). Primitive array items are not nullable.

Array of Objects

{
  "line_items": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "description": {
          "type": ["string", "null"],
          "description": "Item description"
        },
        "quantity": {
          "type": ["number", "null"],
          "description": "Item quantity"
        },
        "price": {
          "type": ["number", "null"],
          "description": "Item price"
        }
      },
      "required": ["description", "quantity", "price"]
    },
    "description": "List of items in the invoice"
  }
}

Array of Scalars

{
  "product_tags": {
    "type": "array",
    "items": {
      "type": "string"
    },
    "description": "List of product tags or categories"
  }
}
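
Because primitive fields are nullable but primitive array items are not, reusing a field type as an item type is an easy mistake to make. The following sketch (the helper name scalar_array is hypothetical) strips "null" from the item type when building a scalar array.

```python
def scalar_array(item_type, description):
    """Build an array-of-scalars field with a non-nullable item type."""
    if isinstance(item_type, list):
        # Drop "null": primitive array items are not nullable.
        non_null = [t for t in item_type if t != "null"]
        if len(non_null) != 1:
            raise ValueError(f"expected exactly one non-null type, got {item_type!r}")
        item_type = non_null[0]
    return {"type": "array", "items": {"type": item_type}, "description": description}


product_tags = scalar_array(["string", "null"], "List of product tags or categories")
```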

Enum Schema

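Enums must include null as an option, and only string values are supported. extend:descriptions is an optional array of strings; we recommend using it to give more context for each enum option, which leads to more accurate extraction. In the example below, the descriptions are listed in the same order as the enum values, with an empty string for null.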

{
  "status": {
    "enum": ["pending", "approved", "rejected", null],
    "extend:descriptions": [
      "Invoice is pending approval",
      "Invoice has been approved",
      "Invoice has been rejected",
      ""
    ],
    "description": "Current status of the invoice"
  }
}
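
Keeping enum values and their descriptions in one mapping avoids the two arrays drifting out of step. This is a sketch under the assumption, inferred from the example above, that descriptions are matched to enum values by position and that null takes an empty description; the helper name enum_field is hypothetical.

```python
import json


def enum_field(options, description):
    """Build an enum field from a {value: description} mapping, appending null."""
    for value in options:
        if not isinstance(value, str):
            raise TypeError("enum values must be strings")
    return {
        "enum": [*options, None],
        "extend:descriptions": [*options.values(), ""],
        "description": description,
    }


status = enum_field(
    {
        "pending": "Invoice is pending approval",
        "approved": "Invoice has been approved",
        "rejected": "Invoice has been rejected",
    },
    "Current status of the invoice",
)

# Python's None is serialized as JSON null.
print(json.dumps(status["enum"]))  # ["pending", "approved", "rejected", null]
```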

Custom Field Types

The extend:type keyword enables custom pre-processing and post-processing for a field, baking in best practices and heuristics for that field type.

Date Schema

Date fields must be strings with the extend:type keyword set to date. This guarantees that the extracted date is always in ISO 8601 format (yyyy-mm-dd).

{
  "invoice_date": {
    "type": ["string", "null"],
    "extend:type": "date",
    "description": "The invoice date"
  }
}
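
Since the format is fixed at yyyy-mm-dd, an extracted date can be parsed with Python's standard library alone. The sketch below assumes the extracted value arrives as a string, or as null when no date was found; the function name is hypothetical.

```python
from datetime import date


def parse_extracted_date(value):
    """Convert a yyyy-mm-dd string to a date; pass null (None) through."""
    if value is None:
        return None
    return date.fromisoformat(value)


invoice_date = parse_extracted_date("2024-03-15")
```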

Currency Schema

Currency fields must be objects with specific properties: amount and iso_4217_currency_code, as in the following example.

{
  "price": {
    "type": "object",
    "extend:type": "currency",
    "properties": {
      "amount": {
        "type": ["number", "null"]
      },
      "iso_4217_currency_code": {
        "type": ["string", "null"]
      }
    },
    "required": ["amount", "iso_4217_currency_code"]
  }
}
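
The currency shape is the same wherever it is used, so it can be generated rather than retyped. A minimal sketch follows; the helper name currency_field is hypothetical.

```python
def currency_field(description=None):
    """Build a currency field with the two properties shown in the example above."""
    field = {
        "type": "object",
        "extend:type": "currency",
        "properties": {
            "amount": {"type": ["number", "null"]},
            "iso_4217_currency_code": {"type": ["string", "null"]},
        },
        "required": ["amount", "iso_4217_currency_code"],
    }
    if description is not None:
        field["description"] = description
    return field


schema_fragment = {"price": currency_field()}
```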

Signature Schema

Signature fields must be objects with specific properties. Using this type automatically enables our advanced signature detection in the parsing step prior to extraction. It also applies a number of prompt and post-processing heuristics to improve accuracy, particularly by reducing false positives for signature blocks that are not actually signed.

{
  "signature": {
    "type": "object",
    "extend:type": "signature",
    "properties": {
      "printed_name": {
        "type": ["string", "null"]
      },
      "signature_date": {
        "type": ["string", "null"],
        "extend:type": "date"
      },
      "is_signed": {
        "type": ["boolean", "null"]
      },
      "title_or_role": {
        "type": ["string", "null"]
      }
    },
    "required": ["printed_name", "signature_date", "is_signed", "title_or_role"]
  }
}
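
A field's custom type can be checked against the shapes shown in the examples above before the schema is sent. In this sketch the expected property sets are copied from those examples; whether the service accepts extra or fewer properties is not stated on this page, so the check is an assumption, and the function name is hypothetical.

```python
# Property sets copied from the currency and signature examples on this page.
EXPECTED_PROPERTIES = {
    "currency": {"amount", "iso_4217_currency_code"},
    "signature": {"printed_name", "signature_date", "is_signed", "title_or_role"},
}


def check_custom_type(field):
    """Return a list of problems with a field's extend:type usage."""
    custom = field.get("extend:type")
    if custom is None:
        return []
    if custom == "date":
        types = field.get("type")
        types = types if isinstance(types, list) else [types]
        return [] if "string" in types else ["date fields must be strings"]
    if custom not in EXPECTED_PROPERTIES:
        return [f"unknown extend:type: {custom}"]
    problems = []
    if field.get("type") != "object":
        problems.append(f"{custom} fields must be objects")
    missing = EXPECTED_PROPERTIES[custom] - set(field.get("properties", {}))
    if missing:
        problems.append(f"{custom} field is missing: {sorted(missing)}")
    return problems
```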

Configuration Examples

Basic Example

{
  "schema": {
    "type": "object",
    "properties": {
      "invoice_number": {
        "type": ["string", "null"],
        "description": "The unique identifier for this invoice"
      },
      "invoice_amount": {
        "type": "object",
        "extend:type": "currency",
        "description": "The total amount of the invoice",
        "properties": {
          "amount": {
            "type": ["number", "null"]
          },
          "iso_4217_currency_code": {
            "type": ["string", "null"]
          }
        },
        "required": ["amount", "iso_4217_currency_code"]
      }
    },
    "required": ["invoice_number", "invoice_amount"]
  }
}

Example with nested fields

{
  "schema": {
    "type": "object",
    "properties": {
      "line_items": {
        "type": "array",
        "description": "Individual items in the invoice",
        "items": {
          "type": "object",
          "properties": {
            "item_name": {
              "type": ["string", "null"],
              "description": "Name of the item"
            },
            "quantity": {
              "type": ["number", "null"],
              "description": "Number of items"
            },
            "unit_price": {
              "type": "object",
              "properties": {
                "amount": {
                  "type": ["number", "null"],
                  "description": "Price per unit"
                },
                "iso_4217_currency_code": {
                  "type": ["string", "null"],
                  "description": "Currency code"
                }
              },
              "required": ["amount", "iso_4217_currency_code"]
            }
          },
          "required": ["item_name", "quantity", "unit_price"]
        }
      },
      "payment_status": {
        "description": "Current payment status",
        "enum": ["PAID", "PENDING", null],
        "extend:descriptions": [
          "Payment has been completed",
          "Payment is pending",
          ""
        ]
      }
    },
    "required": ["line_items", "payment_status"]
  }
}

Example with scalar arrays

{
  "schema": {
    "type": "object",
    "properties": {
      "product_categories": {
        "type": "array",
        "description": "List of product categories",
        "items": {
          "type": "string"
        }
      }
    },
    "required": ["product_categories"]
  }
}

Example with nested arrays and objects

{
  "schema": {
    "type": "object",
    "properties": {
      "orders": {
        "type": "array",
        "description": "List of customer orders",
        "items": {
          "type": "object",
          "properties": {
            "order_id": {
              "type": ["string", "null"],
              "description": "Unique identifier for the order"
            },
            "customer_name": {
              "type": ["string", "null"],
              "description": "Name of the customer"
            },
            "shipments": {
              "type": "array",
              "description": "List of shipments for this order",
              "items": {
                "type": "object",
                "properties": {
                  "tracking_number": {
                    "type": ["string", "null"],
                    "description": "Shipping tracking number"
                  },
                  "ship_date": {
                    "type": ["string", "null"],
                    "extend:type": "date",
                    "description": "Date the shipment was sent"
                  },
                  "carrier": {
                    "type": ["string", "null"],
                    "description": "Shipping carrier name"
                  }
                },
                "required": ["tracking_number", "ship_date", "carrier"]
              }
            }
          },
          "required": ["order_id", "customer_name", "shipments"]
        }
      }
    },
    "required": ["orders"]
  }
}
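
Deeply nested schemas like this one are where the maximum nesting level of 5 becomes relevant. The sketch below counts levels as described in the rules (each non-root object counts as 1 level) and treats arrays as adding no level of their own. That treatment of arrays is an assumption, and the function name is hypothetical.

```python
def nesting_level(node, is_root=True):
    """Return the deepest chain of non-root objects at or below this node."""
    node_type = node.get("type")
    if node_type == "array":
        # Assumption: the array itself adds no level; only its items can.
        return nesting_level(node.get("items", {}), is_root=False)
    if node_type != "object":
        return 0
    children = node.get("properties", {}).values()
    deepest = max((nesting_level(child, is_root=False) for child in children), default=0)
    return deepest if is_root else deepest + 1


# A trimmed version of the orders schema above.
orders_schema = {
    "type": "object",
    "properties": {
        "orders": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "order_id": {"type": ["string", "null"]},
                    "shipments": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {"carrier": {"type": ["string", "null"]}},
                        },
                    },
                },
            },
        }
    },
}

print(nesting_level(orders_schema))  # 2
```

Under this counting, the orders example uses two of the five available levels: one for each order object and one for each shipment object.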

Example with signature, currency, and date fields

{
  "schema": {
    "type": "object",
    "properties": {
      "invoice_signature": {
        "type": "object",
        "extend:type": "signature",
        "description": "Details of the invoice signature",
        "properties": {
          "printed_name": {
            "type": ["string", "null"],
            "description": "The printed name of the signer"
          },
          "signature_date": {
            "type": ["string", "null"],
            "extend:type": "date",
            "description": "The date the signature was applied"
          },
          "is_signed": {
            "type": ["boolean", "null"],
            "description": "Indicates if the document is signed"
          },
          "title_or_role": {
            "type": ["string", "null"],
            "description": "The title or role of the signer"
          }
        },
        "required": ["printed_name", "signature_date", "is_signed", "title_or_role"]
      },
      "invoice_amount": {
        "type": "object",
        "extend:type": "currency",
        "description": "The amount of the invoice",
        "properties": {
          "amount": {
            "type": ["number", "null"],
            "description": "The numerical value of the amount"
          },
          "iso_4217_currency_code": {
            "type": ["string", "null"],
            "description": "The ISO 4217 currency code (e.g., USD, EUR)"
          }
        },
        "required": ["amount", "iso_4217_currency_code"]
      },
      "invoice_date": {
        "type": ["string", "null"],
        "extend:type": "date",
        "description": "The date of the invoice"
      }
    },
    "required": ["invoice_signature", "invoice_amount", "invoice_date"]
  }
}
54}