Formulas

Formulas let you write validation rules that check extracted data, perform calculations, and make decisions based on document content. Think of them like Excel formulas, but for validating your workflow data.

Quick Examples

Here are some common validation scenarios:

What you want to checkFormulaDescription
A field was extractedISPRESENT(extraction1.address)Returns TRUE if address was found
Math is correctextraction1.total = extraction1.price * extraction1.quantityValidates multiplication
Value is in rangeextraction1.amount > 0 AND extraction1.amount < 10000Checks amount is between 0 and 10,000
Conditional logicIF(classification1 = "invoice", ISPRESENT(extraction1.invoice_number), TRUE)Invoices must have an invoice number
Array contains valueCONTAINS(extraction1.categories, "urgent")Checks if “urgent” is in the categories list
Fuzzy string matchFUZZYMATCH(extraction1.vendor_name, "Albertsons", 0.8)Matches similar strings (handles typos/OCR errors)
Semantic comparisonLLMJUDGE(extraction1.address, "123 Main St")AI-powered matching for equivalent meanings

Writing Your First Formula

Formulas are made up of three parts:

  1. Variables - Reference data from your workflow (e.g., extraction1.total_amount)
  2. Operations - Compare or calculate values (e.g., =, +, *, >)
  3. Functions - Built-in tools for common tasks (e.g., ISPRESENT(), SUM(), IF())

Example: Validating an invoice total

Let’s say you extracted these fields from an invoice:

  • extraction1.subtotal = 100
  • extraction1.tax = 10
  • extraction1.total = 110

You can validate the math with:

extraction1.total = extraction1.subtotal + extraction1.tax

This formula returns TRUE if the total equals subtotal plus tax, and FALSE otherwise.

How to Reference Variables

Variables let you access data from different workflow steps. The format depends on the step type:

Classification Steps

Format: Just the step name
Example: classification1

This returns the document type as text (e.g., "invoice", "receipt", "contract").

classification1 = "invoice"

Extraction Steps

Format: stepName.fieldName
Example: extraction1.invoice_number

This returns the value of the extracted field. The value will be NULL if the field wasn’t found.

For nested fields or arrays:

extraction1.line_items.quantity
extraction1.address.street

When referencing a subfield of an array (like line_items.quantity), you get an array of all the quantity values.

External Data Validation Steps

Format: stepName.path.to.field
Example: external1.customer.credit_limit

This returns data from your external system based on the JSON path in the response.

Make sure your external API returns data in the right format. For array operations, return an object containing arrays, not an array of objects.

Good: {"prices": [10, 20, 30]}
Bad: [{"price": 10}, {"price": 20}]

Metadata

Format: metadata.fieldName
Example: metadata.document_id

This returns metadata values attached to your workflow run.

Understanding Data Types

Every value in a formula has a type. This determines what operations you can perform on it.

TypeWhat it isExamples
booleanTrue or falseTRUE, FALSE
numberAny number (including decimals)42, 3.14, -100
textLetters, words, sentences"hello", "invoice", ""
dateA calendar date (no time)DATE("2024-01-15")
currencyMoney amount with currency code$100.00 USD
arrayA list of values (all same type)[10, 20, 30], ["a", "b"]

Working with Currency

Currency values behave like numbers in math operations:

extraction1.subtotal + extraction1.tax

This works even if subtotal is currency and tax is a plain number. The result will be currency.

Note: Mixing different currency codes in the same operation will cause an error. For example, $100 USD + $100 AUD is not allowed.

Working with Arrays

When you reference a field inside an array (like line_items.price), you get an array of all those values:

SUM(extraction1.line_items.price)

This adds up all the prices from all line items.

Handling Missing Data (NULL values)

If a field wasn’t extracted, its value is NULL. The behavior depends on how you use it:

  • In comparisons (<, >, <=, >=): NULL acts like 0
  • In equality checks (=, <>): NULL only equals NULL
  • In math operations: NULL acts like 0
  • As text: NULL acts like ""
  • As boolean: NULL acts like FALSE

Check if a field has a value:

ISPRESENT(extraction1.invoice_number)

Provide a default value if missing:

IFNULL(extraction1.discount, 0)

This returns the discount if it exists, otherwise returns 0.

Using Constants

You can use fixed values directly in your formulas:

Numbers: 100, 3.14, -50

extraction1.amount > 1000

Text: Use double quotes "text here"

classification1 = "invoice"

To include quotes in text, use \":

"She said \"Hi!\""

Booleans: TRUE or FALSE

AND(ISPRESENT(extraction1.total), TRUE)

Dates: Use the DATE() function

extraction1.invoice_date > DATE("2024-01-01")

Operations

Operations let you compare values, do math, and combine text. They follow standard order of operations (like math class): parentheses first, then multiplication/division, then addition/subtraction, then comparisons.

Comparison Operations

Compare two values and get TRUE or FALSE. All text comparisons are case-insensitive.

OperationWhat it doesExample
a = bEqual toextraction1.status = "approved"
a ~= bEqual to (ignoring case, alias for =)extraction1.type ~= "INVOICE"
a <> bNot equal toextraction1.amount <> 0
a < bLess thanextraction1.quantity < 100
a <= bLess than or equalextraction1.total <= 5000
a > bGreater thanextraction1.date > DATE("2024-01-01")
a >= bGreater than or equalextraction1.score >= 90

Math Operations

Perform calculations on numbers:

OperationWhat it doesExample
a + bAddextraction1.subtotal + extraction1.tax
a - bSubtractextraction1.paid - extraction1.refund
a * bMultiplyextraction1.price * extraction1.quantity
a / bDivide (returns NULL if dividing by zero)extraction1.total / extraction1.count
a ^ bPower (right-associative: 2^3^2 = 2^(3^2))extraction1.base ^ 2 (squared)
-aNegative-extraction1.discount

Text Operation

OperationWhat it doesExample
s & tJoin text together"Invoice #" & extraction1.invoice_number

Order of Operations

Operations are evaluated in this order (like math):

  1. Parentheses: ( ... )
  2. Functions: FUNC( ... )
  3. Power: ^
  4. Multiply/Divide: * /
  5. Add/Subtract/Concatenate: + - &
  6. Comparisons: = <> < > <= >= ~=

Example:

extraction1.total = extraction1.price * extraction1.quantity + extraction1.tax

This multiplies price by quantity first, then adds tax, then checks if it equals total.

Built-in Functions

Functions help you handle common validation tasks. Here’s what’s available:

Checking for Values

ISPRESENT(field) - Returns TRUE if field has a value (not NULL)

ISPRESENT(extraction1.invoice_number)

Use this to ensure required fields were extracted.

ISBLANK(field) - Returns TRUE if field is empty (NULL)

ISBLANK(extraction1.optional_notes)

Opposite of ISPRESENT.

ISEMPTY(array) - Returns TRUE if array field has no items

ISEMPTY(extraction1.line_items)

Use this to check if an array field is empty.

Conditional Logic

IF(condition, valueIfTrue, valueIfFalse) - Choose between two values based on a condition

IF(extraction1.amount > 1000, "high", "normal")

If amount is over 1000, returns “high”, otherwise “normal”.

Real-world example:

IF(classification1 = "invoice",
ISPRESENT(extraction1.invoice_number),
TRUE)

This says: “If it’s an invoice, it must have an invoice number. If it’s not an invoice, that’s fine.”

IFNULL(field, defaultValue) - Provide a fallback value if field is empty

IFNULL(extraction1.discount, 0)

Uses the discount if extracted, otherwise uses 0.

Logic Operations

AND(condition1, condition2, ...) - All conditions must be TRUE

AND(
extraction1.amount > 0,
extraction1.amount < 10000,
ISPRESENT(extraction1.invoice_date)
)

Amount must be between 0 and 10,000 AND date must be present.

OR(condition1, condition2, ...) - At least one condition must be TRUE

OR(
extraction1.status = "paid",
extraction1.status = "completed"
)

Status must be either “paid” or “completed”.

NOT(condition) - Flip TRUE to FALSE and vice versa

NOT(ISBLANK(extraction1.address))

Same as ISPRESENT(extraction1.address).

Working with Dates

DATE(text) - Convert text to a date

extraction1.invoice_date > DATE("2024-01-01")

Check if invoice date is after January 1, 2024.

Dates should be in format: "YYYY-MM-DD"

Array Functions

SUM(values...) - Add up numbers

SUM(extraction1.line_items.cost) = extraction1.total_amount_due

This validates that the line item costs add up to the total. Perfect for invoice validation! You can also pass multiple values: SUM(a, b, c).

CONTAINS(array, value) - Check if value is in the array

CONTAINS(extraction1.categories, "urgent")

Returns TRUE if “urgent” is one of the categories. Text matching is case-insensitive.

CONTAINSALL(array, [value1, value2]) - Check if all values are in the array

CONTAINSALL(extraction1.tags, ["approved", "verified"])

Returns TRUE only if both “approved” AND “verified” are in the tags array.

Smart Matching

These functions help you compare values that might not be exactly identical but should still be considered a match. Perfect for handling OCR variations, formatting differences, or semantic equivalence.

FUZZYMATCH(value1, value2, threshold?) - Check if two strings are similar enough

FUZZYMATCH(extraction1.vendor_name, "Albertsons")

Returns TRUE if the extracted vendor name is similar to “Albertsons” (handles typos, OCR errors, and minor variations).

The optional threshold parameter (0 to 1) controls how strict the matching is. Higher values require closer matches. Defaults to 0.8.

FUZZYMATCH(extraction1.merchant, "ACME Corporation", 0.9)

This requires a 90% similarity match—use higher thresholds when you need more precise matching.

How it works:

  • Case-insensitive: “ALBERTSONS” matches “albertsons”
  • Ignores symbols and normalizes whitespace
  • Great for: vendor names, company names, product descriptions
  • Returns FALSE if either value is NULL or empty

Examples:

// Basic vendor name matching
FUZZYMATCH(extraction1.vendor_name, "Walmart")
// Stricter matching for important fields
FUZZYMATCH(extraction1.company_name, external1.registered_name, 0.95)
// Looser matching for descriptions with potential OCR issues
FUZZYMATCH(extraction1.item_description, "Office Supplies", 0.7)

LLMJUDGE(value1, value2, customInstructions?) - AI-powered semantic comparison

LLMJUDGE(extraction1.merchant, "albertsons")

Uses AI to determine if two values mean the same thing, even if they look different. Much smarter than exact matching or fuzzy matching.

When to use LLMJUDGE vs FUZZYMATCH:

  • Use FUZZYMATCH for simple string similarity (typos, OCR errors)
  • Use LLMJUDGE when you need semantic understanding (abbreviations, different formats, equivalent meanings)

Examples:

// Address matching - handles format differences
LLMJUDGE(extraction1.address, "970 Knox St, Torrance, CA 90502")

This returns TRUE even if the extracted address is “970 Knox Street, Torrance, California 90502, USA” or has other formatting variations.

// Company name matching with context
LLMJUDGE(extraction1.vendor, "IBM", "Company names should match even with legal suffixes like Inc, LLC, Corp")

This matches “International Business Machines Corporation” to “IBM”.

// Matching descriptions with semantic understanding
LLMJUDGE(extraction1.item_name, "laptop computer", "Product names should match if they refer to the same type of item")

The optional customInstructions parameter lets you give the AI specific guidance on how to evaluate the match:

// Be strict about matching specific product SKUs
LLMJUDGE(extraction1.product_code, external1.sku, "Only match if these refer to exactly the same product SKU, not similar products")
// Be flexible about date formatting
LLMJUDGE(extraction1.date_text, "January 15, 2024", "Dates should match regardless of format - Jan 15 2024, 1/15/24, 2024-01-15 are all equivalent")

LLMJUDGE is more powerful but slower than FUZZYMATCH, and not as deterministic. Use FUZZYMATCH for simple string comparisons and reserve LLMJUDGE for cases requiring either semantic understanding/matching or where you need to enable partial matches that fuzzy matching won’t allow.

Common Validation Patterns

Here are some real-world examples combining multiple concepts:

Invoice Total Validation

SUM(extraction1.line_items.price * extraction1.line_items.quantity) + extraction1.tax = extraction1.total

Required Field for Document Type

IF(classification1 = "invoice",
AND(
ISPRESENT(extraction1.invoice_number),
ISPRESENT(extraction1.invoice_date),
ISPRESENT(extraction1.total)
),
TRUE
)

Date Range Validation

AND(
extraction1.start_date > DATE("2024-01-01"),
extraction1.end_date < DATE("2024-12-31"),
extraction1.end_date >= extraction1.start_date
)

Amount Within Credit Limit

extraction1.total_amount <= external1.customer.credit_limit

Status Must Be Valid

OR(
extraction1.status = "pending",
extraction1.status = "approved",
extraction1.status = "rejected"
)

Vendor Name Verification with Fuzzy Matching

FUZZYMATCH(extraction1.vendor_name, external1.approved_vendor, 0.85)

Smart Address Matching

LLMJUDGE(
extraction1.billing_address,
external1.customer_address,
"Addresses should match even with formatting differences like abbreviations (St vs Street) or missing zip codes"
)

Combined Validation with Smart Matching

AND(
LLMJUDGE(extraction1.merchant, metadata.expected_merchant),
extraction1.total <= metadata.expected_amount * 1.1,
ISPRESENT(extraction1.date)
)

Need a function that’s not listed here? Reach out to us on Slack - we’re happy to add more!