Formulas let you write validation rules that check extracted data, perform calculations, and make decisions based on document content. Think of them like Excel formulas, but for validating your workflow data.
Here are some common validation scenarios:
Formulas are made up of three parts:
extraction1.total_amount)=, +, *, >)ISPRESENT(), SUM(), IF())Let’s say you extracted these fields from an invoice:
extraction1.subtotal = 100extraction1.tax = 10extraction1.total = 110You can validate the math with:
This formula returns TRUE if the total equals subtotal plus tax, and FALSE otherwise.
Variables let you access data from different workflow steps. The format depends on the step type:
Format: Just the step name
Example: classification1
This returns the document type as text (e.g., "invoice", "receipt", "contract").
Format: stepName.fieldName
Example: extraction1.invoice_number
This returns the value of the extracted field. The value will be NULL if the field wasn’t found.
For nested fields or arrays:
When referencing a subfield of an array (like line_items.quantity), you get an array of all the quantity values.
Format: stepName.path.to.field
Example: external1.customer.credit_limit
This returns data from your external system based on the JSON path in the response.
Make sure your external API returns data in the right format. For array operations, return an object containing arrays, not an array of objects.
Good: {"prices": [10, 20, 30]}
Bad: [{"price": 10}, {"price": 20}]
Format: metadata.fieldName
Example: metadata.document_id
This returns metadata values attached to your workflow run.
Every value in a formula has a type. This determines what operations you can perform on it.
Currency values behave like numbers in math operations:
This works even if subtotal is currency and tax is a plain number. The result will be currency.
Note: Mixing different currency codes in the same operation will cause an error. For example, $100 USD + $100 AUD is not allowed.
When you reference a field inside an array (like line_items.price), you get an array of all those values:
This adds up all the prices from all line items.
If a field wasn’t extracted, its value is NULL. The behavior depends on how you use it:
<, >, <=, >=): NULL acts like 0=, <>): NULL only equals NULLNULL acts like 0NULL acts like ""NULL acts like FALSECheck if a field has a value:
Provide a default value if missing:
This returns the discount if it exists, otherwise returns 0.
You can use fixed values directly in your formulas:
Numbers: 100, 3.14, -50
Text: Use double quotes "text here"
To include quotes in text, use \":
Booleans: TRUE or FALSE
Dates: Use the DATE() function
Operations let you compare values, do math, and combine text. They follow standard order of operations (like math class): parentheses first, then multiplication/division, then addition/subtraction, then comparisons.
Compare two values and get TRUE or FALSE. All text comparisons are case-insensitive.
Perform calculations on numbers:
Operations are evaluated in this order (like math):
( ... )FUNC( ... )^* /+ - &= <> < > <= >= ~=Example:
This multiplies price by quantity first, then adds tax, then checks if it equals total.
Functions help you handle common validation tasks. Here’s what’s available:
ISPRESENT(field) - Returns TRUE if field has a value (not NULL)
Use this to ensure required fields were extracted.
ISBLANK(field) - Returns TRUE if field is empty (NULL)
Opposite of ISPRESENT.
ISEMPTY(array) - Returns TRUE if array field has no items
Use this to check if an array field is empty.
IF(condition, valueIfTrue, valueIfFalse) - Choose between two values based on a condition
If amount is over 1000, returns “high”, otherwise “normal”.
Real-world example:
This says: “If it’s an invoice, it must have an invoice number. If it’s not an invoice, that’s fine.”
IFNULL(field, defaultValue) - Provide a fallback value if field is empty
Uses the discount if extracted, otherwise uses 0.
AND(condition1, condition2, ...) - All conditions must be TRUE
Amount must be between 0 and 10,000 AND date must be present.
OR(condition1, condition2, ...) - At least one condition must be TRUE
Status must be either “paid” or “completed”.
NOT(condition) - Flip TRUE to FALSE and vice versa
Same as ISPRESENT(extraction1.address).
DATE(text) - Convert text to a date
Check if invoice date is after January 1, 2024.
Dates should be in format: "YYYY-MM-DD"
SUM(values...) - Add up numbers
This validates that the line item costs add up to the total. Perfect for invoice validation! You can also pass multiple values: SUM(a, b, c).
CONTAINS(array, value) - Check if value is in the array
Returns TRUE if “urgent” is one of the categories. Text matching is case-insensitive.
CONTAINSALL(array, [value1, value2]) - Check if all values are in the array
Returns TRUE only if both “approved” AND “verified” are in the tags array.
These functions help you compare values that might not be exactly identical but should still be considered a match. Perfect for handling OCR variations, formatting differences, or semantic equivalence.
FUZZYMATCH(value1, value2, threshold?) - Check if two strings are similar enough
Returns TRUE if the extracted vendor name is similar to “Albertsons” (handles typos, OCR errors, and minor variations).
The optional threshold parameter (0 to 1) controls how strict the matching is. Higher values require closer matches. Defaults to 0.8.
This requires a 90% similarity match—use higher thresholds when you need more precise matching.
How it works:
FALSE if either value is NULL or emptyExamples:
LLMJUDGE(value1, value2, customInstructions?) - AI-powered semantic comparison
Uses AI to determine if two values mean the same thing, even if they look different. Much smarter than exact matching or fuzzy matching.
When to use LLMJUDGE vs FUZZYMATCH:
FUZZYMATCH for simple string similarity (typos, OCR errors)LLMJUDGE when you need semantic understanding (abbreviations, different formats, equivalent meanings)Examples:
This returns TRUE even if the extracted address is “970 Knox Street, Torrance, California 90502, USA” or has other formatting variations.
This matches “International Business Machines Corporation” to “IBM”.
The optional customInstructions parameter lets you give the AI specific guidance on how to evaluate the match:
LLMJUDGE is more powerful but slower than FUZZYMATCH, and not as deterministic. Use FUZZYMATCH for simple string comparisons and reserve LLMJUDGE for cases requiring either semantic understanding/matching or where you need to enable partial matches that fuzzy matching won’t allow.
Here are some real-world examples combining multiple concepts:
Need a function that’s not listed here? Reach out to us on Slack - we’re happy to add more!