Best Practices: Field Names and Prompt Crafting

Start with Composer for automated optimization!

Before manually tuning your extractor, try the Composer optimizer, which can automatically improve your field descriptions and extraction rules using Extend’s AI agent. After running Composer, use these best practices for additional manual refinements and edge cases.

Schema Fundamentals

Field/Property Naming Conventions

Field names are an important input to Extend’s extraction model, so keep these guidelines in mind:

Tip | ✓ Good | ✗ Bad
--- | --- | ---
Use descriptive, consistent names | customer_email, provider_email | email
Avoid abbreviations | invoice_number | inv_num
Standardize property name formatting | snake_case, camelCase | Complex field name, with punctuation
Group related fields with consistent prefixes/titles | selling_agent_email, selling_agent_phone, selling_agent_name | email, phone, agent_name
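To see these conventions side by side, here is a minimal sketch of a schema fragment. It assumes a JSON Schema-style "properties" object; the field names come from the table and the descriptions are illustrative only:

```json
{
  "$comment": "Illustrative schema fragment; descriptions are hypothetical examples.",
  "type": "object",
  "properties": {
    "customer_email": { "type": "string", "description": "Email address of the customer being billed." },
    "invoice_number": { "type": "string", "description": "The unique invoice number printed at the top of the document." },
    "selling_agent_name": { "type": "string", "description": "Full name of the selling agent." },
    "selling_agent_email": { "type": "string", "description": "Email address of the selling agent." },
    "selling_agent_phone": { "type": "string", "description": "Phone number of the selling agent." }
  }
}
```

The shared selling_agent_ prefix keeps the three agent fields grouped and unambiguous, whereas bare email or phone fields could easily be confused with the customer's contact details.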

Field Descriptions

Write clear, direct, and detailed descriptions

✓ Good: “The unique invoice number printed at the top of the document. For Extend invoices, this will often appear with the prefix EX_.”

✗ Bad: “Invoice number”

Include extra context:

  • Note any formatting requirements*
  • Mention where the field typically appears, if known

* Formatting can be suggested in field descriptions, but we often recommend implementing custom logic on your own platform.
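For example, a hypothetical purchase_order_number field (the name and wording are illustrative, assuming a JSON Schema-style property) could carry both a location hint and a formatting requirement in its description:

```json
{
  "purchase_order_number": {
    "$comment": "Hypothetical field for illustration only.",
    "type": "string",
    "description": "The purchase order (PO) number this invoice bills against, usually labeled 'PO #' or 'Purchase Order' in the header or billing section. Extract only the identifier, without the label."
  }
}
```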

Field Types

For complete details on all available field types and their configurations, see the Custom Field Types documentation.

Choosing the right field type for your data structure can help improve accuracy.

When to use arrays:

  • Multiple similar items (line items, addresses, phone numbers)
  • Extracting repeated data from tables
  • Lists that can vary in length
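A minimal sketch of an array field for invoice line items, assuming JSON Schema-style "array" and "items" keywords; the sub-field names are hypothetical:

```json
{
  "$comment": "Illustrative schema; line item sub-field names are hypothetical.",
  "type": "object",
  "properties": {
    "line_items": {
      "type": "array",
      "description": "Every row of the itemized charges table, in the order the rows appear.",
      "items": {
        "type": "object",
        "properties": {
          "item_description": { "type": "string", "description": "Product or service billed on this row." },
          "quantity": { "type": "number", "description": "Quantity billed on this row." },
          "unit_price": { "type": "number", "description": "Price per unit on this row." },
          "line_total": { "type": "number", "description": "Total charged for this row." }
        }
      }
    }
  }
}
```

Each table row maps to one array element, so the list can grow or shrink with the document instead of requiring fixed fields such as item_1, item_2.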

Other custom field types:

  • date - For date/time values with automatic ISO format conversion
  • currency - For monetary amounts with amount and currency code
  • signature - For signature detection with printed name, date, and signing status
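The sketch below shows how these types might be declared. The extend:type keyword and the shape of each type are assumptions here, not confirmed syntax; the Custom Field Types documentation is the authority, and the field names are hypothetical:

```json
{
  "$comment": "Assumption: custom types are selected with an extend:type keyword. Confirm the exact syntax in the Custom Field Types documentation.",
  "type": "object",
  "properties": {
    "due_date": {
      "type": "string",
      "extend:type": "date",
      "description": "Date payment is due on this invoice."
    },
    "total_amount": {
      "type": "object",
      "extend:type": "currency",
      "description": "Total amount due, with its currency."
    },
    "customer_signature": {
      "type": "object",
      "extend:type": "signature",
      "description": "Customer signature block near the end of the document."
    }
  }
}
```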

Extractor Architecture Decisions

Unified vs Multiple Extractors

One of the most common questions is whether to create one unified extractor or multiple specialized extractors for a document type.

Use a unified extractor when:

  • Documents have consistent structure and field requirements
  • All fields are typically present in most documents
  • You want simpler maintenance and deployment

Use multiple extractors when:

  • Processing different document types in the same workflow (e.g., splitting an invoice and associated checks from a single PDF)
  • You need different confidence processing rules for different documents
  • You want to optimize performance for specific document patterns

Prompt Crafting

Effective prompt engineering is crucial for maximizing both extraction coverage and accuracy. Every extractor has its own nuances, but these tips provide a solid foundation to build on.

Core Principles

Be Clear, Direct, and Contextual

Write prompts as if explaining to a new employee with no context about the business or document. Include what the data will be used for and provide clear examples.

Example:

```json
{
  "description": "Extract the invoice number, which is typically a unique identifier starting with 'INV-' followed by numbers, found in the header section of the document. For Extend invoices, this will often appear with the prefix 'EX_'. Extract exactly as written, including any prefixes or formatting."
}
```
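The example above covers location and formatting. A description can also state what the value will be used for. A sketch with a hypothetical vendor_name field (the field name, the vendor list, and the company name are illustrative only):

```json
{
  "vendor_name": {
    "$comment": "Hypothetical field; the vendor-matching use case is illustrative.",
    "type": "string",
    "description": "Extract the legal name of the vendor issuing this invoice, typically shown with the logo or return address at the top of the first page. This value is used to match the invoice against our approved vendor list, so extract the full name exactly as printed (e.g., 'Acme Supply Co., LLC'), not a shortened brand name."
  }
}
```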

For complex extractions, use sequential instructions:

```json
{
  "description": "Extract the total amount due from this invoice.\n\nSteps:\n1) Locate labels like 'Total:', 'Amount Due:', or 'Balance Due:'\n2) Extract the complete monetary value including currency symbol\n3) Include decimal places exactly as shown (e.g., '$1,250.00')"
}
```

Common Prompting Mistakes

Vague Instructions:

✗ Bad: “Extract important information”

✓ Good: “Extract the invoice number located at one of the top corners of the document.”

Missing Context:

✗ Bad: “Extract the date”

✓ Good: “Extract the invoice date (when the invoice was issued), not the due date or any other dates in the document.”

Inconsistent Formatting:

✗ Bad: “Extract total amount. Use dollar sign.”

✓ Good: “Extract the total amount in the format it appears in the document (e.g., ‘$1,250.00’ or ‘1250.00’). Include currency symbols and decimal places exactly as shown.”
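Because each good prompt targets a single value, it maps naturally onto one field description. A sketch of the three corrected prompts placed in a schema, assuming a JSON Schema-style "properties" object (the field names are hypothetical):

```json
{
  "$comment": "Illustrative schema using the corrected prompts as field descriptions; field names are hypothetical.",
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": "string",
      "description": "Extract the invoice number located at one of the top corners of the document."
    },
    "invoice_date": {
      "type": "string",
      "description": "Extract the invoice date (when the invoice was issued), not the due date or any other dates in the document."
    },
    "total_amount": {
      "type": "string",
      "description": "Extract the total amount in the format it appears in the document (e.g., '$1,250.00' or '1250.00'). Include currency symbols and decimal places exactly as shown."
    }
  }
}
```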