Best Practices: Field Names and Prompt Crafting

Start with Composer for automated optimization!

Before manually tuning your extractor, try the Composer optimizer, which uses Extend’s AI agent to automatically improve your field descriptions and extraction rules. After running Composer, use these best practices for additional manual refinement and edge cases.

Schema Fundamentals

Field/Property Naming Conventions

Field names are an important input to Extend’s extraction model, so keep these guidelines in mind:

| Tip | ✓ Good | ✗ Bad |
| --- | --- | --- |
| Use descriptive, consistent names | customer_email, provider_email | email |
| Avoid abbreviations | invoice_number | inv_num |
| Standardize property name formatting | snake_case, camelCase | Complex field name, with punctuation |
| Group related fields with consistent prefixes/titles | selling_agent_email, selling_agent_phone, selling_agent_name | email, phone, agent_name |
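The conventions in the table can be checked mechanically before a schema is deployed. The following is a minimal sketch, not part of Extend's tooling: `check_field_names` and the `ABBREVIATIONS` list are hypothetical, and the sketch assumes snake_case is the chosen convention (camelCase is equally acceptable as long as it is applied consistently).

```python
import re

# snake_case: lowercase words joined by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

# Hypothetical abbreviation list; replace it with terms from your own domain.
ABBREVIATIONS = {"inv", "num", "amt", "qty", "addr"}


def check_field_names(names):
    """Return a dict mapping each problematic field name to a list of warnings."""
    problems = {}
    for name in names:
        warnings = []
        is_snake = bool(SNAKE_CASE.match(name))
        if not is_snake:
            warnings.append("not snake_case")
        if any(part in ABBREVIATIONS for part in name.lower().split("_")):
            warnings.append("contains an abbreviation")
        # A bare single word such as "email" does not say whose email it is.
        if is_snake and "_" not in name:
            warnings.append("single word; may be ambiguous")
        if warnings:
            problems[name] = warnings
    return problems


if __name__ == "__main__":
    fields = ["customer_email", "inv_num", "email",
              "Complex field name, with punctuation"]
    for field_name, issues in check_field_names(fields).items():
        print(f"{field_name}: {', '.join(issues)}")
```

A check like this fits naturally into a pre-deployment script or unit test, so naming drift is caught before it reaches the extractor.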

Field Descriptions

Write clear, direct, and detailed descriptions.

✓ Good: “The unique invoice number printed at the top of the document. For Extend invoices, this will often appear with the prefix EX_.”

✗ Bad: “Invoice number”

Include extra context:

  • Note any formatting requirements*
  • Mention where the field typically appears, if known

* Formatting can be suggested in field descriptions, but we often recommend implementing custom logic on your own platform.
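As an illustration of handling formatting on your own platform, the sketch below normalizes an extracted invoice number after extraction. `normalize_invoice_number` is a hypothetical helper, and it assumes invoice numbers are case-insensitive and contain no meaningful whitespace; adjust it if that does not hold for your documents.

```python
import re


def normalize_invoice_number(raw, required_prefix=None):
    """Remove whitespace and uppercase an extracted invoice number.

    Returns None when `required_prefix` is given and the cleaned value does
    not start with it, so the caller can flag the document for review.
    """
    cleaned = re.sub(r"\s+", "", raw).upper()
    if required_prefix and not cleaned.startswith(required_prefix.upper()):
        return None
    return cleaned


if __name__ == "__main__":
    print(normalize_invoice_number(" ex_ 00123\n", required_prefix="EX_"))
```

Keeping this logic outside the field description lets the prompt stay focused on locating the value, while formatting rules remain testable code.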

Field Types

For complete details on all available field types and their configurations, see the Custom Field Types documentation.

Choosing the right field type for your data structure can help improve accuracy.

When to use arrays:

  • Multiple similar items (line items, addresses, phone numbers)
  • Extracting repeated data from tables
  • Lists that can vary in length

Other custom field types:

  • date - For date/time values with automatic ISO format conversion
  • currency - For monetary amounts with amount and currency code
  • signature - For signature detection with printed name, date, and signing status
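To make these shapes concrete, here is a sketch of how the extracted values might be modeled on the consuming side. All class and attribute names below are assumptions for illustration; they are not Extend's documented output format, so check the Custom Field Types documentation for the actual keys.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import List, Optional


@dataclass
class Currency:
    amount: Decimal
    currency_code: str  # assumed to be an ISO 4217 code such as "USD"


@dataclass
class Signature:
    printed_name: Optional[str]
    signature_date: Optional[date]
    is_signed: bool


@dataclass
class LineItem:
    description: str
    total: Currency


@dataclass
class Invoice:
    invoice_date: date
    # Array field: the number of line items varies per document.
    line_items: List[LineItem] = field(default_factory=list)

    def total(self) -> Decimal:
        """Sum line-item amounts (assumes all items share one currency)."""
        return sum((item.total.amount for item in self.line_items), Decimal("0"))


if __name__ == "__main__":
    invoice = Invoice(
        invoice_date=date.fromisoformat("2024-01-15"),
        line_items=[
            LineItem("Widget", Currency(Decimal("10.50"), "USD")),
            LineItem("Gadget", Currency(Decimal("2.25"), "USD")),
        ],
    )
    print(invoice.total())
```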

Extractor Architecture Decisions

Unified vs Multiple Extractors

One of the most common questions is whether to create one unified extractor or multiple specialized extractors for a document type.

Use a unified extractor when:

  • Documents have consistent structure and field requirements
  • All fields are typically present in most documents
  • You want simpler maintenance and deployment

Use multiple extractors when:

  • Processing different document types in the same workflow (e.g., splitting an invoice and associated checks from a single PDF)
  • You need different confidence processing rules for different documents
  • You want to optimize performance for specific document patterns
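When multiple extractors are used, the per-document rules usually live in a small routing table. The sketch below shows one way to apply different confidence thresholds per document type; the extractor identifiers, thresholds, and `needs_review` helper are all hypothetical and not part of Extend's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractorRoute:
    extractor_id: str      # hypothetical identifier for a specialized extractor
    min_confidence: float  # results below this score go to human review


# Illustrative values only; tune thresholds against your own evaluation data.
ROUTES = {
    "invoice": ExtractorRoute("invoice_extractor", 0.90),
    "check": ExtractorRoute("check_extractor", 0.95),
}


def needs_review(document_type, confidence):
    """Return True when a result should be reviewed by a person."""
    route = ROUTES.get(document_type)
    if route is None:
        return True  # unknown document types are always reviewed
    return confidence < route.min_confidence


if __name__ == "__main__":
    print(needs_review("invoice", 0.92))
    print(needs_review("check", 0.92))
```

With a unified extractor the same table collapses to a single entry, which is part of why it is simpler to maintain.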

Prompt Crafting

Effective prompt engineering is crucial for maximizing both extraction coverage and accuracy. Every extractor will have some unique nuance to it, but these tips should provide a solid foundation to build on.

Core Principles

Be Clear, Direct, and Contextual

Write prompts as if explaining to a new employee with no context about the business or document. Include what the data will be used for and provide clear examples.

Example:

{
  "description": "Extract the invoice number, which is typically a unique identifier starting with 'INV-' followed by numbers, found in the header section of the document. For Extend invoices, this will often appear with the prefix 'EX_'. Extract exactly as written, including any prefixes or formatting."
}

For complex extractions, use sequential instructions:

{
  "description": "Extract the total amount due from this invoice.\n\nSteps:\n1) Locate labels like 'Total:', 'Amount Due:', or 'Balance Due:'\n2) Extract the complete monetary value including currency symbol\n3) Include decimal places exactly as shown (e.g., '$1,250.00')"
}
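JSON strings cannot contain raw line breaks, so multi-step descriptions are often easier to assemble in code and then serialize. The following is a minimal sketch; `build_description` is a hypothetical helper, not part of Extend's API.

```python
import json


def build_description(goal, steps):
    """Join a goal sentence and numbered steps into one description string."""
    numbered = [f"{i}) {step}" for i, step in enumerate(steps, start=1)]
    return goal + "\n\nSteps:\n" + "\n".join(numbered)


field_config = {
    "description": build_description(
        "Extract the total amount due from this invoice.",
        [
            "Locate labels like 'Total:', 'Amount Due:', or 'Balance Due:'",
            "Extract the complete monetary value including currency symbol",
            "Include decimal places exactly as shown (e.g., '$1,250.00')",
        ],
    )
}

if __name__ == "__main__":
    # json.dumps escapes the line breaks as \n, producing valid JSON.
    print(json.dumps(field_config, indent=2))
```

Keeping the steps as a list also makes it easy to reorder or revise a single step without retyping the whole prompt.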

Common Prompting Mistakes

Vague Instructions:

✗ Bad: “Extract important information”

✓ Good: “Extract the invoice number located at one of the top corners of the document.”

Missing Context:

✗ Bad: “Extract the date”

✓ Good: “Extract the invoice date (when the invoice was issued), not the due date or any other dates in the document.”

Inconsistent Formatting:

✗ Bad: “Extract total amount. Use dollar sign.”

✓ Good: “Extract the total amount in the format it appears in the document (e.g., ‘$1,250.00’ or ‘1250.00’). Include currency symbols and decimal places exactly as shown.”
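If downstream systems need a numeric value, one option is to extract the amount exactly as shown and convert it afterwards in your own code. The sketch below uses a hypothetical `parse_amount` helper and assumes `.` is the decimal separator and `,` groups thousands; documents using other conventions would need different handling.

```python
import re
from decimal import Decimal, InvalidOperation


def parse_amount(raw):
    """Convert an extracted amount such as '$1,250.00' to a Decimal.

    Returns None when the text cannot be parsed, so callers can send the
    document to review instead of guessing.
    """
    # Keep only digits, the decimal point, and a minus sign.
    cleaned = re.sub(r"[^\d.\-]", "", raw)
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


if __name__ == "__main__":
    for text in ["$1,250.00", "1250.00", "N/A"]:
        print(repr(text), "->", parse_amount(text))
```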

Related Topics

  • Learn about Advanced Options for chunking & merging strategies
  • For latency optimization strategies, see Latency Optimization