Field Names and Prompt Crafting

Start with Composer for automated optimization!

Before manually tuning your extractor, check out the Composer optimizer which can automatically improve your field descriptions and extraction rules using Extend’s AI agent. After running Composer, refer to these best practices for additional manual refinements and edge cases.

Schema Fundamentals

Field/Property Naming Conventions

Field names are an important input to Extend’s extraction model, so keep these guidelines in mind:

| Tip | ✓ Good | ✗ Bad |
| --- | --- | --- |
| Use descriptive, consistent names | customer_email, provider_email | email |
| Avoid abbreviations | invoice_number | inv_num |
| Standardize property name formatting | snake_case, camelCase | Complex field name, with punctuation |
| Group related fields with consistent prefixes/titles | selling_agent_email, selling_agent_phone, selling_agent_name | email, phone, agent_name |
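
The naming tips above can be checked mechanically before a schema is deployed. The following is a minimal sketch, assuming snake_case is the chosen convention; the function name `check_field_name` and the abbreviation and generic-name lists are illustrative assumptions, not part of Extend's product.

```python
import re

# Assumption: snake_case is the project's convention (camelCase is equally valid).
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
# Illustrative lists only; extend them to match your own schema vocabulary.
ABBREVIATIONS = {"inv", "num", "amt", "qty", "addr"}
GENERIC_NAMES = {"email", "phone", "name", "date", "amount"}


def check_field_name(name: str) -> list[str]:
    """Return a list of naming problems for one field name (empty if none)."""
    problems = []
    if not SNAKE_CASE.match(name):
        problems.append("not snake_case")
    # Compare each underscore-separated part against the abbreviation list.
    if set(name.lower().split("_")) & ABBREVIATIONS:
        problems.append("contains abbreviation")
    # A bare generic name gives no hint about whose email or which date it is.
    if name.lower() in GENERIC_NAMES:
        problems.append("too generic")
    return problems


if __name__ == "__main__":
    for field in ["customer_email", "inv_num", "email"]:
        print(field, "->", check_field_name(field) or "ok")
```

A check like this fits naturally into a pre-commit hook or CI step that validates schema files.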

Field Descriptions

Write clear, direct, and detailed descriptions.

✓ Good: “The unique invoice number printed at the top of the document. For Extend invoices, this will often appear with the prefix EX_.”

✗ Bad: “Invoice number”

Include extra context:

  • Note any formatting requirements*
  • Mention where the field typically appears, if known

* Formatting can be suggested in field descriptions, but we often recommend implementing custom logic on your own platform.
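
As an illustration of handling formatting on your own platform, here is a minimal sketch of a post-processing step that normalizes an extracted amount string. The function name `normalize_amount` is hypothetical, and the sketch assumes US-style separators (comma for thousands, period for decimals).

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional


def normalize_amount(raw: str) -> Optional[Decimal]:
    """Convert an extracted amount such as '$1,250.00' into a Decimal.

    Assumes US-style separators; returns None when no number can be parsed.
    """
    # Drop currency symbols, thousands separators, and whitespace.
    cleaned = re.sub(r"[^\d.\-]", "", raw)
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


if __name__ == "__main__":
    print(normalize_amount("$1,250.00"))
```

Keeping this logic outside the field description lets the extractor return the value exactly as printed while your application decides on the canonical format.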

Field Types

For complete details on all available field types and their configurations, see the Custom Field Types documentation.

Choosing the right field type for your data structure can help improve accuracy.

When to use arrays:

  • Multiple similar items (line items, addresses, phone numbers)
  • Extracting repeated data from tables
  • Lists that can vary in length

Other custom field types:

  • date - For date/time values with automatic ISO format conversion
  • currency - For monetary amounts with amount and currency code
  • signature - For signature detection with printed name, date, and signing status
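
To make the array guidance concrete, here is a minimal sketch of a line-items field written as a Python dictionary and serialized to JSON. It uses plain JSON Schema keywords as an assumption; the exact keys your extractor configuration expects, including those for the custom types listed above, are described in the Custom Field Types documentation. All field names are illustrative.

```python
import json

# Hypothetical schema shape using generic JSON Schema keywords.
line_items_field = {
    "line_items": {
        "type": "array",
        "description": "Every row of the invoice's line-item table, in document order.",
        "items": {
            "type": "object",
            "properties": {
                "line_item_description": {
                    "type": "string",
                    "description": "The product or service named in this row.",
                },
                "line_item_quantity": {
                    "type": "number",
                    "description": "The quantity printed in this row.",
                },
            },
        },
    }
}

if __name__ == "__main__":
    print(json.dumps(line_items_field, indent=2))
```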

Extractor Architecture Decisions

Unified vs Multiple Extractors

One of the most common questions is whether to create one unified extractor or multiple specialized extractors for a document type.

Use a unified extractor when:

  • Documents have consistent structure and field requirements
  • All fields are typically present in most documents
  • You want simpler maintenance and deployment

Use multiple extractors when:

  • Processing different document types in the same workflow (e.g., splitting an invoice and associated checks from a single PDF)
  • You need different confidence processing rules for different documents
  • You want to optimize performance for specific document patterns
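
When multiple extractors are used, the routing and per-document confidence rules usually live in your own application code. The sketch below shows one way to organize that; the extractor identifiers, thresholds, and function names are hypothetical and do not correspond to any Extend API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractorRoute:
    extractor_id: str      # hypothetical identifier for a specialized extractor
    min_confidence: float  # hypothetical threshold below which output is reviewed


# One route per document type found in the split PDF.
ROUTES = {
    "invoice": ExtractorRoute("invoice_extractor", 0.90),
    "check": ExtractorRoute("check_extractor", 0.95),
}


def route_document(document_type: str) -> ExtractorRoute:
    """Look up the extractor configured for a document type."""
    try:
        return ROUTES[document_type]
    except KeyError:
        raise ValueError(f"No extractor configured for {document_type!r}") from None


def needs_review(document_type: str, confidence: float) -> bool:
    """Apply the per-document-type confidence rule."""
    return confidence < route_document(document_type).min_confidence
```

With a unified extractor, the same table collapses to a single entry, which is the simpler maintenance story described above.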

Prompt Crafting

Effective prompt engineering is crucial for maximizing both extraction coverage and accuracy. Every extractor will have some unique nuance to it, but these tips should provide a solid foundation to build on.

Core Principles

Be Clear, Direct, and Contextual

Write prompts as if explaining to a new employee with no context about the business or document. Include what the data will be used for and provide clear examples.

Example:

{
  "description": "Extract the invoice number, which is typically a unique identifier starting with 'INV-' followed by numbers, found in the header section of the document. For Extend invoices, this will often appear with the prefix 'EX_'. Extract exactly as written, including any prefixes or formatting."
}

For complex extractions, use sequential instructions:

{
  "description": "Extract the total amount due from this invoice.\n\nSteps:\n1) Locate labels like 'Total:', 'Amount Due:', or 'Balance Due:'\n2) Extract the complete monetary value including currency symbol\n3) Include decimal places exactly as shown (e.g., '$1,250.00')"
}
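
If you generate descriptions programmatically, a small helper can assemble the numbered steps and let the JSON serializer handle newline escaping. This is a minimal sketch; `build_description` is a hypothetical helper, not part of any SDK.

```python
import json


def build_description(goal: str, steps: list[str]) -> str:
    """Join a goal sentence and ordered steps into one description string."""
    numbered = "\n".join(f"{i}) {step}" for i, step in enumerate(steps, start=1))
    return f"{goal}\n\nSteps:\n{numbered}"


description = build_description(
    "Extract the total amount due from this invoice.",
    [
        "Locate labels like 'Total:', 'Amount Due:', or 'Balance Due:'",
        "Extract the complete monetary value including currency symbol",
        "Include decimal places exactly as shown (e.g., '$1,250.00')",
    ],
)

if __name__ == "__main__":
    # json.dumps escapes the newlines so the result is valid JSON.
    print(json.dumps({"description": description}, indent=2))
```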

Common Prompting Mistakes

Vague Instructions:

✗ Bad: “Extract important information”

✓ Good: “Extract the invoice number located at one of the top corners of the document.”

Missing Context:

✗ Bad: “Extract the date”

✓ Good: “Extract the invoice date (when the invoice was issued), not the due date or any other dates in the document.”

Inconsistent Formatting:

✗ Bad: “Extract total amount. Use dollar sign.”

✓ Good: “Extract the total amount in the format it appears in the document (e.g., ‘$1,250.00’ or ‘1250.00’). Include currency symbols and decimal places exactly as shown.”
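
Some of these mistakes can be caught with a rough heuristic check before a prompt reaches production. The sketch below is an assumption-laden illustration: the phrase list, the word-count threshold, and the name `lint_description` are all hypothetical, and a clean result does not guarantee a good prompt.

```python
# Illustrative phrases that signal a vague instruction.
VAGUE_PHRASES = ("important information", "relevant data", "key details")


def lint_description(description: str, min_words: int = 8) -> list[str]:
    """Return heuristic warnings for a field description (empty if none)."""
    warnings = []
    text = description.strip().lower()
    # Very short descriptions rarely say which value is wanted or where it is.
    if len(text.split()) < min_words:
        warnings.append("too short to give context")
    if any(phrase in text for phrase in VAGUE_PHRASES):
        warnings.append("vague wording")
    return warnings


if __name__ == "__main__":
    print(lint_description("Extract the date"))
```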

Related Topics

  • See Configuration for chunking, merging, and other extraction options
  • For latency optimization strategies, see Latency Optimization