Composer

Composer is an AI assistant that helps you improve your configurations in Extend: extraction schemas, splitters, etc.

The Composer agent is currently in beta and is not available to all users. If you would like to be added to the beta, please contact us at support@extend.ai or via Slack if you have a shared channel with us.

Overview

Right now, Composer is only available for extraction schemas and can be run as a background agent to optimize your schema based on what it learns from your verified evaluation sets.

Key benefits:

  • Automated improvement: Let AI analyze and optimize your extraction rules
  • Data-driven optimization: Uses real evaluation data to guide improvements
  • Measurable results: See exact accuracy improvements for each field and what changes were made before applying updates
  • Time savings: Reduce hours of manual configuration tuning to minutes

How It Works

The Composer agent has access to the following tools to optimize your schema:

  1. Analyze: Examines your current extraction configuration and evaluates it against your chosen evaluation set(s)
  2. Read: Can view individual files and outputs in the evaluation set to understand where and why extractions are failing
  3. Generate: Creates multiple candidate improvements for field descriptions and extraction rules
  4. Evaluate: Tests each candidate configuration against the evaluation set

The background agent runs these processes in a loop (limited via “Max generation runs”) to find the best possible improvements.
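The loop above can be sketched in a few lines of Python. This is an illustrative sketch only, not Extend's actual implementation: all function names, the stub scorer, and the stub candidate generator are hypothetical stand-ins for the agent's real analyze/generate/evaluate tools.

```python
def evaluate(config, eval_set):
    # Stub scorer: fraction of documents where the config's keyword rule
    # would find a match. The real agent compares extracted output
    # against verified ground truth, field by field.
    hits = sum(1 for doc, truth in eval_set if config["keyword"] in doc)
    return hits / len(eval_set)

def generate_candidates(config, eval_set):
    # Stub generator: the real agent proposes rewritten field
    # descriptions and extraction rules; here we just try alternate
    # keywords to keep the sketch runnable.
    return [{"keyword": k} for k in ("invoice", "total", "amount")]

def optimize(config, eval_set, max_runs=3, threshold=0.05):
    # max_runs mirrors "Max generation runs"; threshold mirrors the
    # improvement threshold (5% expressed as a fraction).
    best, best_score = config, evaluate(config, eval_set)
    for _ in range(max_runs):
        for candidate in generate_candidates(best, eval_set):
            score = evaluate(candidate, eval_set)
            if score - best_score >= threshold:
                best, best_score = candidate, score
    return best, best_score
```

The key point of the loop is that every candidate is scored against the same evaluation set, and a candidate only replaces the current best configuration when its measured gain clears the improvement threshold.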

Prerequisites

Before running the Composer, ensure you have:

The quality of optimization results directly depends on the quality of your evaluation sets. Poor or unrepresentative ground truth data will lead to suboptimal or incorrect optimizations.

1. High-Quality Evaluation Sets

Your evaluation sets must have:

  • Representative samples: Include diverse examples that reflect real-world documents
  • Accurate ground truth: Manually verified correct values for all fields
  • Sufficient coverage: At least 20-30 documents (more is better) for reliable optimization
  • Recent data: Evaluation sets should reflect current document formats and content

2. Stable Schema Structure

Before optimization:

  • Finalize your schema structure (field names, types, nesting)
  • Use clear, unambiguous field names that accurately describe the task
  • Avoid vague or generic field names like “value1” or “data”
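To make the naming guidance concrete, here are two hypothetical schema fragments (written as Python dicts purely for illustration; they are not a real Extend schema format). The first would give Composer little to work with; the second names each field after the task it describes.

```python
# Vague: field names say nothing about what to extract.
vague_schema = {
    "value1": {"type": "string", "description": "the value"},
    "data":   {"type": "string", "description": "other data"},
}

# Clear: names and descriptions state the task unambiguously.
clear_schema = {
    "invoice_number": {
        "type": "string",
        "description": "The unique invoice identifier, e.g. 'INV-2024-0042'.",
    },
    "total_amount_usd": {
        "type": "number",
        "description": "The final invoice total in USD, after tax.",
    },
}
```

Because Composer rewrites descriptions but not field names or types, a clear name like `total_amount_usd` anchors the optimization, while a name like `value1` leaves the agent guessing at intent.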

3. Clear Extraction Rules

Ensure your initial extraction rules:

  • Clearly define what should be extracted for each field
  • Include any specific formatting or validation requirements
  • Specify how to handle edge cases or missing data
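As an example, here is a hypothetical field description that covers all three points above: what to extract, the required format, and how to handle missing data. The wording is illustrative, not a prescribed template.

```python
# A starting rule that gives Composer something concrete to refine:
# target, output format, and an explicit missing-data policy.
due_date_rule = (
    "Extract the payment due date. "
    "Format as ISO 8601 (YYYY-MM-DD). "
    "If no due date is printed, return null rather than inferring "
    "one from the invoice date."
)
```

Rules written at this level of specificity give the agent a clear baseline to improve on, and make its proposed rewrites easier to review.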

Configuring Composer

Before using Composer, Publish your processor to create a new version. This allows you to revert changes if you apply the updates but the optimizations don’t perform as expected in production.

To run the background agent:

  1. Navigate to your processor’s Performance tab
  2. Click the Optimize button
  3. Configure the optimization parameters

Evaluation Set

Select the evaluation set to use for optimization. The agent will test improvements against this set to measure accuracy gains.

Choose an evaluation set that best represents your typical documents. Using a non-representative set may lead to optimizations that work well for the test data but poorly in production.

Processor Version

Select which version of your processor to optimize. Typically, you’ll optimize your draft version before publishing.

Max Generation Runs

Controls how many generation runs the agent will perform (default: 3). Higher values take longer to complete and cost more credits, but allowing more generations almost always leads to better results.

Improvement Threshold (%)

The minimum accuracy improvement required to consider a change worthwhile (default: 5%).

  • Changes below this threshold won’t be suggested
  • Set higher to reduce noise and surface only changes with a substantial measured impact
  • Set lower to see all potential improvements
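The threshold acts as a simple filter over the per-field gains the agent measures. The sketch below shows the idea; the function, field names, and gain values are all made up for illustration.

```python
def filter_changes(proposed_gains, threshold_pct=5.0):
    """Keep only fields whose measured accuracy gain meets the threshold."""
    return {f: g for f, g in proposed_gains.items() if g >= threshold_pct}

# Hypothetical per-field gains from an optimization run.
gains = {"invoice_number": 17.5, "total_amount": 2.1, "due_date": 6.0}

# With the default 5% threshold, total_amount's 2.1% gain is filtered
# out; lowering the threshold to 2% would surface it as well.
suggested = filter_changes(gains)
```

In this example, `invoice_number` and `due_date` would be suggested, while `total_amount` would not.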

Email Notification

Enable to receive an email when the optimization run completes. Recommended for longer runs.

Running the Optimization

  1. After configuring parameters, click Optimize
  2. The optimization will run in the background
  3. Monitor progress in the optimization runs table
  4. Once complete, review the proposed changes

Reviewing Results

When optimization completes, you’ll see a detailed view of the proposed changes.

Understanding the Results

The results view shows:

  1. Field-by-field improvements: Each optimized field with its accuracy gain
  2. Original vs. Updated descriptions: Side-by-side comparison of changes
  3. Impact metrics: Percentage accuracy improvement for each field
  4. JSON diff view: Technical view of exact configuration changes

Interpreting Accuracy Improvements

  • Green percentages (e.g., +17.5%): Significant improvements worth applying
  • Higher improvements (>20%): Major gains, often from clarifying ambiguous descriptions
  • Modest improvements (5-10%): Incremental gains that add up across many fields
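A per-field gain like “+17.5%” is simply the difference between post- and pre-optimization accuracy on the same evaluation set. The sketch below illustrates the arithmetic with made-up values; it is not Extend's scoring code.

```python
def accuracy(predictions, ground_truth):
    # Percentage of extracted values that exactly match ground truth.
    correct = sum(1 for p, t in zip(predictions, ground_truth) if p == t)
    return 100.0 * correct / len(ground_truth)

ground_truth = ["A", "B", "C", "D"]
before       = ["A", "X", "X", "D"]   # 2 of 4 correct -> 50%
after        = ["A", "B", "X", "D"]   # 3 of 4 correct -> 75%

gain = accuracy(after, ground_truth) - accuracy(before, ground_truth)
```

Here `gain` is +25.0, which would render as a green percentage in the results view. Because both runs score against the same set, the gain isolates the effect of the configuration change itself.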

Applying Changes

After reviewing the proposed optimizations:

  1. Review each change carefully to ensure it aligns with your extraction requirements
  2. Check the updated descriptions for accuracy and clarity
  3. Click Apply Updates to Draft to update your configuration
  4. Test the updated configuration with additional documents
  5. Monitor performance after deploying to production

You can always revert changes by using the version history if the optimizations don’t perform as expected in production.

Limitations

Current limitations of the Composer agent:

Technical Limitations

  • Schema structure: Cannot modify field types or schema structure. Does not currently handle deeply nested fields (more than 2 levels of nesting) well.
  • Advanced options: Cannot yet modify advanced options in your processor (e.g., chunking options)

Performance Considerations

  • Large evaluation sets: Optimization time increases with evaluation set size
  • Complex schemas: Deeply nested schemas may take longer to optimize
  • Iteration limits: Improvements plateau after 5-7 generation runs

Troubleshooting

Common Issues

Optimization shows no improvements

  • Check evaluation set quality and size
  • Ensure field names clearly indicate expected content
  • Verify current descriptions aren’t already optimal
  • The accuracy bottleneck may not be something the Composer agent can address. For instance:
    • A document parsing limitation (e.g., illegible text)
    • A chunking strategy that prevents the extraction process from working properly

Accuracy decreases after optimization

  • Evaluation set may not be representative
  • Ground truth data may contain errors or inconsistencies that cause the agent to make incorrect assumptions
  • Consider using a different evaluation set

Optimization takes too long

  • Reduce max generation runs
  • Use a smaller evaluation set for initial optimization

Applied changes don’t improve production accuracy

  • Production documents may differ from evaluation set
  • Add more diverse documents to evaluation set
  • Re-run Composer with updated data