Composer
Composer is an AI assistant that helps you improve your configurations in Extend, such as extraction schemas and splitters.
The Composer agent is currently in beta and is not available to all users. If you would like to be added to the beta, please contact us at support@extend.ai or via Slack if you have a shared channel with us.
Overview
Composer is currently available only for extraction schemas. It runs as a background agent that optimizes your schema based on what it learns from your verified evaluation sets.
Key benefits:
- Automated improvement: Let AI analyze and optimize your extraction rules
- Data-driven optimization: Uses real evaluation data to guide improvements
- Measurable results: See exact accuracy improvements for each field and what changes were made before applying updates
- Time savings: Reduce hours of manual configuration tuning to minutes
How It Works
The Composer agent has access to the following tools to optimize your schema:
- Analyze: Examines your current extraction configuration and evaluates it against your chosen evaluation set(s)
- Read: Views individual files and outputs in the evaluation set to understand how the current configuration is performing
- Generate: Creates multiple candidate improvements for field descriptions and extraction rules
- Evaluate: Tests each candidate configuration against the evaluation set
The background agent runs these processes in a loop (limited via “Max generation runs”) to find the best possible improvements.
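The loop described above can be pictured with a minimal sketch. This is not Composer's actual implementation: the `optimize` function, the `Schema` type, and the toy `generate` and `evaluate` callbacks are all hypothetical, and the only detail taken from this page is that the number of generation runs is capped.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: field name -> field description.
Schema = Dict[str, str]


def optimize(
    schema: Schema,
    generate: Callable[[Schema], List[Schema]],
    evaluate: Callable[[Schema], float],
    max_generation_runs: int = 3,
) -> Tuple[Schema, float]:
    """Return the best-scoring schema found within a bounded number of runs."""
    best, best_score = schema, evaluate(schema)  # "Analyze": baseline score
    for _ in range(max_generation_runs):
        for candidate in generate(best):  # "Generate": candidate schemas
            score = evaluate(candidate)  # "Evaluate": score on the evaluation set
            if score > best_score:
                best, best_score = candidate, score
    return best, best_score


# Toy stand-ins (assumptions for illustration only).
def toy_generate(schema: Schema) -> List[Schema]:
    # Propose one candidate per field that has not been clarified yet.
    return [
        {**schema, name: desc + " (be specific)"}
        for name, desc in schema.items()
        if "(be specific)" not in desc
    ]


def toy_evaluate(schema: Schema) -> float:
    # Pretend accuracy is the share of fields with a clarified description.
    return sum("(be specific)" in d for d in schema.values()) / len(schema)


start = {"invoice_date": "The date", "total": "The total"}
best_schema, best_score = optimize(start, toy_generate, toy_evaluate)
```

In this toy setup, one run clarifies a single field and a second run clarifies the other, which illustrates why a higher run limit can find further improvements at the cost of more evaluation work.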
Prerequisites
The quality of optimization results directly depends on the quality of your evaluation sets. Poor or unrepresentative ground truth data will lead to suboptimal or incorrect optimizations.
Before running Composer, ensure you have:
1. High-Quality Evaluation Sets
Your evaluation sets must have:
- Representative samples: Include diverse examples that reflect real-world documents
- Accurate ground truth: Manually verified correct values for all fields
- Sufficient coverage: At least 20-30 documents for reliable optimization
- Recent data: Evaluation sets should reflect current document formats and content
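A quick pre-flight check along these lines can catch obvious gaps before you start a run. The sketch below assumes you have exported your ground truth as a list of dictionaries keyed by field name. That shape, the `check_evaluation_set` helper, and the `MIN_DOCUMENTS` constant are illustrative assumptions, not part of Extend.

```python
from typing import Dict, List, Optional

# Lower bound of the 20-30 document guideline above.
MIN_DOCUMENTS = 20


def check_evaluation_set(
    documents: List[Dict[str, Optional[str]]],
    fields: List[str],
) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if len(documents) < MIN_DOCUMENTS:
        problems.append(
            f"only {len(documents)} documents; aim for at least {MIN_DOCUMENTS}"
        )
    for field in fields:
        # A key that is present with a None value counts as verified
        # ("this field is absent from the document"); a missing key does not.
        missing = sum(1 for doc in documents if field not in doc)
        if missing:
            problems.append(f"{field}: ground truth missing in {missing} document(s)")
    return problems
```

This only checks size and completeness. Whether the samples are representative and the values are correct still requires manual review.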
2. Stable Schema Structure
Before optimization:
- Finalize your schema structure (field names, types, nesting)
- Use clear, unambiguous field names that accurately describe the task
- Avoid vague or generic field names like “value1” or “data”
3. Clear Extraction Rules
Ensure your initial extraction rules:
- Clearly define what should be extracted for each field
- Include any specific formatting or validation requirements
- Specify how to handle edge cases or missing data
Configuring Composer
Before using Composer, publish your processor to create a new version. This lets you revert if you apply the updates and the optimizations don’t perform as expected in production.
To run the background agent:
- Navigate to your processor’s Performance tab
- Click the Optimize button
- Configure the optimization parameters
Evaluation Set
Select the evaluation set to use for optimization. The agent will test improvements against this set to measure accuracy gains.
Choose an evaluation set that best represents your typical documents. Using a non-representative set may lead to optimizations that work well for the test data but poorly in production.
Processor Version
Select which version of your processor to optimize. Typically, you’ll optimize your draft version before publishing.
Max Generation Runs
Controls how many generation runs the agent will perform (default: 3). Higher values take longer to complete and cost more credits, but allowing more generation runs will almost always lead to better results.
Improvement Threshold (%)
The minimum accuracy improvement required to consider a change worthwhile (default: 5%).
- Changes below this threshold won’t be suggested
- Set higher to reduce noise and only show changes with a larger measured impact
- Set lower to see all potential improvements
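To make the threshold concrete, here is a small sketch of the filtering idea. It assumes the threshold is measured in percentage points of per-field accuracy, which this page does not state explicitly. The `suggested_changes` function and the sample numbers are hypothetical.

```python
from typing import Dict


def suggested_changes(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    threshold_pct: float = 5.0,
) -> Dict[str, float]:
    """Return fields whose accuracy gain (in percentage points) meets the threshold."""
    gains = {}
    for field, before in baseline.items():
        # Fields missing from the candidate are treated as unchanged.
        gain = candidate.get(field, before) - before
        if gain >= threshold_pct:
            gains[field] = round(gain, 1)
    return gains


# Accuracies expressed as percentages (made-up numbers).
baseline = {"invoice_date": 70.0, "total": 90.0, "vendor": 80.0}
candidate = {"invoice_date": 87.5, "total": 92.0, "vendor": 85.0}

print(suggested_changes(baseline, candidate))
# → {'invoice_date': 17.5, 'vendor': 5.0}
```

With the default of 5, the 2-point gain on `total` is filtered out. Lowering the threshold to 1 would surface it as well.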
Email Notification
Enable to receive an email when the optimization run completes. Recommended for longer runs.
Running the Optimization
- After configuring parameters, click Optimize
- The optimization will run in the background
- Monitor progress in the optimization runs table
- Once complete, review the proposed changes
Reviewing Results
When optimization completes, you’ll see a detailed view of proposed changes:
Understanding the Results
The results view shows:
- Field-by-field improvements: Each optimized field with its accuracy gain
- Original vs. Updated descriptions: Side-by-side comparison of changes
- Impact metrics: Percentage accuracy improvement for each field
- JSON diff view: Technical view of exact configuration changes
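If you want to inspect a similar diff outside the UI, for example when comparing two exported configurations, the standard library is enough. The field layout below (`description` and `type` keys) is an assumed shape for illustration and may not match Extend's actual schema format.

```python
import difflib
import json


def config_diff(original: dict, updated: dict) -> str:
    """Return a unified diff of two configurations serialized as sorted JSON."""
    before = json.dumps(original, indent=2, sort_keys=True).splitlines()
    after = json.dumps(updated, indent=2, sort_keys=True).splitlines()
    return "\n".join(
        difflib.unified_diff(
            before, after, fromfile="original", tofile="updated", lineterm=""
        )
    )


# Hypothetical field configuration before and after an optimization.
original = {"invoice_date": {"type": "string", "description": "The date"}}
updated = {
    "invoice_date": {
        "type": "string",
        "description": "The invoice issue date, formatted as YYYY-MM-DD",
    }
}

print(config_diff(original, updated))
```

Sorting keys before diffing keeps the output stable, so only real changes to descriptions show up as `-` and `+` lines.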
Interpreting Accuracy Improvements
- Green percentages (e.g., +17.5%): Significant improvements worth applying
- Higher improvements (>20%): Major gains, often from clarifying ambiguous descriptions
- Modest improvements (5-10%): Incremental gains that add up across many fields
Applying Changes
After reviewing the proposed optimizations:
- Review each change carefully to ensure it aligns with your extraction requirements
- Check the updated descriptions for accuracy and clarity
- Click Apply Updates to Draft to update your configuration
- Test the updated configuration with additional documents
- Monitor performance after deploying to production
You can always revert changes by using the version history if the optimizations don’t perform as expected in production.
Limitations
Current limitations of the Composer agent:
Technical Limitations
- Schema structure: Cannot modify field types or schema structure. Deeply nested fields (more than 2 levels of nesting) are not currently handled well.
- Advanced options: Cannot yet modify advanced options in your processor (e.g., chunking options)
Performance Considerations
- Large evaluation sets: Optimization time increases with evaluation set size
- Complex schemas: Deeply nested schemas may take longer to optimize
- Iteration limits: Improvements typically plateau after 5-7 generation runs
Troubleshooting
Common Issues
Optimization shows no improvements
- Check evaluation set quality and size
- Ensure field names clearly indicate expected content
- Verify current descriptions aren’t already optimal
- The bottlenecks limiting accuracy may be outside what the Composer agent can improve. For instance:
  - A document parsing limitation (for example, illegible text)
  - A chunking strategy that prevents the extraction process from working properly
Accuracy decreases after optimization
- Evaluation set may not be representative
- Ground truth data may contain errors or inconsistencies that cause the agent to make incorrect assumptions
- Consider using a different evaluation set
Optimization takes too long
- Reduce max generation runs
- Use a smaller evaluation set for initial optimization
Applied changes don’t improve production accuracy
- Production documents may differ from evaluation set
- Add more diverse documents to evaluation set
- Re-run Composer with updated data