A/B Testing
Overview
A/B testing lets you compare two versions of your application side by side. For each test case, you choose which version performs better.
Setting up A/B testing
1. Select two versions you want to compare
2. Choose your test set
3. For each test case, decide which version is better (or if they're equal)
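Conceptually, an A/B test pairs two variants with a test set and records one verdict per case. The sketch below is illustrative only (the names are not the product's API) and is meant for readers who want to track the same data programmatically:

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Illustrative names only; this is not the product's API.
@dataclass
class ABComparison:
    variant_a: str                    # e.g. "prompt v1" or "gpt-4"
    variant_b: str                    # e.g. "prompt v2" or "claude"
    test_cases: list[str]             # inputs from your chosen test set
    verdicts: dict[str, str] = field(default_factory=dict)  # case -> "A"/"B"/"tie"
    notes: dict[str, str] = field(default_factory=dict)     # case -> free-text note

    def record(self, case: str, winner: str, note: str = "") -> None:
        """Store a per-case judgment ("A", "B", or "tie") and an optional note."""
        if winner not in {"A", "B", "tie"}:
            raise ValueError("winner must be 'A', 'B', or 'tie'")
        self.verdicts[case] = winner
        if note:
            self.notes[case] = note
```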
A/B testing features
During A/B evaluation, you can:
- Compare variants - Score which version performs better for each test case
- Add notes - Include context or detailed feedback
- Export results - Download your evaluation data for further analysis (a quick analysis sketch follows this list)
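For the export step, assuming the download is a CSV with one row per judged test case (the column names below are assumptions; check your actual file), a quick first pass might look like:

```python
import pandas as pd

# Assumed export layout: one row per judged case with "input", "winner"
# ("A", "B", or "tie"), "annotator", and "note" columns.
df = pd.read_csv("ab_test_export.csv")

# Overall outcome as percentages.
print(df["winner"].value_counts(normalize=True).mul(100).round(1))

# Cases where variant B lost, with any reviewer notes, for closer inspection.
print(df.loc[df["winner"] == "A", ["input", "annotator", "note"]].head())
```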
Collaborating on A/B tests
You can invite team members to help with A/B testing by sharing the evaluation link. Team members must be added to your workspace first.
This is particularly useful for:
- Getting diverse perspectives on performance
- Reducing individual bias
- Speeding up evaluation with multiple annotators
Interpreting A/B test results
After completing the A/B test, you'll see:
- Win/loss/tie counts for each variant
- Percentage of cases where each variant performed better
- Specific test cases where variants differed significantly
- Notes and comments from annotators
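If you also track verdicts yourself (as in the illustrative ABComparison sketch above), the win/loss/tie counts and percentages are straightforward to reproduce:

```python
from collections import Counter

def summarize(verdicts: dict[str, str]) -> dict[str, float]:
    """Win/loss/tie counts plus win percentages from per-case verdicts."""
    counts = Counter(verdicts.values())      # keys: "A", "B", "tie"
    total = len(verdicts) or 1               # avoid division by zero
    return {
        "A_wins": counts["A"],
        "B_wins": counts["B"],
        "ties": counts["tie"],
        "A_win_pct": round(100 * counts["A"] / total, 1),
        "B_win_pct": round(100 * counts["B"] / total, 1),
    }

print(summarize({"case-1": "A", "case-2": "B", "case-3": "A", "case-4": "tie"}))
# {'A_wins': 2, 'B_wins': 1, 'ties': 1, 'A_win_pct': 50.0, 'B_win_pct': 25.0}
```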
Use cases
A/B testing is ideal for:
- Prompt optimization: Compare different prompt wordings
- Model selection: Evaluate different LLMs (GPT-4 vs Claude vs others)
- Parameter tuning: Test different temperature or max_tokens settings (see the sketch after this list)
- Feature comparison: Compare variants with different features enabled
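For example, a parameter-tuning comparison only needs two generation functions that differ in one setting. The sketch below uses the OpenAI Python client as an assumption; the model name, temperatures, and max_tokens value are placeholders, and any provider's client works the same way:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate(prompt: str, temperature: float) -> str:
    """One application variant: same prompt, different temperature."""
    response = client.chat.completions.create(
        model="gpt-4o",                 # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        max_tokens=300,
    )
    return response.choices[0].message.content

# Produce paired outputs over the same test set, then judge them A/B.
test_set = ["Summarize the refund policy in two sentences."]
pairs = [(case, generate(case, 0.2), generate(case, 0.9)) for case in test_set]
```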
Next steps
- Learn about exporting results
- Explore automated evaluation for larger-scale comparisons
- Understand evaluation concepts