A/B Testing

Overview

A/B testing lets you compare two versions of your application side by side. For each test case, you choose which version performs better.

Setting up A/B testing

  1. Select two versions you want to compare
  2. Choose your test set
  3. For each test case, decide which version is better (or mark the result as a tie)
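
If you prepare the two versions' outputs outside the product, one simple way to organize them is a record per test case that holds both outputs plus the eventual judgment. The structure below is a generic sketch, not a prescribed import format; the field names (prompt, output_a, output_b, winner, note) are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class ABTestCase:
    """One test case: the same input run through both variants."""
    prompt: str
    output_a: str          # response from variant A
    output_b: str          # response from variant B
    winner: str = "tie"    # "A", "B", or "tie" once judged
    note: str = ""         # optional annotator feedback

# Illustrative example: comparing two prompt wordings on the same input.
cases = [
    ABTestCase(
        prompt="Summarize this support ticket.",
        output_a="Customer reports login failures after updating to v2.3.",
        output_b="Login broken.",
        winner="A",
        note="Variant A preserves the relevant detail.",
    ),
]
```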

A/B testing features

During A/B evaluation, you can:

  • Compare variants - Record which version performs better for each test case
  • Add notes - Include context or detailed feedback
  • Export results - Download your evaluation data for further analysis
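
Exported results can be analyzed with standard tooling. The snippet below assumes a CSV export with a column recording each judgment; the file name and the "winner" column name are assumptions, so adjust them to match your actual export.

```python
import csv
from collections import Counter

def count_judgments(path: str) -> Counter:
    """Tally how often each variant was preferred in an exported CSV."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            counts[row["winner"]] += 1   # expected values: "A", "B", "tie"
    return counts

# e.g. Counter({'A': 14, 'B': 9, 'tie': 2})
print(count_judgments("ab_test_results.csv"))
```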

Collaborating on A/B tests

You can invite team members to help with A/B testing by sharing the evaluation link. Team members must be added to your workspace first.

This is particularly useful for:

  • Getting diverse perspectives on performance
  • Reducing individual bias
  • Speeding up evaluation with multiple annotators

Interpreting A/B test results

After completing the A/B test, you'll see:

  • Win/loss/tie counts for each variant
  • Percentage of cases where each variant performed better
  • Specific test cases where variants differed significantly
  • Notes and comments from annotators
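
To turn raw win/loss/tie counts into the percentages described above, divide each count by the total number of judged cases. A minimal sketch:

```python
def win_rates(winners: list[str]) -> dict[str, float]:
    """Percentage of test cases won by each variant (ties reported separately)."""
    total = len(winners)
    return {label: round(100 * winners.count(label) / total, 1)
            for label in ("A", "B", "tie")}

# 10 judged cases: variant A wins 6, variant B wins 3, 1 tie
print(win_rates(["A"] * 6 + ["B"] * 3 + ["tie"]))
# -> {'A': 60.0, 'B': 30.0, 'tie': 10.0}
```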

Use cases

A/B testing is ideal for:

  • Prompt optimization: Compare different prompt wordings
  • Model selection: Evaluate different LLMs (GPT-4 vs. Claude vs. others)
  • Parameter tuning: Test different temperature or max_tokens settings (see the sketch after this list)
  • Feature comparison: Compare variants with different features enabled
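
For the parameter-tuning case, the two variants often differ only in sampling settings. The sketch below is hypothetical: `run_variant` is a stub standing in for whatever client call your application makes, and the model name and parameter values are examples only.

```python
# Hypothetical variant configurations differing only in sampling parameters.
VARIANT_A = {"model": "gpt-4", "temperature": 0.2, "max_tokens": 256}
VARIANT_B = {"model": "gpt-4", "temperature": 0.9, "max_tokens": 256}

def run_variant(config: dict, prompt: str) -> str:
    """Stub: replace with a real call to your LLM provider using `config`."""
    return f"[{config['model']} @ T={config['temperature']}] response to: {prompt}"

# Produce paired outputs for each prompt in the test set, ready for
# side-by-side judging in the A/B evaluation.
test_prompts = [
    "Summarize this support ticket.",
    "Draft a polite reply declining a refund.",
]
pairs = [
    (prompt, run_variant(VARIANT_A, prompt), run_variant(VARIANT_B, prompt))
    for prompt in test_prompts
]
```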

Next steps