Nelworks

A/B Test Trial Simulation

Guide

This simulator shows how real A/B tests progress over time, with each new participant adding noise and uncertainty.

Unlike a static calculation, it follows the test as it runs: the p-value starts at 1 (no evidence) and generally falls as evidence accumulates, though not monotonically.


Chart Interpretation

  • X axis: Trials - Number of runs accumulated
  • Left Y-axis: -Log₁₀(p-value) - higher means more significant
  • Right Y-axis: Odds Ratio - values > 1 favor treatment
  • Blue line: P-value evolution (starts at p=1, which is 0 on this axis; becomes significant when it rises above the threshold line)
  • Red line: Observed odds ratio (fluctuates due to variance)
  • Gray dashed line: Standard significance threshold (p=0.05, about 1.3 on the -Log₁₀ axis)
  • Orange dotted line: Doubt-adjusted threshold (stricter early, standard later)
  • P-value starts at 1 (no evidence)
  • Gradual decrease as evidence accumulates
  • High variance makes detection harder
  • Odds ratio fluctuates around true value
  • Doubt index discounts early significance due to luck factor
  • Sustained evidence required for confirmation (e.g., 5 consecutive significant trials)
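
The two plotted quantities can be sketched from running conversion counts. This is a minimal illustration, not the simulator's actual code: it assumes a pooled two-proportion z-test for the p-value and a 0.5 continuity correction for the odds ratio, and the function names are hypothetical.

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test (assumed test)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation observed yet, so no evidence either way
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))

def odds_ratio(conv_a, n_a, conv_b, n_b):
    """Treatment-vs-control odds ratio; 0.5 is added to each cell to avoid division by zero."""
    odds_b = (conv_b + 0.5) / (n_b - conv_b + 0.5)
    odds_a = (conv_a + 0.5) / (n_a - conv_a + 0.5)
    return odds_b / odds_a

def neg_log10_p(p):
    """Left-axis value: 0 at p = 1, larger for smaller p."""
    return -math.log10(max(p, 1e-300))

# Same observed rates (20% vs 30%), different sample sizes.
early = two_proportion_p_value(20, 100, 30, 100)      # not yet below 0.05
late = two_proportion_p_value(200, 1000, 300, 1000)   # well below 0.05
print(neg_log10_p(early), neg_log10_p(late), odds_ratio(20, 100, 30, 100))
```

With identical observed rates, the larger sample gives a much higher value on the left axis, which is why the blue line tends to climb as trials accumulate.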

A/B Test Statistical Significance Calculator

Control Group (A)

True conversion rate for control group
Standard deviation as % of base effect (multiplicative)

Treatment Group (B)

True conversion rate for treatment group
Standard deviation as % of target effect (multiplicative)
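
One way to read the "multiplicative" noise setting is that each trial perturbs the true rate by a Gaussian factor. The sketch below assumes exactly that, plus clamping to the valid probability range; `noisy_rate` and its parameters are hypothetical names, and the simulator may model noise differently.

```python
import random

def noisy_rate(true_rate, sd_fraction, rng):
    """Perturb a true rate multiplicatively: rate * (1 + N(0, sd_fraction))."""
    rate = true_rate * (1.0 + rng.gauss(0.0, sd_fraction))
    return min(1.0, max(0.0, rate))  # keep the result a valid probability

rng = random.Random(42)
# A 20% control rate with a 20% relative standard deviation,
# i.e. about 4 percentage points of absolute spread.
samples = [noisy_rate(0.20, 0.20, rng) for _ in range(5)]
print(samples)
```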

Doubt Adjustment

Conservative thresholds: p < 0.01 for the first 10 trials, then gradually relaxing to the standard p < 0.05
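
The doubt adjustment and the sustained-evidence rule can be sketched as follows. The 0.01 and 0.05 thresholds, the 10 strict trials, and the 5-trial streak come from the text above; the linear ramp and its 40-trial length (`relax_trials`) are assumptions made only for illustration.

```python
def doubt_threshold(trial, strict=0.01, standard=0.05,
                    strict_trials=10, relax_trials=40):
    """Significance threshold at a given trial number (1-based).

    The linear ramp over `relax_trials` is an assumed schedule.
    """
    if trial <= strict_trials:
        return strict
    progress = min(1.0, (trial - strict_trials) / relax_trials)
    return strict + progress * (standard - strict)

def first_confirmed_trial(p_values, required_streak=5):
    """Return the first trial at which p has stayed below the
    doubt-adjusted threshold for `required_streak` trials in a row."""
    streak = 0
    for trial, p in enumerate(p_values, start=1):
        streak = streak + 1 if p < doubt_threshold(trial) else 0
        if streak >= required_streak:
            return trial
    return None  # never confirmed within the observed trials

# Three inconclusive trials followed by five strongly significant ones.
print(first_confirmed_trial([0.5] * 3 + [0.001] * 5))  # → 8
```

A single lucky dip below the threshold resets nothing permanently, but it also confirms nothing: the streak counter returns to zero as soon as one trial fails the test.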

Monte Carlo Distribution: Trials to Significance

Key Statistics

Control (A): 20% ± 4.0%

Treatment (B): 30% ± 6.0%

True Improvement: 50.0%

Mean Trials: 145

Median Trials: 119

Range: 16 - 500

Simulations: 100
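
A Monte Carlo summary like the one above could be produced along these lines. This is a rough sketch under several assumptions that the page does not confirm: one participant per arm per trial, Bernoulli conversions drawn from a multiplicatively perturbed rate, a pooled z-test with a fixed 0.05 threshold, a 5-trial streak for confirmation, and a cap of 500 trials. It is not expected to reproduce the figures shown.

```python
import math
import random
import statistics

MAX_TRIALS = 500  # assumed cap, suggested by the upper end of the reported range

def p_value(conv_a, conv_b, n):
    """Two-sided pooled two-proportion z-test with n participants per arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return 1.0
    z = (conv_b - conv_a) / n / se
    return math.erfc(abs(z) / math.sqrt(2))

def converts(rate, sd_fraction, rng):
    """One participant: Bernoulli draw from a multiplicatively perturbed rate."""
    noisy = min(1.0, max(0.0, rate * (1 + rng.gauss(0, sd_fraction))))
    return rng.random() < noisy

def trials_to_significance(rate_a, rate_b, sd_fraction, rng,
                           alpha=0.05, streak_needed=5):
    """Trials until p < alpha has held for streak_needed consecutive trials."""
    conv_a = conv_b = streak = 0
    for n in range(1, MAX_TRIALS + 1):
        conv_a += converts(rate_a, sd_fraction, rng)
        conv_b += converts(rate_b, sd_fraction, rng)
        streak = streak + 1 if p_value(conv_a, conv_b, n) < alpha else 0
        if streak >= streak_needed:
            return n
    return MAX_TRIALS  # never confirmed; report the cap

def summarize(rate_a=0.20, rate_b=0.30, sd_fraction=0.20,
              simulations=100, seed=0):
    rng = random.Random(seed)
    runs = [trials_to_significance(rate_a, rate_b, sd_fraction, rng)
            for _ in range(simulations)]
    return {"mean": statistics.mean(runs), "median": statistics.median(runs),
            "min": min(runs), "max": max(runs)}

print(summarize())
```

Because runs that never reach confirmation are recorded at the cap, the mean is pulled above the median, the same right-skewed pattern as in the statistics shown above.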


Real-World Examples

  • Control: 20% conversion ± 20% relative standard deviation (16-24% range)
  • Treatment: 30% conversion ± 20% relative standard deviation (24-36% range)
  • Result: May take 200-500 trials to detect significance

  • Control: 20% open rate ± 10% relative standard deviation (18-22% range)
  • Treatment: 25% open rate ± 10% relative standard deviation (22.5-27.5% range)
  • Result: May take 100-300 trials to detect significance

  • Control: 10% conversion ± 30% relative standard deviation (7-13% range)
  • Treatment: 20% conversion ± 30% relative standard deviation (14-26% range)
  • Result: May take 500-1000 trials to detect significance
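
The ranges in the examples above follow from treating the ± percentage as relative to the base rate, so the one-standard-deviation band is rate × (1 ± fraction). A small sketch of that arithmetic, with hypothetical helper names:

```python
def rate_range(rate, sd_fraction):
    """One-standard-deviation band around a rate, with the SD given
    as a fraction of the rate itself."""
    return rate * (1 - sd_fraction), rate * (1 + sd_fraction)

def relative_improvement(control, treatment):
    """Lift of treatment over control, as a fraction of the control rate."""
    return (treatment - control) / control

low, high = rate_range(0.25, 0.10)
print(f"{low:.1%} to {high:.1%}")                  # 22.5% to 27.5%
print(f"{relative_improvement(0.20, 0.30):.1%}")   # 50.0%
```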

This approach provides a realistic view of how A/B tests actually work in practice, helping you understand the uncertainty and variance inherent in real-world experimentation.