A/B Test Trial Simulation
Guide
This simulator shows how real A/B tests progress over time, with each new participant adding noise and uncertainty.
Unlike a static calculation, it shows how a trial actually unfolds: the p-value starts at 1 (no evidence) and tends to fall, with fluctuations, as evidence accumulates.
Chart Interpretation
- X-axis: Trials - Number of runs accumulated
- Left Y-axis: -log₁₀(p-value) - higher means more significant (p = 0.05 corresponds to about 1.3)
- Right Y-axis: Odds Ratio - values > 1 favor treatment
- Blue line: P-value evolution (starts at p = 1, which is 0 on the -log₁₀ scale, and becomes significant once it crosses the threshold)
- Red line: Observed odds ratio (fluctuates due to variance)
- Gray dashed line: Standard significance threshold (p=0.05)
- Orange dotted line: Doubt-adjusted threshold (stricter early, standard later)
- P-value starts at 1 (no evidence)
- Gradual decrease as evidence accumulates
- High variance makes detection harder
- Odds ratio fluctuates around true value
- Doubt index discounts early significance due to luck factor
- Sustained evidence required for confirmation (e.g., 5 consecutive significant trials)
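The quantities plotted on the chart can be sketched in a few lines of Python. This is a minimal illustration, not the simulator's own code: the pooled two-proportion z-test, the 0.5 continuity correction in the odds ratio, the linear relaxation of the doubt-adjusted threshold, and the `relaxed_by=50` cutoff are all assumptions, and every function name is hypothetical.

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test (assumed test)."""
    if n_a == 0 or n_b == 0:
        return 1.0
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation observed yet, so no evidence either way
    z = (conv_b / n_b - conv_a / n_a) / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))

def odds_ratio(conv_a, n_a, conv_b, n_b):
    """Treatment-vs-control odds ratio; 0.5 is added to each cell to avoid dividing by zero."""
    odds_b = (conv_b + 0.5) / (n_b - conv_b + 0.5)
    odds_a = (conv_a + 0.5) / (n_a - conv_a + 0.5)
    return odds_b / odds_a

def neg_log10(p):
    """Left-axis value: 0 at p = 1, about 1.3 at p = 0.05."""
    return -math.log10(max(p, 1e-300))

def doubt_threshold(trial, strict=0.01, standard=0.05, strict_until=10, relaxed_by=50):
    """Stricter threshold early, standard later (linear relaxation is an assumption)."""
    if trial <= strict_until:
        return strict
    if trial >= relaxed_by:
        return standard
    frac = (trial - strict_until) / (relaxed_by - strict_until)
    return strict + frac * (standard - strict)

if __name__ == "__main__":
    p = two_proportion_p_value(200, 1000, 300, 1000)
    print(neg_log10(p), odds_ratio(200, 1000, 300, 1000), doubt_threshold(30))
```

On the -log₁₀ axis, a result counts as significant when `neg_log10(p)` rises above `neg_log10(doubt_threshold(trial))`, which is why the threshold lines sit higher for stricter p-values.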
A/B Test Statistical Significance Calculator
Control Group (A)
True conversion rate for control group
Standard deviation as % of base effect (multiplicative)
Treatment Group (B)
True conversion rate for treatment group
Standard deviation as % of target effect (multiplicative)
Doubt Adjustment
Conservative thresholds: p < 0.01 for first 10 trials, gradually relaxing to p < 0.05
Monte Carlo Distribution: Trials to Significance
Key Statistics
Control (A): 20% ± 4.0%
Treatment (B): 30% ± 6.0%
True Improvement: 50.0%
Mean Trials: 145
Median Trials: 119
Range: 16 - 500
Simulations: 100
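Statistics like these can be approximated with a small Monte Carlo loop. The sketch below is an illustration under stated assumptions, not the simulator's implementation: each trial adds one participant per arm, each participant's conversion probability is the true rate multiplied by Gaussian noise with the given relative standard deviation, significance uses a pooled two-proportion z-test, the threshold relaxes linearly from 0.01 to 0.05 between trials 10 and 50, and a run that never reaches 5 consecutive significant trials is recorded at the 500-trial cap. All names are hypothetical.

```python
import math
import random
import statistics

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test (assumed test)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))

def threshold(trial, strict_until=10, relaxed_by=50):
    """p < 0.01 for the first 10 trials, relaxing linearly (assumed) to p < 0.05."""
    if trial <= strict_until:
        return 0.01
    if trial >= relaxed_by:
        return 0.05
    return 0.01 + 0.04 * (trial - strict_until) / (relaxed_by - strict_until)

def noisy_conversion(rng, rate, rel_sd):
    """One participant: perturb the rate multiplicatively, clip to [0, 1], then draw."""
    p = rate * (1 + rel_sd * rng.gauss(0, 1))
    p = min(max(p, 0.0), 1.0)
    return rng.random() < p

def trials_to_significance(rng, rate_a, sd_a, rate_b, sd_b, max_trials=500, sustain=5):
    """Return the trial at which significance has held for `sustain` trials in a row."""
    conv_a = conv_b = streak = 0
    for trial in range(1, max_trials + 1):
        conv_a += noisy_conversion(rng, rate_a, sd_a)
        conv_b += noisy_conversion(rng, rate_b, sd_b)
        significant = p_value(conv_a, trial, conv_b, trial) < threshold(trial)
        streak = streak + 1 if significant else 0
        if streak >= sustain:
            return trial
    return max_trials  # never confirmed within the cap

def summarize(rate_a, sd_a, rate_b, sd_b, n_sims=100, max_trials=500, seed=None):
    """Repeat the simulated test and summarize how long confirmation took."""
    rng = random.Random(seed)
    runs = [trials_to_significance(rng, rate_a, sd_a, rate_b, sd_b, max_trials)
            for _ in range(n_sims)]
    return {"mean": statistics.mean(runs), "median": statistics.median(runs),
            "min": min(runs), "max": max(runs)}

if __name__ == "__main__":
    # 20% vs 30% true rates, each with a 20% relative standard deviation
    print(summarize(0.20, 0.20, 0.30, 0.20, n_sims=100, seed=42))
```

Because the noise model and the test are assumptions, the numbers this prints will not match the figures above exactly. The point is the shape of the result: a right-skewed distribution in which the mean exceeds the median and some runs hit the cap.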
How to Read This Chart
Real-World Examples
- Control: 20% conversion ± 20% relative standard deviation (16-24% range)
- Treatment: 30% conversion ± 20% relative standard deviation (24-36% range)
- Result: May take 200-500 trials to detect significance

- Control: 20% open rate ± 10% relative standard deviation (18-22% range)
- Treatment: 25% open rate ± 10% relative standard deviation (22.5-27.5% range)
- Result: May take 100-300 trials to detect significance

- Control: 10% conversion ± 30% relative standard deviation (7-13% range)
- Treatment: 20% conversion ± 30% relative standard deviation (14-26% range)
- Result: May take 500-1000 trials to detect significance
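The ranges in these examples follow from treating the ± figure as a fraction of the rate itself. A minimal sketch of that arithmetic, with hypothetical helper names:

```python
def one_sd_range(rate, rel_sd):
    """Range one relative standard deviation either side of the rate."""
    return rate * (1 - rel_sd), rate * (1 + rel_sd)

def relative_improvement(control, treatment):
    """Lift of treatment over control, as a fraction of the control rate."""
    return (treatment - control) / control

if __name__ == "__main__":
    examples = [(0.20, 0.30, 0.20), (0.20, 0.25, 0.10), (0.10, 0.20, 0.30)]
    for control, treatment, rel_sd in examples:
        lo_c, hi_c = one_sd_range(control, rel_sd)
        lo_t, hi_t = one_sd_range(treatment, rel_sd)
        lift = relative_improvement(control, treatment)
        print(f"control {lo_c:.1%}-{hi_c:.1%}, treatment {lo_t:.1%}-{hi_t:.1%}, lift {lift:.0%}")
```

When the control and treatment ranges overlap, as they do at the boundary in the first example (both touch 24%), individual runs can easily point the wrong way, which is one reason detection takes many trials.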
This approach gives a realistic view of how A/B tests behave in practice, helping you understand the uncertainty and variance inherent in real-world experimentation.