AB Test Calculator

AB.Test

Primary Metric

Conversion Rate

Revenue Per Visitor

Products Per Visitor

Test Parameters

Baseline Conversion Rate

Minimum Detectable Effect (Relative %)

10%

10% MDE = detect a relative change of ±10% from the baseline rate.

Number of Variants (incl. Control)

Confidence Level

Statistical Power

Analysis Method

Daily Visitors to Site

Traffic Allocated to Test

100%

Quality Checks

Sample Ratio Mismatch (SRM) Detection

Bonferroni Correction (multiple comparisons)

Sample Size Results

📊

Your results will appear here

Fill in the parameters and click Calculate Sample Size to see required visitors, test duration, and more.

Test Data

Number of Variants (incl. Control)

Track Progress vs. Planned Sample Size

Analysis Settings

Analysis Method

Significance Level

Tail Type

Sample Ratio Mismatch (SRM) Detection

Bonferroni Correction (multiple variants)

Analysis Results

🔬

Your results will appear here

Enter variant data on the left and click Run Analysis to see p-values, confidence intervals, significance, and more.

Statistical methodology, demystified

Three engines behind smarter experimentation. Pick the one that matches how your team decides to ship.

📊

Frequentist

The classical approach. You set a significance level (α) and statistical power (1 − β) before the test, run until the required sample size is reached, then check whether p < α. If yes, the result is statistically significant. Gives a clear yes/no decision with known error rate guarantees.

Best for: Teams with fixed-duration tests, regulated environments, or those who need an unambiguous decision rule with pre-specified error rates.

📈

Sequential

Monitor continuously and stop early when sufficient evidence accumulates, while still controlling the false positive rate. Uses always-valid p-values or the Sequential Probability Ratio Test (SPRT). In exchange for the freedom to stop a winning variant before the planned end date, you accept a larger maximum sample size (~1.5–2×) than a fixed-duration test would need.

Best for: Teams that peek at results frequently, need to stop tests early, or run iterative experimentation at pace.
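
To make the SPRT idea concrete, here is a minimal sketch of Wald's test on a single stream of conversion outcomes. It is a simplification, not the calculator's implementation: it compares two fixed hypothesized rates (`p0` and `p1`, both illustrative assumptions) for one variant rather than running a full two-arm comparison, and the function name is hypothetical.

```python
from math import log

def sprt_decision(outcomes, p0, p1, alpha=0.05, beta=0.20):
    """Wald's SPRT for a Bernoulli stream: H0 rate p0 versus H1 rate p1."""
    upper = log((1 - beta) / alpha)   # crossing above this accepts H1
    lower = log(beta / (1 - alpha))   # crossing below this accepts H0
    llr = 0.0                         # running log-likelihood ratio
    for i, converted in enumerate(outcomes, start=1):
        if converted:
            llr += log(p1 / p0)
        else:
            llr += log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", i
        if llr <= lower:
            return "accept H0", i
    return "continue", len(outcomes)

# Hypothetical stream: 1 = visitor converted, 0 = did not.
print(sprt_decision([0, 0, 1, 0, 0, 0, 0, 0, 0, 0], p0=0.1, p1=0.3))
```

The test stops as soon as the running log-likelihood ratio leaves the band between the two thresholds, which is what allows early stopping without inflating the error rates chosen up front.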

🎲

Bayesian

Instead of a p-value, you get an intuitive probability: “There is an 87% chance variant B beats A.” Incorporates prior beliefs (or uses a flat/uninformative prior). No fixed sample size required — stop when the probability threshold you care about is reached. Results are directly interpretable.

Best for: Teams comfortable with probabilistic outputs, multi-armed bandit setups, or continuous-deployment experimentation.
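
The "chance variant B beats A" figure can be estimated by Monte Carlo sampling from each variant's posterior. The sketch below assumes a flat Beta(1, 1) prior and uses hypothetical counts; the function name, draw count, and seed are illustrative choices, not taken from the calculator.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Estimate P(rate_B > rate_A) under a Beta-Bernoulli model."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior under a flat Beta(1, 1) prior: Beta(1 + successes, 1 + failures).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

# Hypothetical data: 100/1000 conversions for A, 150/1000 for B.
print(prob_b_beats_a(100, 1000, 150, 1000))
```

Swapping in an informative prior would only change the two `1 +` terms to the prior's own parameters.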

A/B testing answers on the same page as the math

Use these notes to understand sample size, MDE, p-values, RPV, APC, and products per visitor before starting — or when reading a result.

Sample size & MDE

Sample size depends on four inputs: your baseline rate, the minimum effect you care about (MDE), confidence level (1 − α), and power (1 − β). Required sample size scales with 1/MDE², so halving the MDE roughly quadruples it. Always set these before running: peeking and stopping early inflates the false positive rate in standard frequentist tests.
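
As a rough illustration of how the four inputs combine, the sketch below uses the standard normal-approximation formula for a two-sided two-proportion test. The 5% baseline is an assumed example value and the function name is hypothetical; the calculator itself may use a slightly different variant of the formula.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed per variant for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)          # rate under the minimum effect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Assumed 5% baseline; compare a 10% relative MDE with a 5% relative MDE.
n_10 = sample_size_per_variant(0.05, 0.10)
n_05 = sample_size_per_variant(0.05, 0.05)
print(n_10, n_05, round(n_05 / n_10, 2))
```

The printed ratio comes out close to 4, matching the rule of thumb that halving the MDE roughly quadruples the sample.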

P-values & confidence

A p-value is the probability of observing data at least as extreme as yours if the null hypothesis is true. p < 0.05 does not mean a 95% chance your variant is better; it means a result at least this extreme would occur less than 5% of the time if there were no real difference. A 95% confidence interval procedure captures the true effect in 95% of repeated experiments; it does not give a 95% probability for the interval from this single test.

Power & error control

Statistical power (1 − β) is the probability of detecting a real effect when one exists. At 80% power you’ll miss a true effect 20% of the time (Type II / false negative). α controls Type I error (false positives). Higher power and lower α both require larger samples. Common industry defaults: α = 0.05, power = 80%.
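
The trade-off between sample size, α, and power can be checked numerically. This is a minimal sketch using the normal approximation for a two-sided two-proportion test; it ignores the negligible chance of rejecting in the wrong direction, and the rates and visitor counts are assumed example values.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(p_a, p_b, n_per_variant, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    se = sqrt(p_a * (1 - p_a) / n_per_variant + p_b * (1 - p_b) / n_per_variant)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(p_b - p_a) / se              # true effect in standard-error units
    return NormalDist().cdf(z_effect - z_alpha)

# Assumed rates: 5.0% control versus 5.5% variant.
print(round(approx_power(0.05, 0.055, 30000), 3))              # default alpha
print(round(approx_power(0.05, 0.055, 30000, alpha=0.01), 3))  # stricter alpha
print(round(approx_power(0.05, 0.055, 10000), 3))              # smaller sample
```

Tightening α or shrinking the sample both pull power down, which is why raising power or lowering α calls for more visitors.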

Sample Ratio Mismatch

SRM occurs when the observed traffic split differs from the intended allocation by more than chance can explain, e.g. planning 50/50 but seeing 48/52 across a large sample. SRM signals a flaw in assignment or tracking, so results cannot be trusted even if p-values appear significant. Always check with a chi-square goodness-of-fit test. Common causes: bot traffic, browser redirects, sticky cookies, CDN caching, or experiment SDK misconfiguration.
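
A two-variant SRM check can be sketched with the standard library alone, since the chi-square distribution with one degree of freedom has a closed-form tail. The function name and visitor counts below are hypothetical, and the alert threshold is left to the reader (strict cut-offs such as 0.01 or 0.001 are a common assumption).

```python
from math import erfc, sqrt

def srm_p_value(visitors_a, visitors_b, expected_share_a=0.5):
    """Chi-square goodness-of-fit p-value for a two-variant traffic split."""
    total = visitors_a + visitors_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((visitors_a - expected_a) ** 2 / expected_a
            + (visitors_b - expected_b) ** 2 / expected_b)
    # With one degree of freedom, the chi-square tail is erfc(sqrt(chi2 / 2)).
    return erfc(sqrt(chi2 / 2))

# The same 48/52 split at two hypothetical traffic volumes.
print(srm_p_value(480, 520))        # small sample
print(srm_p_value(48000, 52000))    # large sample
```

The same 48/52 split is unremarkable at 1,000 visitors but overwhelming evidence of a mismatch at 100,000, which is why the check needs raw counts rather than percentages.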

Conversion rate

CR = Conversions ÷ Visitors. Use for discrete actions: purchases, sign-ups, clicks. Relative lift = (CR_B − CR_A) / CR_A. Bayesian analysis treats conversions as draws from a Beta-Bernoulli model. Frequentist: use a two-proportion z-test for large samples. For fewer than 30 conversions per variant, consider Fisher’s exact test instead.
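
The relative lift formula and the two-proportion z-test above can be sketched as follows. This assumes a pooled standard error and a two-sided test; the function name and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates, plus relative lift."""
    cr_a, cr_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)    # rate under the null hypothesis
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (cr_b - cr_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    relative_lift = (cr_b - cr_a) / cr_a
    return z, p_value, relative_lift

# Hypothetical data: 500/10,000 conversions for A, 560/10,000 for B.
z, p_value, lift = two_proportion_z_test(500, 10000, 560, 10000)
print(round(z, 3), round(p_value, 4), round(lift, 3))
```

In this example a 12% relative lift still falls just short of significance at α = 0.05, a reminder that lift and significance are separate questions.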

RPV, APC & products

Revenue Per Visitor (RPV) = Total Revenue ÷ Visitors. Average Products per Conversion (APC) = Total Units ÷ Conversions. These continuous metrics use Welch’s t-test rather than a z-test and require a standard deviation estimate to size the test. Revenue data is typically right-skewed — consider a log-transform or non-parametric test if outliers are severe.
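
For continuous metrics such as RPV, the Welch statistic and its degrees of freedom can be computed from per-visitor values. The sketch below stops short of the p-value because the Student's t distribution is not in Python's standard library; the function name and the revenue figures are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = variance(sample_a) / n_a            # squared standard error of mean A
    se2_b = variance(sample_b) / n_b            # squared standard error of mean B
    t = (mean(sample_b) - mean(sample_a)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical revenue per visitor; zeros are visitors who bought nothing.
revenue_a = [0, 0, 0, 10, 20]
revenue_b = [0, 0, 5, 15, 40]
t, df = welch_t(revenue_a, revenue_b)
print(round(t, 4), round(df, 3))
```

If SciPy is available, `scipy.stats.ttest_ind` with `equal_var=False` runs the same test and also returns the p-value.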

All calculations are performed client-side in your browser. No data is sent to any server. | Frequentist: two-proportion z-test & Welch’s t-test | Bayesian: Beta-Bernoulli Monte Carlo

Confidence Level	95%
Statistical Power	80%
α (Type I Error)	0.05
β (Type II Error)	0.20
Analysis Method	Frequentist
Bonferroni adj. α	—