Primary Metric
Conversion Rate
Revenue per Visitor
Products per Visitor
Test Parameters
10% MDE = detect a relative change of ±10% from the baseline rate, e.g. from a 5.0% baseline to 4.5% or 5.5%.
Quality Checks
Sample Ratio Mismatch (SRM) Detection
Bonferroni Correction (multiple comparisons)
Sample Size Results

Your results will appear here

Fill in the parameters and click Calculate Sample Size to see required visitors, test duration, and more.

Primary Metric
Conversion Rate
Revenue per Visitor
Products per Visitor
Test Data
Track Progress vs. Planned Sample Size
Analysis Settings
Sample Ratio Mismatch (SRM) Detection
Bonferroni Correction (multiple variants)
Analysis Results

Your results will appear here

Enter variant data on the left and click Run Analysis to see p-values, confidence intervals, significance, and more.

Statistical methodology, demystified

Three engines behind smarter experimentation. Pick the one that matches how your team decides to ship.

Frequentist

The classical approach. You set a significance level (α) and statistical power (1 − β) before the test, run until the required sample size is reached, then check whether p < α. If yes, the result is statistically significant. This gives a clear yes/no decision with known error rate guarantees.

Best for: Teams with fixed-duration tests, regulated environments, or those who need an unambiguous decision rule with pre-specified error rates.

Sequential

Monitor continuously and stop early when sufficient evidence accumulates, while still controlling the false positive rate. Uses always-valid p-values or the Sequential Probability Ratio Test (SPRT). You accept a larger maximum sample size (~1.5–2×) in exchange for the freedom to stop a winning variant before the planned end date.

Best for: Teams that peek at results frequently, need to stop tests early, or run iterative experimentation at pace.
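
To make the SPRT idea concrete, here is a minimal Python sketch of Wald's test for a single stream of convert / no-convert observations. It is a deliberate simplification: it tests one conversion rate (p0 under the null against p1 under the alternative) rather than comparing two variants, and the function name, rates, and defaults are illustrative assumptions, not the calculator's own implementation.

```python
from math import log

def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.20):
    """Wald's SPRT for H0: rate = p0 versus H1: rate = p1 (one-sample sketch)."""
    upper = log((1 - beta) / alpha)   # crossing this boundary accepts H1
    lower = log(beta / (1 - alpha))   # crossing this boundary accepts H0
    llr = 0.0                         # running log-likelihood ratio
    for i, converted in enumerate(observations, start=1):
        if converted:
            llr += log(p1 / p0)
        else:
            llr += log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", i
        if llr <= lower:
            return "accept H0", i
    return "continue", len(observations)

# Hypothetical stream: 300 visitors in a row who do not convert.
decision, n_seen = sprt_bernoulli([0] * 300, p0=0.05, p1=0.06)
print(decision, n_seen)
```

The test checks the evidence after every observation and stops as soon as one boundary is crossed, which is what makes frequent peeking safe here.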

Bayesian

Instead of a p-value, you get an intuitive probability: “There is an 87% chance variant B beats A.” Incorporates prior beliefs (or uses a flat/uninformative prior). No fixed sample size required: stop when the probability threshold you care about is reached. Results are directly interpretable.

Best for: Teams comfortable with probabilistic outputs, multi-armed bandit setups, or continuous-deployment experimentation.
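
A "chance variant B beats A" figure can be estimated by Monte Carlo sampling from Beta posteriors. The sketch below assumes a flat Beta(1, 1) prior and uses made-up conversion counts; the function name, draw count, and seed are illustrative choices.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0,
                   prior_alpha=1.0, prior_beta=1.0):
    """Estimate P(rate_B > rate_A) under a Beta-Bernoulli model."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins = 0
    for _ in range(draws):
        # Posterior for each variant: Beta(prior + successes, prior + failures)
        rate_a = rng.betavariate(prior_alpha + conv_a, prior_beta + n_a - conv_a)
        rate_b = rng.betavariate(prior_alpha + conv_b, prior_beta + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

# Hypothetical data: 500/10,000 conversions for A, 570/10,000 for B.
print(prob_b_beats_a(500, 10_000, 570, 10_000))
```

More draws reduce Monte Carlo noise in the estimate at the cost of run time.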

A/B testing answers on the same page as the math

Use these notes to understand sample size, MDE, p-values, RPV, APC, and products per visitor before starting, or when reading a result.

Sample size & MDE

Sample size depends on four inputs: your baseline rate, the minimum effect you care about (MDE), confidence level (1 − α), and power (1 − β). Halving the MDE roughly quadruples the required sample, because the required sample scales with 1/MDE². Always set these before running: peeking and stopping early inflates the false positive rate in standard frequentist tests.
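
As an illustration of how the four inputs combine, here is a sketch of the standard normal-approximation formula for a two-sided, two-proportion test with equal allocation. The function name and the example baseline are assumptions, and the calculator's exact formula may differ.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed per variant (normal approximation, equal split)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)         # rate if the MDE is realised
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)   # sum of the two Bernoulli variances
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Hypothetical 5% baseline: compare a 10% relative MDE with a 5% one.
n_10 = sample_size_per_variant(0.05, 0.10)
n_05 = sample_size_per_variant(0.05, 0.05)
print(n_10, n_05, round(n_05 / n_10, 2))
```

The printed ratio comes out close to 4, which is the "halving the MDE roughly quadruples the sample" rule in action.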

P-values & confidence

A p-value is the probability of observing data at least as extreme as yours if the null hypothesis is true. p < 0.05 does not mean a 95% chance your variant is better; it means a result at least this extreme would occur less than 5% of the time if there were truly no difference. A 95% confidence interval comes from a procedure that captures the true effect in 95% of repeated experiments; it is not a 95% probability statement about this single test.

Power & error control

Statistical power (1 − β) is the probability of detecting a real effect when one exists. At 80% power you’ll miss a true effect 20% of the time (Type II / false negative). α controls Type I error (false positives). Higher power and lower α both require larger samples. Industry default: α = 0.05, power = 80%.
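
To see how power moves with sample size, the following sketch approximates the power of a two-sided, two-proportion z-test for a given number of visitors per variant. It uses the normal approximation and ignores the negligible chance of rejecting in the wrong direction; the function name and example numbers are assumptions.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(p_a, p_b, n_per_variant, alpha=0.05):
    """Approximate power to detect p_a versus p_b with n visitors per variant."""
    se = sqrt(p_a * (1 - p_a) / n_per_variant + p_b * (1 - p_b) / n_per_variant)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(p_b - p_a) / se            # true effect in standard-error units
    return NormalDist().cdf(z_effect - z_alpha)

# Hypothetical 5.0% vs 5.5% conversion rates at two sample sizes.
print(round(approx_power(0.05, 0.055, 31_231), 3))
print(round(approx_power(0.05, 0.055, 10_000), 3))
```

With the smaller sample the same true effect is missed far more often, which is the Type II error described above.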

Sample Ratio Mismatch

SRM occurs when the observed traffic split differs from the intended allocation by more than chance would explain, e.g. planning 50/50 but seeing 48/52 on a large sample. SRM invalidates test results even if p-values appear significant, because it signals that assignment or tracking is broken. Always check with a chi-square goodness-of-fit test. Common causes: bot traffic, browser redirects, sticky cookies, CDN caching, or experiment SDK misconfiguration.
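
A minimal sketch of the chi-square goodness-of-fit check for two variants follows. With two groups there is one degree of freedom, so the p-value can be computed from the complementary error function without extra libraries. The function name and visitor counts are illustrative, and the alert threshold you apply to the p-value (often a strict one such as 0.001) is a team-level choice, not something this sketch prescribes.

```python
from math import erfc, sqrt

def srm_p_value(visitors_a, visitors_b, expected_share_a=0.5):
    """Chi-square goodness-of-fit p-value for a two-variant traffic split."""
    total = visitors_a + visitors_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((visitors_a - expected_a) ** 2 / expected_a
            + (visitors_b - expected_b) ** 2 / expected_b)
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))

# The same 48/52 split at two hypothetical traffic levels.
print(srm_p_value(480, 520))      # small sample
print(srm_p_value(4_800, 5_200))  # ten times the traffic
```

The identical 48/52 ratio is unremarkable at 1,000 visitors but a clear mismatch at 10,000, which is why the check should be a formal test rather than an eyeball comparison.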

Conversion rate

CR = Conversions ÷ Visitors. Use for discrete actions: purchases, sign-ups, clicks. Relative lift = (CR_B − CR_A) / CR_A. Bayesian analysis treats conversions as draws from a Beta-Bernoulli model. Frequentist: use a two-proportion z-test for large samples. For fewer than 30 conversions per variant, consider Fisher’s exact test instead.
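
The frequentist path can be sketched as follows: a pooled two-proportion z-test for the p-value, plus an unpooled (Wald) confidence interval for the absolute difference. The function name, return keys, and example counts are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test for CR_B versus CR_A with a Wald confidence interval."""
    cr_a, cr_b = conv_a / n_a, conv_b / n_b
    diff = cr_b - cr_a
    # Pooled standard error: assumes equal rates under the null hypothesis.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error for the interval around the observed difference.
    se_unpooled = sqrt(cr_a * (1 - cr_a) / n_a + cr_b * (1 - cr_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    return {"relative_lift": diff / cr_a, "z": z, "p_value": p_value, "ci": ci}

# Hypothetical data: 500/10,000 conversions for A, 570/10,000 for B.
print(two_proportion_z_test(500, 10_000, 570, 10_000))
```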

RPV, APC & products

Revenue Per Visitor (RPV) = Total Revenue ÷ Visitors. Average Products per Conversion (APC) = Total Units ÷ Conversions. These continuous metrics use Welch’s t-test rather than a z-test and require a standard deviation estimate to size the test. Revenue data is typically right-skewed; consider a log-transform or non-parametric test if outliers are severe.
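
For continuous metrics, a Welch comparison from summary statistics might look like the sketch below. It computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom exactly, but the p-value uses a normal approximation because the Python standard library has no Student-t distribution; that is only reasonable when each variant has thousands of visitors. All names and numbers are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic, Welch-Satterthwaite df, and an approximate p-value."""
    var_a = sd_a ** 2 / n_a   # squared standard error of mean A
    var_b = sd_b ** 2 / n_b   # squared standard error of mean B
    t = (mean_b - mean_a) / sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    # Assumption: df is large, so the t distribution is close to normal.
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx

# Hypothetical RPV summary: mean, standard deviation, visitors per variant.
t, df, p = welch_t(2.50, 12.0, 10_000, 2.90, 13.5, 10_000)
print(round(t, 3), round(df), round(p, 4))
```

For small samples, replace the approximate p-value with a proper t-distribution lookup using the computed degrees of freedom.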

All calculations are performed client-side in your browser. No data is sent to any server.  |  Frequentist: two-proportion z-test & Welch’s t-test  |  Bayesian: Beta-Bernoulli Monte Carlo