Statistical Power (1 - β): Detecting Real Effects
Power measures your experiment's ability to detect a real effect when one truly exists. Low power means you risk missing genuine improvements.
Statistical power (1 - β, where β is the Type II error rate) is the probability that your experiment will correctly detect a real effect when one exists. A Type II error (false negative) occurs when a genuine improvement exists but your experiment fails to detect it, meaning you might abandon a winning idea.
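Power can also be estimated empirically: simulate many experiments in which a real lift exists, and count how often a significance test detects it. The sketch below does this for a two-proportion z-test; the function name, the 10% → 12% conversion rates, and the sample sizes are illustrative assumptions, not figures from this article.

```python
import random
from statistics import NormalDist

def simulate_power(p_control, p_treatment, n_per_group,
                   alpha=0.05, trials=400, seed=7):
    """Monte Carlo estimate of power (1 - beta): the fraction of
    simulated experiments where a two-proportion z-test rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 at alpha = 0.05
    detections = 0
    for _ in range(trials):
        conv_c = sum(rng.random() < p_control for _ in range(n_per_group))
        conv_t = sum(rng.random() < p_treatment for _ in range(n_per_group))
        p_pool = (conv_c + conv_t) / (2 * n_per_group)
        se = (2 * p_pool * (1 - p_pool) / n_per_group) ** 0.5
        z = abs(conv_t - conv_c) / n_per_group / se
        detections += z > z_crit
    return detections / trials

# a real 10% -> 12% lift, 4,000 users per group: detected roughly 80% of the time
print(simulate_power(0.10, 0.12, n_per_group=4000))
```

Each run where the test stays non-significant despite the real lift is exactly the Type II error described above: a winning idea the experiment would have missed.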
Power = 80% (industry standard)
If a real effect of your specified size exists, you have an 80% chance of detecting it
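The 80% convention feeds directly into sample-size planning. A minimal sketch under the usual normal approximation for a two-proportion test follows; the function name, the 10% baseline rate, and the 2-point lift are illustrative assumptions.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_base, mde_abs, alpha=0.05, power=0.80):
    """Approximate users per group needed to detect an absolute lift of
    `mde_abs` over baseline rate `p_base` at the given alpha and power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # 0.84 for power = 0.80
    p_mid = p_base + mde_abs / 2                   # variance taken at the midpoint rate
    return math.ceil((z_alpha + z_power) ** 2
                     * 2 * p_mid * (1 - p_mid) / mde_abs ** 2)

# baseline 10% conversion, detect a 2-point absolute lift at 80% power
print(sample_size_per_group(0.10, 0.02))
```

Note how the minimum detectable effect enters squared in the denominator: halving the effect you want to detect roughly quadruples the required sample.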
Common pitfall
Never run experiments with power below 80%. You'd have a greater than 1-in-5 chance of missing a real improvement. If your available traffic is too low, extend the experiment duration to accumulate a larger sample, or narrow the targeting to a more homogeneous segment so the metric's variance drops. Do not accept low power as a tradeoff.
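To see why extending the duration (and therefore the accumulated sample) rescues an underpowered test, power can be sketched in closed form for a two-proportion test under the normal approximation. The 10% → 12% rates and the three sample sizes are illustrative assumptions.

```python
from statistics import NormalDist

def approx_power(p_control, p_treatment, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided two-proportion z-test."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    se = ((p_control * (1 - p_control)
           + p_treatment * (1 - p_treatment)) / n_per_group) ** 0.5
    return nd.cdf(abs(p_treatment - p_control) / se - z_crit)

# the same real 10% -> 12% lift, at three experiment lengths:
for n in (1000, 2000, 4000):
    print(n, round(approx_power(0.10, 0.12, n), 2))  # power climbs with sample size
```

At 1,000 users per group this test is badly underpowered; doubling the run time twice brings it up to roughly the 80% standard.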
Best practice
For high-impact, strategic experiments (major product redesigns, pricing changes), target power = 90% to minimize the risk of missing a meaningful effect.
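One way to cost out that stricter target: under the same normal approximation, the required sample size scales with (z_alpha + z_power)², so the price of moving from 80% to 90% power can be computed directly. A sketch; the function name is illustrative.

```python
from statistics import NormalDist

def sample_size_multiplier(power_hi=0.90, power_lo=0.80, alpha=0.05):
    """How much more sample power_hi needs than power_lo for the same
    effect and alpha: n is proportional to (z_alpha + z_power)**2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    ratio = ((z_a + NormalDist().inv_cdf(power_hi))
             / (z_a + NormalDist().inv_cdf(power_lo)))
    return ratio ** 2

# raising power from 80% to 90% costs roughly a third more traffic
print(round(sample_size_multiplier(), 2))  # ~1.34
```

That extra third of traffic is the premium you pay to cut the miss rate from 20% to 10%, which is usually worthwhile for strategic bets.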
Beyond the theory
If you've got the theory down, see how it plays out in the simulator.