The Peeking Problem: Why Early Results Mislead
Repeatedly checking experiment results before the planned end date inflates your false positive rate from the intended 5% to roughly 25%. Here's why, and what to do instead.
If you check results daily for 14 days at α = 0.05, your actual error rate becomes:
~25% false positive rate
That's roughly 5× the 5% threshold you intended
This happens because each "peek" is an additional hypothesis test. With 14 daily checks, you're effectively running 14 tests on the same accumulating data, each one another chance for random noise to cross the significance threshold. The tests are correlated rather than independent (each day's data contains the previous days'), so the error rate grows more slowly than it would for 14 independent tests, but it still compounds with every look, dramatically increasing the likelihood that noise will appear statistically significant at some point during the experiment.
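You can verify the inflation yourself with a small Monte Carlo sketch. The setup below is illustrative (an A/A test with a unit-normal metric, 100 users per arm per day for 14 days, all numbers assumed rather than taken from a real experiment): since both arms are identical, every "significant" result is a false positive.

```python
import numpy as np

rng = np.random.default_rng(42)
n_sims, n_days, n_per_day = 20_000, 14, 100

# A/A experiment: the daily sum of n_per_day unit-normal observations
# is N(0, n_per_day), so we can simulate day-level sums directly.
a_cum = rng.normal(scale=np.sqrt(n_per_day), size=(n_sims, n_days)).cumsum(axis=1)
b_cum = rng.normal(scale=np.sqrt(n_per_day), size=(n_sims, n_days)).cumsum(axis=1)

# z-statistic for the difference in means after each daily peek.
n_cum = n_per_day * np.arange(1, n_days + 1)
z = ((a_cum - b_cum) / n_cum) / np.sqrt(2.0 / n_cum)  # known unit variance per arm

peeking_fpr = (np.abs(z) > 1.96).any(axis=1).mean()  # stop at the first "hit"
single_test_fpr = (np.abs(z[:, -1]) > 1.96).mean()   # analyze once, at the end

print(f"peek daily: {peeking_fpr:.1%}, test once: {single_test_fpr:.1%}")
```

The single final test lands at about 5%, as designed; stopping at the first significant daily peek lands in the low-to-mid 20s. The exact figure depends on how often and how early you look, which is why continuous monitoring can push it even higher.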
Valid approaches for early analysis
If your business requires the ability to make decisions before the full experiment duration, use one of these statistically sound methods:
Sequential Testing (Group Sequential Methods)
Uses alpha spending functions (such as O'Brien-Fleming or Pocock boundaries) to distribute your total α budget across pre-planned interim analyses. This controls the overall false positive rate while allowing you to stop early if the effect is very large.
Bayesian Experimentation
Instead of p-values, monitor the posterior probability that the treatment effect exceeds your MDE. Bayesian methods naturally handle multiple looks at the data, but require you to specify a prior distribution for the effect size before the experiment begins.
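For conversion-rate metrics, the conjugate Beta-Binomial model makes this posterior cheap to compute at any peek. The numbers below (conversions, traffic, MDE, and the uniform Beta(1, 1) prior) are all hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed data: conversions / visitors in each arm.
control_conv, control_n = 120, 2400   # 5.0% conversion
treat_conv, treat_n = 150, 2400       # 6.25% conversion
mde = 0.005                           # minimum absolute lift worth shipping

# With a Beta(1, 1) uniform prior, the posterior for a conversion rate
# is Beta(1 + conversions, 1 + non-conversions). Sample both posteriors.
draws = 100_000
post_c = rng.beta(1 + control_conv, 1 + control_n - control_conv, size=draws)
post_t = rng.beta(1 + treat_conv, 1 + treat_n - treat_conv, size=draws)

# Posterior probability that the lift exceeds the MDE -- the quantity
# you monitor instead of a p-value.
p_beats_mde = (post_t - post_c > mde).mean()
print(f"P(lift > {mde:.1%}) = {p_beats_mde:.1%}")
```

You would then ship (or stop) once this probability crosses a pre-agreed decision threshold, with the understanding that the threshold encodes your tolerance for shipping a null or negative change.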
Beyond the theory
If you've got the theory down, see how it plays out in the simulator.
See the simulator