The Experiment Metric Framework
Every experiment needs three layers of metrics: a primary success measure, secondary context metrics, and guardrails to prevent unintended harm.
Primary Metric: your single measure of success
This is the one metric your experiment is designed to move, and the ship-or-revert decision rests on it, provided no guardrail (see below) is breached. Designating exactly one primary metric keeps the decision clear of the multiple comparisons problem: the more metrics you test for significance, the higher the chance that at least one of them shows a false positive purely by chance. The significance test that drives the decision is therefore run on this metric alone.
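To see how quickly false positives accumulate, the sketch below computes the family-wise error rate, the probability of at least one false positive, as 1 - (1 - alpha)^k for k metrics. It assumes the tests are independent and that no metric has a true effect. Real metrics are often correlated, so treat the numbers as an illustration rather than an exact figure. The function name `family_wise_error_rate` is hypothetical.

```python
def family_wise_error_rate(alpha: float, num_metrics: int) -> float:
    """Chance of at least one false positive across num_metrics independent
    tests, each run at significance level alpha, when every null is true."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    if num_metrics < 1:
        raise ValueError("num_metrics must be at least 1")
    # Each test avoids a false positive with probability (1 - alpha);
    # all k avoid one with probability (1 - alpha) ** k.
    return 1.0 - (1.0 - alpha) ** num_metrics


for k in (1, 5, 10, 20):
    rate = family_wise_error_rate(0.05, k)
    print(f"{k:>2} metrics -> {rate:.1%} chance of at least one false positive")
```

At alpha = 0.05, one metric keeps the risk at 5.0%, while five metrics raise it to about 22.6% and twenty to about 64.2%.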
Secondary Metrics: explain the "why" (1–3)
These metrics help you understand the mechanism behind any movement in the primary metric. For example, if your primary metric is conversion rate, secondary metrics might include page views, time on page, or click-through rate. Pre-register them before launch to avoid cherry-picking: selecting favorable metrics after seeing the results is a form of p-hacking.
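One way to make pre-registration concrete is to write the plan down as an immutable record before launch, including the guardrails covered in the next section. The sketch below is a hypothetical structure, not a specific tool's format: `ExperimentPlan`, `Guardrail`, and the metric names `churn_rate` and `refund_rate` are illustrative assumptions. It enforces the counts used in this framework (one primary, 1 to 3 secondaries, at least 2 guardrails) and flags any metric that was not named up front.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Guardrail:
    """A 'do no harm' metric and the largest degradation tolerated."""
    metric: str
    max_degradation: float  # absolute change in the metric's units; 0.005 = 0.5 pp


@dataclass(frozen=True)
class ExperimentPlan:
    """Hypothetical pre-registration record; frozen so it cannot be edited later."""
    primary: str
    secondaries: tuple[str, ...]
    guardrails: tuple[Guardrail, ...]
    registered_at: datetime

    def __post_init__(self) -> None:
        if not 1 <= len(self.secondaries) <= 3:
            raise ValueError("pre-register between 1 and 3 secondary metrics")
        if len(self.guardrails) < 2:
            raise ValueError("define at least 2 guardrail metrics")
        if self.primary in self.secondaries:
            raise ValueError("the primary metric cannot also be a secondary")

    def is_pre_registered(self, metric: str) -> bool:
        """True only for metrics named in the plan; anything else is post hoc."""
        return (
            metric == self.primary
            or metric in self.secondaries
            or any(g.metric == metric for g in self.guardrails)
        )


# Example plan using the conversion-rate scenario; guardrail names are hypothetical.
plan = ExperimentPlan(
    primary="conversion_rate",
    secondaries=("page_views", "time_on_page", "click_through_rate"),
    guardrails=(Guardrail("churn_rate", 0.005), Guardrail("refund_rate", 0.002)),
    registered_at=datetime.now(timezone.utc),
)
```

Making the dataclass frozen means any attempt to reassign a field after registration raises an error, which mirrors the intent of committing to the metrics before the results are in.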
Guardrail Metrics: your safety net (at least 2)
These are "do no harm" metrics that must not degrade beyond acceptable thresholds. A guardrail breach can halt an experiment even if the primary metric improves, because a win on one metric is not worth real damage elsewhere. Set explicit thresholds up front (e.g., "churn must not increase by more than 0.5 percentage points").
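The decision rule above, where any guardrail breach overrides a primary-metric win, can be sketched as follows. This is a simplified illustration under stated assumptions: `GuardrailResult` and `ship_decision` are hypothetical names, the check compares point estimates only (a production check would more likely compare a confidence bound against the threshold), and it assumes higher values are worse for every guardrail, as with churn. A threshold of 0.5 percentage points is written as 0.005 on a rate scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuardrailResult:
    """Observed guardrail rates for each arm plus the pre-registered limit."""
    metric: str
    control: float
    treatment: float
    max_increase: float  # largest tolerated increase, in the metric's own units

    @property
    def breached(self) -> bool:
        # Assumes higher is worse (e.g. churn); flip the sign for metrics where lower is worse.
        return self.treatment - self.control > self.max_increase


def ship_decision(primary_significant_win: bool, guardrails: list[GuardrailResult]) -> str:
    """Any guardrail breach halts the experiment, even if the primary metric won."""
    breaches = [g.metric for g in guardrails if g.breached]
    if breaches:
        return "halt: guardrail breached (" + ", ".join(breaches) + ")"
    return "ship" if primary_significant_win else "revert"


# Hypothetical results: churn rises by 0.7 pp against a 0.5 pp limit.
results = [
    GuardrailResult("churn_rate", control=0.040, treatment=0.047, max_increase=0.005),
    GuardrailResult("refund_rate", control=0.010, treatment=0.011, max_increase=0.002),
]
print(ship_decision(primary_significant_win=True, guardrails=results))
# prints: halt: guardrail breached (churn_rate)
```

Returning the names of the breached guardrails, rather than a bare boolean, makes it clear which safety net stopped the launch.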
Beyond the theory
If you've got the theory down, see how it plays out in the simulator.