Core Concept

The Experiment Metric Framework

Every experiment needs three layers of metrics: a primary success measure, secondary context metrics, and guardrails to prevent unintended harm.

Primary Metric: your single measure of success

This is the one metric your experiment is designed to move. The ship-or-revert decision is based entirely on this metric. Having exactly one primary metric prevents the multiple comparisons problem, where testing many metrics simultaneously increases the chance of a false positive. Statistical significance is evaluated only on this metric.
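As a minimal sketch of what "the decision rests on one metric" looks like, here is a two-sided two-proportion z-test applied only to a primary conversion-rate metric. The sample counts are hypothetical, and the 0.05 significance level is an assumed convention, not something the framework mandates.

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in two conversion rates.

    conv_a/conv_b are conversion counts; n_a/n_b are sample sizes.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF (via erf)
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical results: control converts 500/10,000; treatment 560/10,000
z, p = two_proportion_z_test(500, 10_000, 560, 10_000)

# The ship-or-revert call is made on this single p-value, not on a
# sweep over many metrics -- that is what avoids multiple comparisons.
ship = p < 0.05
```

Note that the test is run once, on one metric: testing twenty metrics at the 0.05 level would be expected to produce roughly one false positive by chance alone.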

Secondary Metrics: explain the "why" (1–3)

These metrics help you understand the mechanism behind your primary metric's movement. For example, if your primary metric is conversion rate, secondary metrics might include page views, time on page, or click-through rate. Pre-register them before launch to avoid cherry-picking. Selecting favorable metrics after seeing results is a form of p-hacking.
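Pre-registration can be as simple as committing a metric plan to version control before launch. The sketch below uses hypothetical metric and experiment names; the validation function just enforces the shape this framework describes (one primary, one to three secondary, at least two guardrails).

```python
# Hypothetical pre-registered metric plan, written down before launch
# so secondary metrics cannot be cherry-picked after seeing results.
EXPERIMENT_PLAN = {
    "experiment": "checkout_redesign_v2",  # assumed experiment name
    "primary": "conversion_rate",
    "secondary": ["page_views", "time_on_page", "click_through_rate"],
    "guardrails": {
        # metric -> maximum tolerated absolute degradation
        "churn_rate": 0.005,       # no more than +0.5 percentage points
        "page_load_time_ms": 200,  # no more than +200 ms
    },
}

def validate_plan(plan):
    """Enforce the framework's shape before the experiment launches."""
    assert isinstance(plan["primary"], str), "exactly one primary metric"
    assert 1 <= len(plan["secondary"]) <= 3, "1-3 secondary metrics"
    assert len(plan["guardrails"]) >= 2, "at least two guardrail metrics"

validate_plan(EXPERIMENT_PLAN)
```

Keeping the plan in a single reviewable artifact makes any post-hoc metric swap visible as a diff.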

Guardrail Metrics: your safety net (at least 2)

These are "do no harm" metrics that must not degrade beyond acceptable thresholds. A guardrail breach can halt an experiment even if the primary metric improves, because a win on one metric isn't worth it if it causes damage elsewhere. Set explicit thresholds upfront (e.g., "churn must not increase by more than 0.5 percentage points").
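The threshold check itself is mechanical once the limits are explicit. Below is a sketch with hypothetical metric values; the churn threshold mirrors the 0.5-percentage-point example above, and a breach forces a halt regardless of how the primary metric moved.

```python
# Guardrail thresholds set upfront: max tolerated absolute degradation.
GUARDRAIL_THRESHOLDS = {
    "churn_rate": 0.005,   # +0.5 percentage points, as in the example above
    "crash_rate": 0.001,
}

def breached_guardrails(control, treatment, thresholds):
    """Return the guardrails whose degradation exceeds its threshold."""
    return [
        name
        for name, limit in thresholds.items()
        if treatment[name] - control[name] > limit
    ]

# Hypothetical observed rates in each arm
control = {"churn_rate": 0.020, "crash_rate": 0.002}
treatment = {"churn_rate": 0.027, "crash_rate": 0.002}  # churn up 0.7 pp

breaches = breached_guardrails(control, treatment, GUARDRAIL_THRESHOLDS)
halt = bool(breaches)  # halt even if the primary metric improved
```

Here churn degraded by 0.7 percentage points against a 0.5-point limit, so the experiment halts even though crash rate held steady.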

Best practice
Common guardrail metrics in practice: app crash rate, customer support ticket volume, user churn rate, and page load time. Choose guardrails that reflect the most likely ways your change could cause unintended negative effects.

Beyond the theory

If you've got the theory down, see how it plays out in the simulator.
