Declaring a winner by comparing raw open or click rates without computing a p-value is the statistical equivalent of calling a coin biased after three flips. Even with reasonable sample sizes, peeking at running totals and picking the leader inflates your false positive rate dramatically: a standard 95% confidence threshold applied to a peeked result may provide only 70% actual confidence. You ship the wrong variant and attribute revenue lift to a change that was noise. In ISO/IEC 25010:2011 terms this is a functional-suitability gap: the experiment infrastructure appears to work but cannot fulfill its core promise of reliable winner selection.
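The peeking effect is easy to demonstrate with a small Monte Carlo sketch. Everything in it is hypothetical and chosen for illustration: the seeded `mulberry32` PRNG, the 5% base conversion rate, the per-arm sample size, and the 10 interim checkpoints are arbitrary. Both arms share the same true rate, so every declared winner is a false positive; checking once at the end holds the false positive rate near the nominal 5%, while stopping at the first "significant" interim look inflates it several-fold.

```typescript
function mulberry32(seed: number): () => number {
  // Tiny deterministic PRNG so the simulation is reproducible
  let a = seed >>> 0
  return () => {
    a = (a + 0x6d2b79f5) | 0
    let t = Math.imul(a ^ (a >>> 15), 1 | a)
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Two-proportion z-test at a fixed two-tailed alpha = 0.05 (z critical ~1.96)
function zSignificant(c1: number, n1: number, c2: number, n2: number): boolean {
  const pPool = (c1 + c2) / (n1 + n2)
  const se = Math.sqrt(pPool * (1 - pPool) * (1 / n1 + 1 / n2))
  if (se === 0) return false
  return Math.abs(c2 / n2 - c1 / n1) / se > 1.96
}

const rand = mulberry32(42)
const sims = 1000        // number of simulated A/A experiments
const perArm = 5000      // visitors per arm
const checkpoints = 10   // interim "peeks" at the running totals
let fpFinalOnly = 0
let fpWithPeeking = 0

for (let s = 0; s < sims; s++) {
  let c1 = 0
  let c2 = 0
  let peeked = false
  for (let i = 1; i <= perArm; i++) {
    if (rand() < 0.05) c1++ // both arms convert at the same true 5% rate
    if (rand() < 0.05) c2++
    if (i % (perArm / checkpoints) === 0 && zSignificant(c1, i, c2, i)) {
      peeked = true // would have stopped here and declared a "winner"
    }
  }
  if (zSignificant(c1, perArm, c2, perArm)) fpFinalOnly++
  if (peeked) fpWithPeeking++
}

console.log(`final-only false positive rate:  ${(fpFinalOnly / sims).toFixed(3)}`)
console.log(`with peeking (10 interim looks): ${(fpWithPeeking / sims).toFixed(3)}`)
```

The last checkpoint coincides with the final test, so the peeking rate is always at least the final-only rate; the gap between the two is pure alpha inflation from repeated looks.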
High because selecting winners without significance testing produces false positives at rates far above the stated confidence level, systematically directing future campaigns toward variants that never actually outperformed.
Gate winner selection behind a two-proportion z-test, and throw if the result is not significant rather than returning a soft warning; the Remediation section below gives the full implementation.
ID: campaign-analytics-attribution.ab-testing.significance-check-before-winner
Severity: high
What to look for: Examine how experiment winners are declared. Look for code that computes a p-value or confidence interval before marking a variant as the winner. This could be: a significance test function (chi-squared, z-test for proportions), an integration with a stats library, a flag on the experiment record that requires significance to be checked before winner selection, or guardrails in the UI that gate the "pick winner" action. The anti-pattern is declaring a winner based on raw metric comparisons (variant A has 27% open rate vs variant B's 24%) without computing statistical significance.
Pass criteria: Winner selection is gated by a significance test. A p-value or confidence interval is computed. The winner is not selectable until the required sample size is reached and significance is confirmed (p < 0.05 or equivalent threshold). Count every code path that declares a winner and verify each one includes a significance check — report the ratio even on pass.
Fail criteria: Winners are selected based on raw metric comparisons with no significance calculation. No p-value or confidence interval computed anywhere in experiment evaluation. The experiment can be declared complete at any time regardless of sample size.
Skip (N/A) when: The project does not run A/B tests.
Detail on fail: Example: "selectWinner() sets winning variant by comparing raw open rate numbers — no statistical test performed" or "No significance calculation found in experiment evaluation logic — peeking risk not mitigated"
Remediation: Compute statistical significance before allowing winner selection:
interface SignificanceResult {
  pValue: number
  isSignificant: boolean
  confidenceLevel: number
  winner: 'control' | 'variant' | 'no-winner'
}

// Two-proportion z-test
function checkSignificance(
  controlConversions: number,
  controlTotal: number,
  variantConversions: number,
  variantTotal: number,
  alpha: number = 0.05
): SignificanceResult {
  const p1 = controlConversions / controlTotal
  const p2 = variantConversions / variantTotal
  const pPool = (controlConversions + variantConversions) / (controlTotal + variantTotal)
  const se = Math.sqrt(pPool * (1 - pPool) * (1 / controlTotal + 1 / variantTotal))
  if (se === 0) {
    // Degenerate case (e.g. zero conversions in both arms): nothing to conclude
    return { pValue: 1, isSignificant: false, confidenceLevel: 1 - alpha, winner: 'no-winner' }
  }
  const z = Math.abs(p2 - p1) / se
  // Approximate two-tailed p-value from the z-score; supply a normalCDF
  // implementation or use a stats library
  const pValue = 2 * (1 - normalCDF(z))
  const isSignificant = pValue < alpha
  return {
    pValue,
    isSignificant,
    confidenceLevel: 1 - alpha,
    winner: !isSignificant ? 'no-winner' : p2 > p1 ? 'variant' : 'control'
  }
}

// Gate winner selection
async function declareWinner(experimentId: string, selectedVariant: string) {
  const stats = await getExperimentStats(experimentId)
  const result = checkSignificance(
    stats.control.conversions, stats.control.total,
    stats.variant.conversions, stats.variant.total
  )
  if (!result.isSignificant) {
    throw new Error(`Cannot declare winner: p=${result.pValue.toFixed(4)}, not significant at alpha=0.05`)
  }
  await db.experiments.update({
    where: { id: experimentId },
    data: { winner: selectedVariant, concluded_at: new Date() }
  })
}
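The remediation code above calls `normalCDF`, which is left unimplemented. A minimal self-contained sketch follows, using the Abramowitz and Stegun 7.1.26 rational approximation for erf (absolute error below roughly 1.5e-7); a maintained stats library is an equally reasonable substitute if one is already in the dependency tree.

```typescript
function normalCDF(z: number): number {
  // Phi(z) via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
  const x = Math.abs(z) / Math.SQRT2
  // Abramowitz & Stegun 7.1.26 rational approximation for erf(x), x >= 0
  const t = 1 / (1 + 0.3275911 * x)
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t
  const erf = 1 - poly * Math.exp(-x * x)
  const phi = (1 + erf) / 2
  // Exploit symmetry for negative z: Phi(z) = 1 - Phi(-z)
  return z >= 0 ? phi : 1 - phi
}
```

The approximation is accurate well past the precision needed to compare a p-value against alpha = 0.05.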