Declaring a winner by comparing raw open or click rates without computing a p-value is the statistical equivalent of calling a coin biased after three flips. Even with reasonable sample sizes, peeking at running totals and picking the leader inflates your false positive rate dramatically: a standard 95% confidence threshold applied to a peeked result may provide only 70% actual confidence. You ship the wrong variant and attribute revenue lift to a change that was noise. In ISO/IEC 25010:2011 terms this is a functional-suitability gap: the experiment infrastructure appears to work but cannot fulfill its core promise of reliable winner selection.
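The peeking effect is easy to demonstrate with a small Monte Carlo sketch. Everything in it is hypothetical and chosen for illustration: the seeded `mulberry32` PRNG, the 5% base conversion rate, the per-arm sample size, and the 10 interim checkpoints are arbitrary. Both arms share the same true rate, so every declared winner is a false positive; checking once at the end holds the false positive rate near the nominal 5%, while stopping at the first "significant" interim look inflates it several-fold.

```typescript
function mulberry32(seed: number): () => number {
  // Tiny deterministic PRNG so the simulation is reproducible
  let a = seed >>> 0
  return () => {
    a = (a + 0x6d2b79f5) | 0
    let t = Math.imul(a ^ (a >>> 15), 1 | a)
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Two-proportion z-test at a fixed two-tailed alpha = 0.05 (z critical ~1.96)
function zSignificant(c1: number, n1: number, c2: number, n2: number): boolean {
  const pPool = (c1 + c2) / (n1 + n2)
  const se = Math.sqrt(pPool * (1 - pPool) * (1 / n1 + 1 / n2))
  if (se === 0) return false
  return Math.abs(c2 / n2 - c1 / n1) / se > 1.96
}

const rand = mulberry32(42)
const sims = 1000        // number of simulated A/A experiments
const perArm = 5000      // visitors per arm
const checkpoints = 10   // interim "peeks" at the running totals
let fpFinalOnly = 0
let fpWithPeeking = 0

for (let s = 0; s < sims; s++) {
  let c1 = 0
  let c2 = 0
  let peeked = false
  for (let i = 1; i <= perArm; i++) {
    if (rand() < 0.05) c1++ // both arms convert at the same true 5% rate
    if (rand() < 0.05) c2++
    if (i % (perArm / checkpoints) === 0 && zSignificant(c1, i, c2, i)) {
      peeked = true // would have stopped here and declared a "winner"
    }
  }
  if (zSignificant(c1, perArm, c2, perArm)) fpFinalOnly++
  if (peeked) fpWithPeeking++
}

console.log(`final-only false positive rate:  ${(fpFinalOnly / sims).toFixed(3)}`)
console.log(`with peeking (10 interim looks): ${(fpWithPeeking / sims).toFixed(3)}`)
```

The last checkpoint coincides with the final test, so the peeking rate is always at least the final-only rate; the gap between the two is pure alpha inflation from repeated looks.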
High because selecting winners without significance testing produces false positives at rates far above the stated confidence level, systematically directing future campaigns toward variants that never actually outperformed.
Gate winner selection behind a two-proportion z-test, and throw if the result is not significant rather than returning a soft warning; the Remediation section below gives the full implementation.
ID: campaign-analytics-attribution.ab-testing.significance-check-before-winner
Severity: high
What to look for: Examine how experiment winners are declared. Look for code that computes a p-value or confidence interval before marking a variant as the winner. This could be: a significance test function (chi-squared, z-test for proportions), an integration with a stats library, a flag on the experiment record that requires significance to be checked before winner selection, or guardrails in the UI that gate the "pick winner" action. The anti-pattern is declaring a winner based on raw metric comparisons (variant A has 27% open rate vs variant B's 24%) without computing statistical significance.
Pass criteria: Winner selection is gated by a significance test. A p-value or confidence interval is computed. The winner is not selectable until the required sample size is reached and significance is confirmed (p < 0.05 or equivalent threshold). Count every code path that declares a winner and verify each one includes a significance check — report the ratio even on pass.
Fail criteria: Winners are selected based on raw metric comparisons with no significance calculation. No p-value or confidence interval computed anywhere in experiment evaluation. The experiment can be declared complete at any time regardless of sample size.
Skip (N/A) when: The project does not run A/B tests.
Detail on fail: Example: "selectWinner() sets winning variant by comparing raw open rate numbers — no statistical test performed" or "No significance calculation found in experiment evaluation logic — peeking risk not mitigated"
Remediation: Compute statistical significance before allowing winner selection:
interface SignificanceResult {
  pValue: number
  isSignificant: boolean
  confidenceLevel: number
  winner: 'control' | 'variant' | 'no-winner'
}

// Two-proportion z-test
function checkSignificance(
  controlConversions: number,
  controlTotal: number,
  variantConversions: number,
  variantTotal: number,
  alpha: number = 0.05
): SignificanceResult {
  const p1 = controlConversions / controlTotal
  const p2 = variantConversions / variantTotal
  const pPool = (controlConversions + variantConversions) / (controlTotal + variantTotal)
  const se = Math.sqrt(pPool * (1 - pPool) * (1 / controlTotal + 1 / variantTotal))
  if (se === 0) {
    // Degenerate case (e.g. zero conversions in both arms): nothing to conclude
    return { pValue: 1, isSignificant: false, confidenceLevel: 1 - alpha, winner: 'no-winner' }
  }
  const z = Math.abs(p2 - p1) / se
  // Approximate two-tailed p-value from the z-score; supply a normalCDF
  // implementation or use a stats library
  const pValue = 2 * (1 - normalCDF(z))
  const isSignificant = pValue < alpha
  return {
    pValue,
    isSignificant,
    confidenceLevel: 1 - alpha,
    winner: !isSignificant ? 'no-winner' : p2 > p1 ? 'variant' : 'control'
  }
}

// Gate winner selection
async function declareWinner(experimentId: string, selectedVariant: string) {
  const stats = await getExperimentStats(experimentId)
  const result = checkSignificance(
    stats.control.conversions, stats.control.total,
    stats.variant.conversions, stats.variant.total
  )
  if (!result.isSignificant) {
    throw new Error(`Cannot declare winner: p=${result.pValue.toFixed(4)}, not significant at alpha=0.05`)
  }
  await db.experiments.update({
    where: { id: experimentId },
    data: { winner: selectedVariant, concluded_at: new Date() }
  })
}
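The remediation code above calls `normalCDF`, which is left unimplemented. A minimal self-contained sketch follows, using the Abramowitz and Stegun 7.1.26 rational approximation for erf (absolute error below roughly 1.5e-7); a maintained stats library is an equally reasonable substitute if one is already in the dependency tree.

```typescript
function normalCDF(z: number): number {
  // Phi(z) via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
  const x = Math.abs(z) / Math.SQRT2
  // Abramowitz & Stegun 7.1.26 rational approximation for erf(x), x >= 0
  const t = 1 / (1 + 0.3275911 * x)
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t
  const erf = 1 - poly * Math.exp(-x * x)
  const phi = (1 + erf) / 2
  // Exploit symmetry for negative z: Phi(z) = 1 - Phi(-z)
  return z >= 0 ? phi : 1 - phi
}
```

The approximation is accurate well past the precision needed to compare a p-value against alpha = 0.05.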