Sample size calculated before experiment launch

ab-000585 · campaign-analytics-attribution.ab-testing.sample-size-calculation

Severity: high · active

Why it matters

Launching an A/B test without calculating the required sample size means you have no principled stopping rule. Teams that peek at early results and declare a winner at 200 contacts when 800 were needed can inflate the false positive rate from the nominal 5% to 25% or higher, a phenomenon called peeking bias. You then ship subject lines, CTAs, or send times based on noise rather than signal. In ISO/IEC 25010:2011 terms this is a functional-suitability failure: the experiment tooling exists but cannot reliably distinguish a real effect from random variance. The Campaign Orchestration & Sequencing Audit covers how these flawed signals propagate into branching decisions.
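
The inflation from peeking can be checked with a small A/A simulation. The sketch below is illustrative only: both arms share the same true rate, so every "significant" result is a false positive. The helper names, the 10 looks of 50 contacts per arm, and the seeded generator are all assumptions made for the example, not part of any audited system.

```typescript
// Small linear congruential generator so the run is reproducible.
// Any seeded PRNG would do; this one is only for illustration.
function makeRng(seed: number): () => number {
  let state = seed % 4294967296
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296
    return state / 4294967296
  }
}

// Pooled two-proportion z statistic; returns 0 when the variance is zero.
function twoProportionZ(cA: number, nA: number, cB: number, nB: number): number {
  const pooled = (cA + cB) / (nA + nB)
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB))
  return se === 0 ? 0 : (cB / nB - cA / nA) / se
}

// A/A test: compare "stop at the first significant look" against
// "evaluate once at the planned sample size".
function simulateFalsePositiveRates(
  sims = 1000,
  looks = 10,
  batchPerArm = 50,
  trueRate = 0.25,
  seed = 42
): { peeking: number; fixedHorizon: number } {
  const rand = makeRng(seed)
  let peekHits = 0
  let finalHits = 0
  for (let s = 0; s < sims; s++) {
    let cA = 0
    let cB = 0
    let nPerArm = 0
    let crossedAtAnyLook = false
    let z = 0
    for (let k = 0; k < looks; k++) {
      for (let i = 0; i < batchPerArm; i++) {
        if (rand() < trueRate) cA++
        if (rand() < trueRate) cB++
      }
      nPerArm += batchPerArm
      z = twoProportionZ(cA, nPerArm, cB, nPerArm)
      if (Math.abs(z) > 1.96) crossedAtAnyLook = true
    }
    if (crossedAtAnyLook) peekHits++
    if (Math.abs(z) > 1.96) finalHits++ // single evaluation at the end
  }
  return { peeking: peekHits / sims, fixedHorizon: finalHits / sims }
}

const rates = simulateFalsePositiveRates()
console.log(rates)
```

Because the final look is one of the looks, the peeking rate can never be lower than the fixed-horizon rate; how much higher it is depends on how often results are checked.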

Severity rationale

High because experiments evaluated without a pre-calculated sample size routinely produce false positives, causing teams to deploy losing variants with statistical confidence that the system never actually earned.

Remediation

Implement a sample size calculator and require the field to be set before experiment launch. Block the launch action at the API layer until required_sample_size is populated:

function requiredSampleSize(
  baselineRate: number,            // e.g. 0.25 for 25% open rate
  minimumDetectableEffect: number, // e.g. 0.05 for +5pp absolute lift
  alpha = 0.05,
  power = 0.80
): number {
  // The z-scores below are hardcoded for the defaults, so reject other values
  // instead of silently returning a sample size for the wrong settings.
  if (alpha !== 0.05 || power !== 0.80) {
    throw new Error('requiredSampleSize only supports alpha=0.05 and power=0.80')
  }
  const p1 = baselineRate
  const p2 = baselineRate + minimumDetectableEffect
  const pBar = (p1 + p2) / 2
  const zAlpha = 1.96  // two-tailed, alpha=0.05
  const zBeta  = 0.842 // power=0.80
  const n = Math.pow(
    zAlpha * Math.sqrt(2 * pBar * (1 - pBar)) +
    zBeta  * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)),
    2
  ) / Math.pow(p2 - p1, 2)
  return Math.ceil(n) // sample size per variant
}

// Example: 25% baseline, detect +5pp lift → 1252 contacts per variant
const n = requiredSampleSize(0.25, 0.05)

Store the result on the experiment record and add a pre-launch validation that rejects experiments where required_sample_size is null or zero.
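
One possible shape for that pre-launch validation is sketched below. The ExperimentRecord interface, the checkLaunchable name, and the result type are hypothetical and exist only for illustration; only the required_sample_size field comes from the text above. A real API handler would call something like this before changing the experiment's state.

```typescript
// Hypothetical experiment record; only required_sample_size is taken
// from the remediation text, the other field is an assumption.
interface ExperimentRecord {
  id: string
  required_sample_size: number | null
}

type LaunchCheck = { ok: true } | { ok: false; reason: string }

// Rejects launch unless required_sample_size is a positive whole number.
function checkLaunchable(experiment: ExperimentRecord): LaunchCheck {
  const size = experiment.required_sample_size
  if (size === null || !isFinite(size) || size <= 0 || Math.floor(size) !== size) {
    return {
      ok: false,
      reason: 'Experiment ' + experiment.id +
        ': required_sample_size must be a positive integer before launch',
    }
  }
  return { ok: true }
}

// Usage: a launch endpoint would return an error response when ok is false.
const draft: ExperimentRecord = { id: 'exp-draft', required_sample_size: null }
console.log(checkLaunchable(draft))
```

Returning a result object rather than throwing keeps the rejection reason available to whatever layer formats the API error.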

Detection

ID: sample-size-calculation
Severity: high
What to look for: Look for evidence that required sample size is computed before experiments are launched. This could appear as: a utility function that calculates required sample size given a baseline rate, minimum detectable effect, significance level, and power; documentation or configuration fields on experiment records that include required_sample_size or minimum_detectable_effect; or a pre-launch checklist or validation that requires sample size to be set. The absence of any sample size field or pre-launch calculation logic is a failure.
Pass criteria: Experiment records or configuration include a pre-calculated required sample size, and a utility or formula for computing it is present that accepts at least 3 inputs (baseline rate, minimum detectable effect, and significance level). Alternatively, an A/B testing library that handles power analysis internally (e.g., GrowthBook, Statsig) is configured with explicit minimum detectable effect settings. Count all experiment configurations and verify that each includes a sample size field.
Fail criteria: No sample size calculation found anywhere. Experiments are launched and winners declared based on whichever variant is ahead at an arbitrary time. No minimum detectable effect or statistical power settings present.
Skip (N/A) when: The project does not run A/B tests.
Cross-reference: The Campaign Orchestration & Sequencing Audit evaluates whether sequence branching decisions are data-driven, which depends on statistically valid experiment results.
Detail on fail: Example: "No sample size calculation logic found — experiments appear to be evaluated at arbitrary time points" or "Experiment configuration has no required_sample_size or minimum_detectable_effect fields"
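
The counting step in the pass criteria could be approximated as follows. This is a narrow sketch: the ExperimentConfig shape and the auditSampleSizeFields name are hypothetical, and it only checks for a populated required_sample_size field. It does not cover the alternative pass path of a library that handles power analysis internally.

```typescript
// Hypothetical config shape; field names mirror those named in the criteria.
interface ExperimentConfig {
  key: string
  required_sample_size?: number | null
  minimum_detectable_effect?: number | null
}

interface AuditResult {
  status: 'pass' | 'fail' | 'skip'
  total: number
  missing: string[] // keys of experiments without a usable sample size
}

function auditSampleSizeFields(configs: ExperimentConfig[]): AuditResult {
  // No experiments at all maps to the skip condition.
  if (configs.length === 0) {
    return { status: 'skip', total: 0, missing: [] }
  }
  const missing = configs
    .filter(c => !(typeof c.required_sample_size === 'number' && c.required_sample_size > 0))
    .map(c => c.key)
  return {
    status: missing.length === 0 ? 'pass' : 'fail',
    total: configs.length,
    missing,
  }
}

const report = auditSampleSizeFields([
  { key: 'subject-line-test', required_sample_size: 1252, minimum_detectable_effect: 0.05 },
  { key: 'send-time-test' },
])
console.log(report.status, report.missing)
```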

Remediation: Calculate required sample size before launching any experiment:

// Simple two-proportion z-test sample size calculator
function requiredSampleSize(
  baselineRate: number,            // e.g., 0.25 for 25% open rate
  minimumDetectableEffect: number, // e.g., 0.05 for +5pp absolute lift
  alpha: number = 0.05,            // significance level (Type I error)
  power: number = 0.80             // statistical power (1 - Type II error)
): number {
  // The z-scores below are hardcoded for the defaults, so reject other values
  // instead of silently returning a sample size for the wrong settings.
  if (alpha !== 0.05 || power !== 0.80) {
    throw new Error('requiredSampleSize only supports alpha=0.05 and power=0.80')
  }

  const p1 = baselineRate
  const p2 = baselineRate + minimumDetectableEffect
  const pBar = (p1 + p2) / 2

  // z-scores for alpha/2 and beta
  const zAlpha = 1.96  // for alpha=0.05, two-tailed
  const zBeta = 0.842  // for power=0.80

  const numerator = Math.pow(
    zAlpha * Math.sqrt(2 * pBar * (1 - pBar)) +
    zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)),
    2
  )
  const denominator = Math.pow(p2 - p1, 2)

  return Math.ceil(numerator / denominator)
}

// Example: detect +5pp lift on 25% open rate baseline
const n = requiredSampleSize(0.25, 0.05) // 1252 contacts per variant

Store this on the experiment record and block launch until it's set.
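
The calculator above uses fixed z-scores for alpha = 0.05 and power = 0.80. If other settings are needed, one option is to derive the z-scores from alpha and power. The sketch below does this with the Abramowitz and Stegun rational approximation (formula 26.2.23, absolute error below roughly 4.5e-4). The function names upperTailZ and sampleSizePerVariant are hypothetical, and a statistics library with an exact inverse normal CDF would be a reasonable substitute.

```typescript
// Upper-tail standard normal quantile: returns z such that P(Z > z) = p.
// Abramowitz & Stegun 26.2.23 rational approximation.
function upperTailZ(p: number): number {
  if (!(p > 0 && p < 1)) throw new RangeError('p must be in (0, 1)')
  if (p > 0.5) return -upperTailZ(1 - p) // use symmetry for the lower half
  const t = Math.sqrt(-2 * Math.log(p))
  const num = 2.515517 + 0.802853 * t + 0.010328 * t * t
  const den = 1 + 1.432788 * t + 0.189269 * t * t + 0.001308 * t * t * t
  return t - num / den
}

// Same two-proportion formula as above, with z-scores computed
// from alpha (two-tailed) and power instead of being hardcoded.
function sampleSizePerVariant(
  baselineRate: number,
  minimumDetectableEffect: number, // absolute lift, e.g. 0.05 for +5pp
  alpha = 0.05,
  power = 0.80
): number {
  const p1 = baselineRate
  const p2 = baselineRate + minimumDetectableEffect
  if (!(p1 > 0 && p1 < 1 && p2 > 0 && p2 < 1) || p1 === p2) {
    throw new RangeError('rates must lie in (0, 1) and the effect must be non-zero')
  }
  const pBar = (p1 + p2) / 2
  const zAlpha = upperTailZ(alpha / 2)
  const zBeta = upperTailZ(1 - power)
  const numerator = Math.pow(
    zAlpha * Math.sqrt(2 * pBar * (1 - pBar)) +
    zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)),
    2
  )
  return Math.ceil(numerator / Math.pow(p2 - p1, 2))
}

// Usage: a stricter test (alpha = 0.01, power = 0.90) needs more contacts.
const stricter = sampleSizePerVariant(0.25, 0.05, 0.01, 0.90)
console.log(stricter)
```

Because the approximated z-scores differ slightly from the rounded constants 1.96 and 0.842, the result at the default settings can differ by a contact or two from the hardcoded version.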

External references

iso-25010:2011 · functional-suitability — Functional suitability — functional correctness

Taxons

data-integrity

History

2026-04-18·v1.0.0·Initial import from campaign-analytics-attribution·automated