Sample size calculated before experiment launch
Why it matters
Launching an A/B test without calculating required sample size means you have no principled stopping rule. Teams peeking at early results and declaring a winner at 200 contacts when 800 were needed inflate false positive rates from 5% to 25% or higher — a phenomenon called peeking bias. You ship subject lines, CTAs, or send times based on noise rather than signal. In iso-25010:2011 terms this is a functional-suitability failure: the experiment tooling exists but cannot reliably distinguish a real effect from random variance. The Campaign Orchestration & Sequencing Audit covers how these flawed signals propagate into branching decisions.
Severity rationale
High because experiments evaluated without a pre-calculated sample size routinely produce false positives, causing teams to deploy losing variants with statistical confidence that the system never actually earned.
Remediation
Implement a sample size calculator and require the field to be set before experiment launch. Block the launch action at the API layer until required_sample_size is populated:
function requiredSampleSize(
baselineRate: number, // e.g. 0.25 for 25% open rate
minimumDetectableEffect: number, // e.g. 0.05 for +5pp lift
alpha = 0.05,
power = 0.80
): number {
const p1 = baselineRate
const p2 = baselineRate + minimumDetectableEffect
const pBar = (p1 + p2) / 2
const zAlpha = 1.96 // two-tailed, alpha=0.05
const zBeta = 0.842 // power=0.80
const n = Math.pow(
zAlpha * Math.sqrt(2 * pBar * (1 - pBar)) +
zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)),
2
) / Math.pow(p2 - p1, 2)
return Math.ceil(n)
}
// Example: 25% baseline, detect +5pp lift → 768 contacts per variant
const n = requiredSampleSize(0.25, 0.05)
Store the result on the experiment record and add a pre-launch validation that rejects experiments where required_sample_size is null or zero.
Detection
-
ID:
sample-size-calculation -
Severity:
high -
What to look for: Look for evidence that required sample size is computed before experiments are launched. This could appear as: a utility function that calculates required sample size given a baseline rate, minimum detectable effect, significance level, and power; documentation or configuration fields on experiment records that include
required_sample_sizeorminimum_detectable_effect; or a pre-launch checklist or validation that requires sample size to be set. The absence of any sample size field or pre-launch calculation logic is a failure. -
Pass criteria: Experiment records or configuration include a pre-calculated required sample size. A utility or formula for computing it is present that accepts at least 3 inputs (baseline rate, minimum detectable effect, and significance level). Or an A/B testing library that handles power analysis internally (e.g., GrowthBook, Statsig) is configured with explicit minimum detectable effect settings. Count all experiment configurations and verify each includes a sample size field.
-
Fail criteria: No sample size calculation found anywhere. Experiments are launched and winners declared based on whichever variant is ahead at an arbitrary time. No minimum detectable effect or statistical power settings present.
-
Skip (N/A) when: The project does not run A/B tests.
-
Cross-reference: The Campaign Orchestration & Sequencing Audit evaluates whether sequence branching decisions are data-driven, which depends on statistically valid experiment results.
-
Detail on fail: Example:
"No sample size calculation logic found — experiments appear to be evaluated at arbitrary time points"or"Experiment configuration has no required_sample_size or minimum_detectable_effect fields" -
Remediation: Calculate required sample size before launching any experiment:
// Simple two-proportion z-test sample size calculator function requiredSampleSize( baselineRate: number, // e.g., 0.25 for 25% open rate minimumDetectableEffect: number, // e.g., 0.05 for +5pp lift alpha: number = 0.05, // significance level (Type I error) power: number = 0.80 // statistical power (1 - Type II error) ): number { const p1 = baselineRate const p2 = baselineRate + minimumDetectableEffect const pBar = (p1 + p2) / 2 // z-scores for alpha/2 and beta const zAlpha = 1.96 // for alpha=0.05, two-tailed const zBeta = 0.842 // for power=0.80 const numerator = Math.pow(zAlpha * Math.sqrt(2 * pBar * (1 - pBar)) + zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)), 2) const denominator = Math.pow(p2 - p1, 2) return Math.ceil(numerator / denominator) } // Example: detect +5pp lift on 25% open rate baseline const n = requiredSampleSize(0.25, 0.05) // ~768 contacts per variantStore this on the experiment record and block launch until it's set.
External references
- iso-25010:2011 · functional-suitability — Functional suitability — functional correctness
Taxons
History
- 2026-04-18·v1.0.0·Initial import from campaign-analytics-attribution·automated