Every word in a developer-authored prompt template is processed on every request. Verbose preambles, redundant restatements of the same constraint, and politeness markers like "Please would you kindly" add tokens without improving output quality. At 10,000 requests per day, 100 unnecessary tokens per request is one million filler tokens daily — roughly $0.15/day on gpt-4o-mini and $2.50/day on gpt-4o at current input pricing, purely for filler text. OWASP LLM10 (Unbounded Consumption) treats this kind of unchecked per-request resource usage as a risk. Compact prompts also leave more room for actual conversation content before hitting truncation.
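The arithmetic behind those figures can be sketched as a small helper. The per-million-token prices below are assumptions and should be checked against the provider's current pricing page:

```javascript
// Estimated daily cost of filler tokens in a prompt template.
// Prices are USD per 1M input tokens — assumed values, not authoritative.
const PRICE_PER_MILLION = {
  "gpt-4o-mini": 0.15,
  "gpt-4o": 2.5,
};

function dailyFillerCost(fillerTokens, requestsPerDay, model) {
  const tokensPerDay = fillerTokens * requestsPerDay;
  return (tokensPerDay / 1_000_000) * PRICE_PER_MILLION[model];
}

// 100 wasted tokens at 10,000 requests/day = 1M filler tokens/day
console.log(dailyFillerCost(100, 10_000, "gpt-4o"));      // 2.5
console.log(dailyFillerCost(100, 10_000, "gpt-4o-mini")); // 0.15
```

The same function makes it easy to project the cost of any template change before shipping it.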
Medium because verbose prompt templates impose a fixed per-request token tax that scales linearly with traffic volume.
Audit each prompt template in src/lib/prompts/ and in inline route handlers. Remove conversational filler, politeness markers, and redundant restatements of the same constraint. Confirm output quality is equivalent before committing.
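A first pass of that audit can be automated with a simple phrase scanner. The pattern list below is illustrative, not exhaustive — extend it with filler seen in your own templates:

```javascript
// Sketch of a filler-phrase audit for prompt template strings.
// Patterns are assumptions about common filler, not a definitive list.
const FILLER_PATTERNS = [
  /please would you kindly/i,
  /i would like you to/i,
  /extremely helpful/i,
  /always provides? accurate/i,
  /clear and concise manner/i,
];

function findFiller(template) {
  // Return the source of each pattern that matches the template.
  return FILLER_PATTERNS.filter((re) => re.test(template)).map((re) => re.source);
}

const verbose = `You are an extremely helpful AI assistant.
Please would you kindly summarize the following article.`;

console.log(findFiller(verbose));                                    // two matches
console.log(findFiller("Summarize the key points of this article:")); // []
```

Matches are leads for manual review, not automatic deletions — context decides whether a phrase is filler.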
// Before — verbose with filler
const prompt = `You are an extremely helpful AI assistant who always provides accurate answers.
Please would you kindly summarize the following article clearly and concisely, capturing
the main points for the reader. Here is the article: ${article}`;
// After — direct and equivalent in output quality
const prompt = `Summarize the key points of this article:\n\n${article}`;
Measure the difference with countMessageTokens. Test both versions on 5 representative inputs to confirm output quality is equivalent before shipping the leaner version.
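If countMessageTokens (the project's own helper) is not at hand, the before/after delta can be estimated with a crude ~4-characters-per-token approximation — good enough for comparing two templates, not for billing:

```javascript
// Rough token-delta estimate between two prompt templates.
// The 4-chars-per-token ratio is an assumption; use a real tokenizer
// (e.g. the codebase's countMessageTokens) for accurate counts.
function approxTokens(text) {
  return Math.ceil(text.length / 4);
}

const before = `You are an extremely helpful AI assistant who always provides accurate answers.
Please would you kindly summarize the following article clearly and concisely, capturing
the main points for the reader. Here is the article: `;
const after = `Summarize the key points of this article:\n\n`;

const saved = approxTokens(before) - approxTokens(after);
console.log(`~${saved} tokens saved per request`);
```

Multiply the saving by daily request volume to see whether the rewrite is worth the regression-testing effort.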
ID: ai-token-optimization.token-efficiency.prompt-template-efficiency
Severity: medium
What to look for: Read the prompt template strings in src/lib/prompts/, prompts.ts, or inline in route handlers. Look for verbose patterns: lengthy preambles like "You are an expert AI assistant who is very helpful and knowledgeable...", politeness markers like "Please would you kindly..." or "I would like you to...", excessive few-shot examples that could be trimmed, and instruction blocks that restate the same constraint multiple times. Before evaluating, extract and quote a representative prompt template and assess whether it minimizes redundant instructions. Count and enumerate every instance found.
Pass criteria: Prompt templates are direct and minimal. Instructions state the task clearly without conversational filler. When few-shot examples are used, only the minimum necessary count is included (typically 1-3 examples, not 10+). At least 1 implementation must be confirmed.
Fail criteria: Prompt templates contain significant conversational filler, redundant constraints stated multiple ways, or unnecessary politeness markers that add tokens without improving output quality.
Skip (N/A) when: All prompts are dynamically generated from user input and no developer-authored prompt templates exist.
Signal: No constant strings assigned to system prompts or prompt templates in src/lib/ or route handlers.
Cross-reference: The system-prompt-token-count check measures the raw token cost that efficient templates help minimize.
Detail on fail: "Prompt templates contain boilerplate filler — tokens wasted on every request"
Remediation: Every word in a system or user prompt template is processed on every request. At 10,000 requests per day, even 50 extra tokens per request means 500,000 wasted tokens daily — a meaningful recurring cost.
// Before — verbose with filler
const prompt = `You are an extremely helpful and knowledgeable AI assistant who always
provides accurate, well-researched answers. Please would you kindly summarize the
following article in a clear and concise manner, making sure to capture the main
points and key takeaways for the reader. Here is the article: ${article}`;
// After — direct and equivalent
const prompt = `Summarize the key points of this article:\n\n${article}`;
Test both versions on several inputs to confirm output quality is equivalent before committing to the shorter version. Measure the token difference with countMessageTokens.
For deeper guidance on response quality and prompt design, see the AI Response Quality Audit.