Without logging suspicious prompt inputs, injection attempts are invisible—you have no way to detect patterns, identify targeted attacks, or tune your defenses based on real traffic. CWE-778 (Insufficient Logging) and NIST AI RMF MEASURE 2.6 both require that AI systems generate the evidence needed to assess their own security posture. NIST SP 800-53 AU-12 mandates audit logging for security-relevant events. Coordinated injection campaigns that probe hundreds of variations per day will go completely undetected without event logging, letting attackers iterate at low risk. Conversely, logging the full content of every message creates a GDPR and CCPA data minimization problem—the goal is targeted logging of anomalous events, not bulk prompt storage.
Medium severity: without logging of suspicious inputs, adversaries can probe injection vectors indefinitely without generating any signal, leaving coordinated attacks undetectable until after a successful breach.
Log security events (not message content) when suspicious patterns are detected. The log entry needs enough context for investigation without storing the actual prompt text.
// src/lib/ai-security-logger.ts
export async function logSuspiciousInput(params: {
  sessionId: string // not the user's name or email
  detectionType: 'jailbreak_pattern' | 'moderation_flagged' | 'length_exceeded'
  patternMatched?: string // which regex fired, not the input itself
  messageLength: number
  endpoint: string
}) {
  // securityLogger: your structured logger, routed to a dedicated security sink
  await securityLogger.warn({
    event: 'suspicious_prompt_input',
    ...params,
    timestamp: new Date().toISOString()
  })
}
// Usage in your injection detection layer:
if (hasInjectionPattern(userMessage)) {
  await logSuspiciousInput({
    sessionId: session.id,
    detectionType: 'jailbreak_pattern',
    patternMatched: matchedPattern.source,
    messageLength: userMessage.length,
    endpoint: '/api/chat'
  })
  return Response.json({ error: 'Message not allowed' }, { status: 400 })
}
Route security events to a separate log sink from application logs so they can be monitored independently and retained on a different schedule.
ID: ai-prompt-injection.architecture-defense.suspicious-input-logging
Severity: medium
What to look for: Count all LLM input processing paths. For each, look for logging of inputs that trigger jailbreak pattern detection, fail content moderation, or are otherwise flagged as suspicious. Check that suspicious events are logged with enough context for investigation (timestamp, user/session ID, flagged reason) but WITHOUT logging the full prompt content of non-suspicious requests or any PII. Look for a structured logging approach that separates security event logs from general application logs.
Pass criteria: Inputs that fail jailbreak detection or content moderation are logged as security events with: timestamp, session/user identifier (not PII), detection type, and a flag indicating the type of suspicious content — but not the full text of every message. Normal (non-suspicious) prompts are not fully logged — at least one logging mechanism captures injection attempts without logging full prompt contents. Report: "X LLM input paths found, Y log suspicious patterns."
Fail criteria: No logging of suspicious inputs exists, making it impossible to detect patterns of attempted injection. OR all prompts are logged in full (privacy risk), OR suspicious inputs are silently dropped with no record.
Skip (N/A) when: No AI provider integration detected.
Detail on fail: "Jailbreak detection in lib/moderation.ts returns false but logs nothing — injection attempts are silently dropped with no monitoring" or "All user prompts are logged in full to the database — both a privacy risk and an unnecessary storage cost"
Remediation: Logging suspicious patterns (without logging all content) creates an audit trail that lets you detect coordinated attacks and tune your defenses:
if (hasInjectionPattern(userMessage)) {
  // Log the event, not the content
  await logger.security({
    event: 'injection_pattern_detected',
    sessionId: session.id, // session identifier, not user name/email
    patternType: detectedPattern, // which pattern triggered
    messageLength: userMessage.length,
    timestamp: new Date().toISOString()
  })
  return Response.json({ error: 'Message not allowed' }, { status: 400 })
}
For a deeper analysis of logging practices including data retention and privacy, the AI Data Privacy Audit covers AI-specific logging requirements.