Safety assessment against prompt injection attacks, identifying vulnerabilities where untrusted user input might cause the AI to ignore instructions or exfiltrate data.