Version: v3.4.0

Prompt Injection

SDK Usage

Learn how to evaluate this metric programmatically in the Trustwise SDK Documentation.

The Trustwise Prompt Injection metric detects text that tries to override or bypass an AI system's built-in rules or safety measures, such as attempts to manipulate the AI into ignoring its guidelines or performing actions outside its intended use. Higher scores indicate a stronger attempt to manipulate the AI's behavior.
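
Because higher scores mean a stronger manipulation attempt, one common way to act on the metric is to compare the score against a cutoff. The following is a minimal sketch of that idea, not the Trustwise SDK API: `decide`, `InjectionDecision`, and the default threshold of `50.0` are hypothetical names and values chosen for illustration, and the score is assumed to be a non-negative number that has already been obtained from the metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InjectionDecision:
    """Hypothetical container for a score and the resulting decision."""
    score: float
    flagged: bool


def decide(score: float, threshold: float = 50.0) -> InjectionDecision:
    """Flag the input when its score meets or exceeds the threshold.

    Both the threshold value and the non-negative score range are
    assumptions for this sketch; tune them to your own deployment.
    """
    if score < 0:
        raise ValueError("score is assumed to be non-negative")
    return InjectionDecision(score=score, flagged=score >= threshold)


if __name__ == "__main__":
    # Made-up scores, used only to show the thresholding behavior.
    for example_score in (5.0, 50.0, 92.5):
        decision = decide(example_score)
        print(f"score={decision.score} flagged={decision.flagged}")
```

A lower threshold catches more borderline inputs at the cost of more false positives, so the cutoff is best chosen by checking scores on inputs that are representative of your own application.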