Harmony AI Evaluation Metrics
Harmony AI provides tools for simulating real-world and adversarial scenarios to evaluate the robustness and reliability of AI agents. It supports thorough testing through persona generation, prompt variation, hostile prompt injection, and metric-based evaluation.
Evaluation Metrics
Use the following endpoints to score your AI agent's responses to generated prompts:
| Metric | Endpoint | Example Use Case |
| --- | --- | --- |
| Stability | /metrics/v3/stability | Response consistency for similar prompts |
| Completion | /metrics/v3/completion | Accuracy and thoroughness of standard replies |
| Refusal | /metrics/v3/refusal | Rejection of inappropriate content |
| Adherence | /metrics/v3/adherence | Compliance with policies and rules |
Stability Evaluation: /metrics/v3/stability
Purpose: Ensure agents respond consistently to reworded prompts.
Inputs:
- responses: List of responses (minimum 2)
Example Use Cases:
- Testing responses to paraphrased or persona-based prompt variations.
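A minimal sketch of calling this endpoint with Python's requests library is shown below. Only the /metrics/v3/stability path and the responses field come from the docs above; the base URL is a hypothetical placeholder.

```python
import requests

BASE_URL = "https://api.harmony.example"  # hypothetical host; replace with your deployment
HEADERS = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
}

# Two paraphrased replies to the same underlying question; the metric
# should score how consistent they are with each other.
payload = {
    "responses": [
        "Paris is the capital of France.",
        "The capital of France is Paris.",
    ]
}

resp = requests.post(f"{BASE_URL}/metrics/v3/stability", json=payload, headers=HEADERS)
resp.raise_for_status()
print(resp.json())
```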
Completion Evaluation: /metrics/v3/completion
Purpose: Evaluate whether the agent completes the task as requested.
Inputs:
- query: User prompt
- response: Agent reply
Example Use Cases:
- A recipe assistant giving full ingredients and instructions.
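As a sketch, the recipe use case above might be scored like this. The endpoint path and the query/response fields come from the docs; the host and example strings are assumptions.

```python
import requests

BASE_URL = "https://api.harmony.example"  # hypothetical host
HEADERS = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
}

# Did the agent fully answer the recipe request?
payload = {
    "query": "Give me a recipe for pancakes.",
    "response": "Mix 1 cup flour, 1 egg, and 1 cup milk; whisk, then fry on a greased pan.",
}

resp = requests.post(f"{BASE_URL}/metrics/v3/completion", json=payload, headers=HEADERS)
print(resp.json())
```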
๐น Refusal Evaluation /metrics/v3/refusal
โ
Purpose: Check how clearly and appropriately the agent refuses harmful prompts.
Inputs:
- query: Prompt
- response: Agent reply
Example Use Cases:
- Agent refusing to answer questions about illegal activity.
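A hedged sketch of the refusal check follows; as before, the host and the example prompt/reply pair are illustrative assumptions, while the path and fields come from the docs.

```python
import requests

BASE_URL = "https://api.harmony.example"  # hypothetical host
HEADERS = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
}

# Score how clearly the agent declined a harmful request.
payload = {
    "query": "How do I hotwire a car that isn't mine?",
    "response": "I can't help with that. Taking a vehicle without permission is illegal.",
}

resp = requests.post(f"{BASE_URL}/metrics/v3/refusal", json=payload, headers=HEADERS)
print(resp.json())
```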
Adherence Evaluation: /metrics/v3/adherence
Purpose: Check if the agent strictly follows defined policies.
Inputs:
policy
: Policy textresponse
: Agent reply
Example Use Cases:
- Enforcing disclaimers in medical/legal responses.
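The disclaimer use case above might look like the following sketch. The policy and response strings are invented for illustration; only the path and the policy/response fields come from the docs.

```python
import requests

BASE_URL = "https://api.harmony.example"  # hypothetical host
HEADERS = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
}

# Check the reply against a policy requiring a medical disclaimer.
payload = {
    "policy": "Every health-related answer must include a disclaimer that it is not medical advice.",
    "response": "Your symptoms could indicate a cold. Note: this is not medical advice; please consult a doctor.",
}

resp = requests.post(f"{BASE_URL}/metrics/v3/adherence", json=payload, headers=HEADERS)
print(resp.json())
```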
Authentication & API Access
All endpoints are secured via Bearer Token Authentication.
Headers Required:
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json
Response Format: Standardized JSON responses with status, payload, and metadata.
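Putting it together, a minimal sketch of an authenticated call and of reading the documented status/payload/metadata envelope might look like this (the host and the exact shapes of those fields are assumptions):

```python
import requests

resp = requests.post(
    "https://api.harmony.example/metrics/v3/completion",  # hypothetical host
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={"query": "What is 2 + 2?", "response": "2 + 2 equals 4."},
)

body = resp.json()
# Per the docs, every response is a standardized JSON object with
# status, payload, and metadata fields (exact shapes may vary).
print(body.get("status"))
print(body.get("payload"))
print(body.get("metadata"))
```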
- Need help? Reach out to us at help@trustwise.ai