Metrics

Note

All metrics are available in both synchronous and asynchronous forms. For async, use TrustwiseSDKAsync and await the evaluate() methods.

The SDK provides access to all metrics through the unified metrics namespace. Each metric provides an evaluate() function. Example usage:

For more details on the metrics, please refer to the Trustwise Metrics Documentation.

result = trustwise.metrics.faithfulness.evaluate(query="...", response="...", context=[...])
clarity = trustwise.metrics.clarity.evaluate(query="...", response="...")
cost = trustwise.metrics.cost.evaluate(model_name="...", model_type="LLM", ...)

Async example:

import asyncio
from trustwise.sdk import TrustwiseSDKAsync
from trustwise.sdk.config import TrustwiseConfig

async def main():
    config = TrustwiseConfig()
    trustwise = TrustwiseSDKAsync(config)
    await trustwise.metrics.faithfulness.evaluate(
        query="What is the capital of France?",
        response="The capital of France is Paris.",
        context=[{"node_id": "1", "node_score": 1.0, "node_text": "Paris is the capital of France."}]

asyncio.run(main())

Refer to the API Reference for details on each metric’s parameters.

Note

Custom types such as Context are defined in trustwise.sdk.types.

Adherence

metrics.adherence.evaluate(policy: str, response: str) → AdherenceResponse

Evaluate how well the response adheres to a given policy or instruction.

Request:

AdherenceRequest

Returns:

AdherenceResponse

Example response:

{
    "score": 95
}

score: An integer, 0-100, measuring how well the response follows the policy (100 is perfect adherence)

Answer Relevancy

metrics.answer_relevancy.evaluate(query: str, response: str) → AnswerRelevancyResponse

Evaluate the relevancy of a response to the query.

Request:

AnswerRelevancyRequest

Returns:

AnswerRelevancyResponse

Example response:

{
    "score": 92.0,
    "generated_question": "What is the capital city of France?"
}

Carbon

metrics.carbon.evaluate(processor_name: str, provider_name: str, provider_region: str, instance_type: str, average_latency: int) → CarbonResponse

Evaluates the carbon emissions based on hardware specifications and infrastructure details.

Request:

CarbonRequest

Returns:

CarbonResponse

Example response:

{
    "carbon_emitted": 0.00015,
    "sci_per_api_call": 0.00003,
    "sci_per_10k_calls": 0.3
}

Clarity

metrics.clarity.evaluate(response: str) → ClarityResponse

Evaluate the clarity of a response.

Request:

ClarityRequest

Returns:

ClarityResponse

Example response:

{
    "score": 92.5
}

Completion

metrics.completion.evaluate(query: str, response: str) → CompletionResponse

Evaluate how well the response completes or follows the query’s instruction.

Request:

CompletionRequest

Returns:

CompletionResponse

Example response:

{
    "score": 99
}

score: An integer, 0-100, measuring how well the response completes the query (100 is a perfect completion)

Context Relevancy

metrics.context_relevancy.evaluate(query: str, context: list[ContextNode]) → ContextRelevancyResponse

Evaluate the relevancy of the context to the query.

Request:

ContextRelevancyRequest

Returns:

ContextRelevancyResponse

Example response:

{
    "score": 88.5,
    "topics": ["geography", "capitals", "France"],
    "scores": [0.92, 0.85, 0.88]
}

Cost

metrics.cost.evaluate(model_name: str, model_type: str, model_provider: str, number_of_queries: int, total_prompt_tokens: int | None = None, total_completion_tokens: int | None = None, total_tokens: int | None = None, instance_type: str | None = None, average_latency: float | None = None) → CostResponse

Evaluates the cost of API usage based on token counts, model information, and infrastructure details.

Request:

CostRequest

Returns:

CostResponse

Example response:

{
    "cost_estimate_per_run": 0.0025,
    "total_project_cost_estimate": 0.0125
}

Faithfulness

metrics.faithfulness.evaluate(query: str, response: str, context: list[ContextNode]) → FaithfulnessResponse

Evaluate the faithfulness of a response against its context.

Request:

FaithfulnessRequest

Returns:

FaithfulnessResponse

Example response:

{
    "score": 99.971924,
    "facts": [
            {
            "statement": "The capital of France is Paris.",
            "label": "Safe",
            "prob": 0.9997192,
            "sentence_span": [
                0,
                30
            ]
        }
    ]
}

Formality

metrics.formality.evaluate(response: str) → FormalityResponse

Evaluate the formality level of a response.

Request:

FormalityRequest

Returns:

FormalityResponse

Example response:

{
    "score": 75.0,
    "sentences": [
        "The capital of France is Paris."
    ],
    "scores": [0.75]
}

Helpfulness

metrics.helpfulness.evaluate(response: str) → HelpfulnessResponse

Evaluate the helpfulness of a response.

Request:

HelpfulnessRequest

Returns:

HelpfulnessResponse

Example response:

{
    "score": 88.0
}

PII

metrics.pii.evaluate(text: str, blocklist: list[str] | None = None, allowlist: list[str] | None = None) → PIIResponse

Detect personally identifiable information in text.

Request:

PIIRequest

Returns:

PIIResponse

Example response:

{
    "identified_pii": [
        {
            "interval": [0, 5],
            "string": "Hello",
            "category": "blocklist"
        },
        {
            "interval": [94, 111],
            "string": "www.wikipedia.org",
            "category": "organization"
        }
    ]
}

Prompt Injection

metrics.prompt_injection.evaluate(query: str) → PromptInjectionResponse

Detect potential prompt injection attempts.

Request:

PromptInjectionRequest

Returns:

PromptInjectionResponse

Example response:

{
    "score": 98.0
}

Refusal

metrics.refusal.evaluate(query: str, response: str) → RefusalResponse

Evaluate the likelihood that a response is a refusal to answer or comply with the query.

Request:

RefusalRequest

Returns:

RefusalResponse

Example response:

{
    "score": 5
}

score: An integer, 0-100, measuring the degree (firmness) of refusal (100 is a strong refusal)

Sensitivity

metrics.sensitivity.evaluate(response: str, topics: list[str]) → SensitivityResponse

Evaluate the sensitivity of a response regarding specific topics.

Request:

SensitivityRequest

Returns:

SensitivityResponse

Example response:

{
    "scores": {
        "politics": 0.70,
        "religion": 0.60
    }
}

Simplicity

metrics.simplicity.evaluate(response: str) → SimplicityResponse

Evaluate the simplicity of a response.

Request:

SimplicityRequest

Returns:

SimplicityResponse

Example response:

{
    "score": 82.0
}

Stability

metrics.stability.evaluate(responses: list[str]) → StabilityResponse

Evaluate the stability (consistency) of multiple responses to the same prompt.

Request:

StabilityRequest

Returns:

StabilityResponse

Example response:

{
    "min": 80,
    "avg": 87
}

min: An integer, 0-100, measuring the minimum stability between any two pairs of responses (100 is high similarity)
avg: An integer, 0-100, measuring the average stability between all two pairs of responses (100 is high similarity)

Summarization

metrics.summarization.evaluate(response: str, context: list[ContextNode]) → SummarizationResponse

Evaluate the quality of a summary.

Request:

SummarizationRequest

Returns:

SummarizationResponse

Example response:

{
    "score": 90.0
}

Tone

metrics.tone.evaluate(response: str) → ToneResponse

Evaluate the tone of a response.

Request:

ToneRequest

Returns:

ToneResponse

Example response:

{
    "labels": [
        "neutral",
        "happiness",
        "realization"
    ],
    "scores": [
        89.704185,
        6.6798472,
        2.9873204
    ]
}

Toxicity

metrics.toxicity.evaluate(response: str) → ToxicityResponse

Evaluate the toxicity of a response.

Request:

ToxicityRequest

Returns:

ToxicityResponse

Example response:

{
    "labels": [
        "identity_hate",
        "insult",
        "threat",
        "obscene",
        "toxic"
    ],
    "scores": [
        0.036089644,
        0.06207772,
        0.027964465,
        0.105483316,
        0.3622106
    ]
}