Usage

This section documents how to use the Trustwise SDK, covering basic setup, configuration, and detailed examples for evaluating the available metrics.

Note

For an explanation of the concepts behind Trustwise metrics, refer to the Trustwise Metrics Documentation.

Basic Setup

Here’s how to set up and configure the Trustwise SDK:

import os
from trustwise.sdk import TrustwiseSDK
from trustwise.sdk.config import TrustwiseConfig

# Configure using environment variables
os.environ["TW_API_KEY"] = "your-api-key"
os.environ["TW_BASE_URL"] = "https://api.trustwise.ai"
config = TrustwiseConfig()

# Or configure directly
# config = TrustwiseConfig(
#     api_key="your-api-key",
#     base_url="https://api.trustwise.ai"
# )

# Initialize the SDK
trustwise = TrustwiseSDK(config)

Evaluating Metrics

Below are example requests and responses for evaluating different metrics with the SDK. Each metric exposes an evaluate() method; the walkthrough starts with faithfulness and then covers stability, refusal, completion, and adherence.

Faithfulness Metric

Copy and run this example to evaluate faithfulness:

# Evaluate faithfulness
faithfulness_result = trustwise.metrics.faithfulness.evaluate(
    query="What is the capital of France?",
    response="The capital of France is Paris.",
    context=[
        {"node_id": "1", "node_score": 1.0, "node_text": "Paris is the capital of France."}
    ]
)

print(faithfulness_result)

Output:

score=99.971924 facts=[Fact(statement='The capital of France is Paris.', label='Safe', prob=0.9997192, sentence_span=[0, 30])]

Stability Metric

Copy and run this example to evaluate stability:

# Evaluate stability
stability_result = trustwise.metrics.stability.evaluate(
    responses=[
        "The capital of France is Paris.",
        "Paris is the capital of France.",
        "The capital of France is Paris."
    ]
)

print(stability_result)

Output:

min=80 avg=87

Note

The StabilityResponse provides both direct string representation and JSON serialization methods for flexible integration with different workflows.
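For example, the stability result can be serialized the same way as the faithfulness result shown later in this section (a minimal sketch; the serialized field names follow the string representation above):

# Serialize the stability result to JSON, mirroring the to_json() usage shown below
print(stability_result.to_json(indent=2))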

Refusal Metric

Copy and run this example to evaluate refusal:

# Evaluate refusal
refusal_result = trustwise.metrics.refusal.evaluate(
    query="What is the capital of France?",
    response="The capital of France is Paris."
)

print(refusal_result)

Output:

score=5

Note

The RefusalResponse provides both direct string representation and JSON serialization methods for flexible integration with different workflows.

Completion Metric

Copy and run this example to evaluate completion:

# Evaluate completion
completion_result = trustwise.metrics.completion.evaluate(
    query="What is the capital of France?",
    response="The capital of France is Paris."
)

print(completion_result)

Output:

score=99

Note

The CompletionResponse provides both direct string representation and JSON serialization methods for flexible integration with different workflows.

Adherence Metric

Copy and run this example to evaluate adherence:

# Evaluate adherence
adherence_result = trustwise.metrics.adherence.evaluate(
    policy="Always answer in French.",
    response="La capitale de la France est Paris."
)

print(adherence_result)

Output:

score=95

Note

The AdherenceResponse provides both direct string representation and JSON serialization methods for flexible integration with different workflows.

Async Usage

The SDK provides a fully asynchronous interface for high-throughput or concurrent evaluation. The async API mirrors the synchronous API, but requires await and an event loop.

import asyncio
from trustwise.sdk import TrustwiseSDKAsync
from trustwise.sdk.config import TrustwiseConfig

async def main():
    config = TrustwiseConfig()
    trustwise = TrustwiseSDKAsync(config)
    result = await trustwise.metrics.faithfulness.evaluate(
        query="What is the capital of France?",
        response="The capital of France is Paris.",
        context=[{"node_id": "1", "node_score": 1.0, "node_text": "Paris is the capital of France."}]
    )
    print(result)

asyncio.run(main())

Note

All metrics are available in both synchronous and asynchronous forms. For async, use TrustwiseSDKAsync and await the evaluate() methods.
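Because the asynchronous client lets you await several evaluations at once, independent metric calls can be fanned out with asyncio.gather. The sketch below reuses the evaluate() signatures shown in this document and assumes each response exposes a score attribute, as the outputs above suggest:

import asyncio
from trustwise.sdk import TrustwiseSDKAsync
from trustwise.sdk.config import TrustwiseConfig

async def main():
    config = TrustwiseConfig()
    trustwise = TrustwiseSDKAsync(config)
    query = "What is the capital of France?"
    response = "The capital of France is Paris."
    context = [{"node_id": "1", "node_score": 1.0, "node_text": "Paris is the capital of France."}]
    # Run several metric evaluations concurrently instead of awaiting them one by one
    faithfulness, completion, refusal = await asyncio.gather(
        trustwise.metrics.faithfulness.evaluate(query=query, response=response, context=context),
        trustwise.metrics.completion.evaluate(query=query, response=response),
        trustwise.metrics.refusal.evaluate(query=query, response=response),
    )
    print(faithfulness.score, completion.score, refusal.score)

asyncio.run(main())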

Working with JSON Output

# Convert to JSON format for serialization
json_output = faithfulness_result.to_json(indent=2)
print(json_output)

Output:

{
  "score": 99.971924,
  "facts": [
    {
      "statement": "The capital of France is Paris.",
      "label": "Safe",
      "prob": 0.9997192,
      "sentence_span": [0, 30]
    }
  ]
}

Working with Python Dict Output

# Convert to Python dict for programmatic access
dict_output = faithfulness_result.to_dict()
print(dict_output)

Output:

{'score': 99.971924, 'facts': [{'statement': 'The capital of France is Paris.', 'label': 'Safe', 'prob': 0.9997192, 'sentence_span': [0, 30]}]}

Working with Result Properties

# Access individual properties
print(f"Faithfulness score: {faithfulness_result.score}")
for fact in faithfulness_result.facts:
    print(f"Fact: {fact.statement}, Label: {fact.label}, Probability: {fact.prob}, Span: {fact.sentence_span}")

Output:

Faithfulness score: 99.971924
Fact: The capital of France is Paris., Label: Safe, Probability: 0.9997192, Span: [0, 30]

Note

The FaithfulnessResponse (and all other response objects) provides both direct string representation and JSON serialization methods for flexible integration with different workflows.

Guardrails (Experimental)

# Create a multi-metric guardrail
guardrail = trustwise.guardrails(
    thresholds={
        "faithfulness": 0.8,
        "answer_relevancy": 0.7,
        "clarity": 0.7
    },
    block_on_failure=True
)

# Evaluate with multiple metrics
evaluation = guardrail.evaluate(
    query="What is the capital of France?",
    response="The capital of France is Paris.",
    context=[
        {"node_id": "1", "node_score": 1.0, "node_text": "Paris is the capital of France."}
    ]
)
print(evaluation)

Output:

passed=True blocked=False results={'faithfulness': {'passed': True, 'result': FaithfulnessResponse(score=99.971924, facts=[Fact(statement='The capital of France is Paris.', label='Safe', prob=0.9997192, sentence_span=[0, 30])])}, 'answer_relevancy': {'passed': True, 'result': AnswerRelevancyResponse(score=96.38003, generated_question='What is the capital of France?')}, 'clarity': {'passed': True, 'result': ClarityResponse(score=73.84502)}}

JSON Response:

json_output = evaluation.to_json(indent=2)
print(json_output)

Output:

{
  "passed": true,
  "blocked": false,
  "results": {
    "faithfulness": {
      "passed": true,
      "result": {
        "score": 99.971924,
        "facts": [
          {
            "statement": "The capital of France is Paris.",
            "label": "Safe",
            "prob": 0.9997192,
            "sentence_span": [
              0,
              30
            ]
          }
        ]
      }
    },
    "answer_relevancy": {
      "passed": true,
      "result": {
        "score": 96.38003,
        "generated_question": "What is the capital of France?"
      }
    },
    "clarity": {
      "passed": true,
      "result": {
        "score": 73.84502
      }
    }
  }
}
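
Since the guardrail evaluation exposes passed, blocked, and per-metric results (as in the output above), a common pattern is to gate whether a response is delivered. The following is a minimal sketch of that pattern, not part of the SDK itself:

# Gate delivery of the response on the guardrail outcome
if evaluation.passed and not evaluation.blocked:
    print("Response passed all guardrails and can be returned to the user.")
else:
    # Inspect which metrics failed before falling back or regenerating
    for metric, outcome in evaluation.results.items():
        if not outcome["passed"]:
            print(f"Guardrail failed on {metric}: {outcome['result']}")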