Usage
This section documents how to use the Trustwise SDK, covering basic setup, configuration, and worked examples for evaluating each metric.
Note
For an explanation of the concepts behind Trustwise metrics, refer to the Trustwise Metrics Documentation.
Basic Setup
Here’s how to set up and configure the Trustwise SDK:
import os
from trustwise.sdk import TrustwiseSDK
from trustwise.sdk.config import TrustwiseConfig
# Configure using environment variables
os.environ["TW_API_KEY"] = "your-api-key"
os.environ["TW_BASE_URL"] = "https://api.trustwise.ai"
config = TrustwiseConfig()
# Or configure directly
# config = TrustwiseConfig(
# api_key="your-api-key",
# base_url="https://api.trustwise.ai"
# )
# Initialize the SDK
trustwise = TrustwiseSDK(config)
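Before constructing the config, it can help to fail fast when the required environment variables are absent. The sketch below is illustrative and not part of the SDK; only the TW_API_KEY and TW_BASE_URL variable names come from the setup above.

```python
import os

# Illustrative pre-flight check, not an SDK feature: verify the two
# environment variables used by TrustwiseConfig before building it.
# setdefault here only keeps the sketch self-contained.
os.environ.setdefault("TW_API_KEY", "your-api-key")
os.environ.setdefault("TW_BASE_URL", "https://api.trustwise.ai")

missing = [var for var in ("TW_API_KEY", "TW_BASE_URL") if not os.environ.get(var)]
if missing:
    raise RuntimeError(f"Missing Trustwise config: {', '.join(missing)}")
print("config ok")
```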
Evaluating Metrics
Below are example requests and responses for evaluating different metrics with the SDK. Each metric exposes an evaluate() method; faithfulness is shown first as a worked example.
Faithfulness Metric
Copy and run this example to evaluate faithfulness:
# Evaluate faithfulness
faithfulness_result = trustwise.metrics.faithfulness.evaluate(
    query="What is the capital of France?",
    response="The capital of France is Paris.",
    context=[
        {"node_id": "1", "node_score": 1.0, "node_text": "Paris is the capital of France."}
    ]
)
print(faithfulness_result)
Output:
score=99.971924 facts=[Fact(statement='The capital of France is Paris.', label='Safe', prob=0.9997192, sentence_span=[0, 30])]
Stability Metric
Copy and run this example to evaluate stability:
# Evaluate stability
stability_result = trustwise.metrics.stability.evaluate(
    responses=[
        "The capital of France is Paris.",
        "Paris is the capital of France.",
        "The capital of France is Paris."
    ]
)
print(stability_result)
Output:
min=80 avg=87
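A common next step is to gate on the returned scores. The sketch below copies the min/avg values from the example output above; with the SDK you would read stability_result.min and stability_result.avg, and the thresholds are illustrative, not recommendations.

```python
# Values copied from the example output above; replace with
# stability_result.min / stability_result.avg in real code.
stability_min, stability_avg = 80, 87

# Illustrative thresholds for deciding the responses are consistent enough.
is_stable = stability_min >= 75 and stability_avg >= 85
print(is_stable)  # True
```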
Note
The StabilityResponse provides both a direct string representation and JSON serialization methods for flexible integration with different workflows.
Refusal Metric
Copy and run this example to evaluate refusal:
# Evaluate refusal
refusal_result = trustwise.metrics.refusal.evaluate(
    query="What is the capital of France?",
    response="The capital of France is Paris."
)
print(refusal_result)
Output:
score=5
Note
The RefusalResponse provides both a direct string representation and JSON serialization methods for flexible integration with different workflows.
Completion Metric
Copy and run this example to evaluate completion:
# Evaluate completion
completion_result = trustwise.metrics.completion.evaluate(
    query="What is the capital of France?",
    response="The capital of France is Paris."
)
print(completion_result)
Output:
score=99
Note
The CompletionResponse provides both a direct string representation and JSON serialization methods for flexible integration with different workflows.
Adherence Metric
Copy and run this example to evaluate adherence:
# Evaluate adherence
adherence_result = trustwise.metrics.adherence.evaluate(
    policy="Always answer in French.",
    response="La capitale de la France est Paris."
)
print(adherence_result)
Output:
score=95
Note
The AdherenceResponse provides both a direct string representation and JSON serialization methods for flexible integration with different workflows.
Async Usage
The SDK provides a fully asynchronous interface for high-throughput or concurrent evaluation. The async API mirrors the synchronous API, but requires await and an event loop.
import asyncio
from trustwise.sdk import TrustwiseSDKAsync
from trustwise.sdk.config import TrustwiseConfig
async def main():
    config = TrustwiseConfig()
    trustwise = TrustwiseSDKAsync(config)
    result = await trustwise.metrics.faithfulness.evaluate(
        query="What is the capital of France?",
        response="The capital of France is Paris.",
        context=[{"node_id": "1", "node_score": 1.0, "node_text": "Paris is the capital of France."}]
    )
    print(result)

asyncio.run(main())
Note
All metrics are available in both synchronous and asynchronous forms. For async, use TrustwiseSDKAsync and await the evaluate() methods.
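The async interface is most useful for fanning out several evaluations concurrently with asyncio.gather. In the sketch below, fake_evaluate is a stand-in coroutine so the example runs on its own; in real code you would await the SDK call (for example trustwise.metrics.faithfulness.evaluate) instead.

```python
import asyncio

# fake_evaluate stands in for an awaitable SDK call; swap in the real
# trustwise.metrics.<metric>.evaluate(...) coroutine in practice.
async def fake_evaluate(query: str) -> dict:
    await asyncio.sleep(0)  # placeholder for the network round trip
    return {"query": query, "score": 99.0}

async def evaluate_all(queries):
    # gather preserves input order in its result list
    return await asyncio.gather(*(fake_evaluate(q) for q in queries))

results = asyncio.run(evaluate_all([
    "What is the capital of France?",
    "What is the capital of Spain?",
]))
print(len(results))  # 2
```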
Working with JSON Output
# Convert to JSON format for serialization
json_output = faithfulness_result.to_json(indent=2)
print(json_output)
Output:
{
  "score": 99.971924,
  "facts": [
    {
      "statement": "The capital of France is Paris.",
      "label": "Safe",
      "prob": 0.9997192,
      "sentence_span": [0, 30]
    }
  ]
}
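Because to_json() emits standard JSON, the string round-trips through the stdlib json module. The json_output literal below copies the example output above so the sketch is self-contained.

```python
import json

# The string produced by to_json() is ordinary JSON, so json.loads
# recovers a plain dict for downstream processing.
json_output = """{
  "score": 99.971924,
  "facts": [
    {"statement": "The capital of France is Paris.", "label": "Safe",
     "prob": 0.9997192, "sentence_span": [0, 30]}
  ]
}"""
data = json.loads(json_output)
print(data["score"], len(data["facts"]))  # 99.971924 1
```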
Working with Python Dict Output
# Convert to Python dict for programmatic access
dict_output = faithfulness_result.to_dict()
print(dict_output)
Output:
{'score': 99.971924, 'facts': [{'statement': 'The capital of France is Paris.', 'label': 'Safe', 'prob': 0.9997192, 'sentence_span': [0, 30]}]}
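The dict form is convenient for programmatic checks, for example flagging any fact not labelled 'Safe'. The dict_output literal below copies the example output above; the filter itself is illustrative, not an SDK feature.

```python
# dict_output copies the to_dict() example output above.
dict_output = {
    "score": 99.971924,
    "facts": [
        {"statement": "The capital of France is Paris.", "label": "Safe",
         "prob": 0.9997192, "sentence_span": [0, 30]}
    ],
}

# Collect statements whose label is anything other than 'Safe'.
flagged = [f["statement"] for f in dict_output["facts"] if f["label"] != "Safe"]
print(flagged)  # []
```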
Working with Result Properties
# Access individual properties
print(f"Faithfulness score: {faithfulness_result.score}")
for fact in faithfulness_result.facts:
    print(f"Fact: {fact.statement}, Label: {fact.label}, Probability: {fact.prob}, Span: {fact.sentence_span}")
Output:
Faithfulness score: 99.971924
Fact: The capital of France is Paris., Label: Safe, Probability: 0.9997192, Span: [0, 30]
Note
The FaithfulnessResponse (and all other response objects) provides both a direct string representation and JSON serialization methods for flexible integration with different workflows.
Guardrails (Experimental)
# Create a multi-metric guardrail
guardrail = trustwise.guardrails(
    thresholds={
        "faithfulness": 0.8,
        "answer_relevancy": 0.7,
        "clarity": 0.7
    },
    block_on_failure=True
)

# Evaluate with multiple metrics
evaluation = guardrail.evaluate(
    query="What is the capital of France?",
    response="The capital of France is Paris.",
    context=[
        {"node_id": "1", "node_score": 1.0, "node_text": "Paris is the capital of France."}
    ]
)
print(evaluation)
Output:
passed=True blocked=False results={'faithfulness': {'passed': True, 'result': FaithfulnessResponse(score=99.971924, facts=[Fact(statement='The capital of France is Paris.', label='Safe', prob=0.9997192, sentence_span=[0, 30])])}, 'answer_relevancy': {'passed': True, 'result': AnswerRelevancyResponse(score=96.38003, generated_question='What is the capital of France?')}, 'clarity': {'passed': True, 'result': ClarityResponse(score=73.84502)}}
JSON Response:
json_output = evaluation.to_json(indent=2)
print(json_output)
Output:
{
  "passed": true,
  "blocked": false,
  "results": {
    "faithfulness": {
      "passed": true,
      "result": {
        "score": 99.971924,
        "facts": [
          {
            "statement": "The capital of France is Paris.",
            "label": "Safe",
            "prob": 0.9997192,
            "sentence_span": [
              0,
              30
            ]
          }
        ]
      }
    },
    "answer_relevancy": {
      "passed": true,
      "result": {
        "score": 96.38003,
        "generated_question": "What is the capital of France?"
      }
    },
    "clarity": {
      "passed": true,
      "result": {
        "score": 73.84502
      }
    }
  }
}
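A typical use of the guardrail verdict is to gate whether a response is delivered. The sketch below parses only the top-level fields copied from the JSON above; with the SDK objects you would read evaluation.passed and evaluation.blocked directly, and the gating logic is illustrative.

```python
import json

# Top-level fields copied from the guardrail JSON output above.
payload = json.loads('{"passed": true, "blocked": false}')

# Deliver the response only if all thresholds passed and nothing blocked it.
if payload["passed"] and not payload["blocked"]:
    decision = "deliver response"
else:
    decision = "withhold response"
print(decision)  # deliver response
```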