.. _v4_metrics:

V4 Metrics
==========

.. note::
   **Version 4 is now the default** for Trustwise metrics. It provides enhanced accuracy, better type safety, and improved response structures compared to v3.

All metrics are available in both synchronous and asynchronous forms. For async, use ``TrustwiseSDKAsync`` and ``await`` the ``evaluate()`` methods.

The SDK provides access to all v4 metrics directly through the ``metrics`` namespace. Each metric provides an ``evaluate()`` function with enhanced type safety and improved response structures.

For more details on the metrics, please refer to the `Trustwise Metrics Documentation`_.

Example usage:

.. code-block:: python

    from trustwise.sdk import TrustwiseSDK
    from trustwise.sdk.config import TrustwiseConfig
    from trustwise.sdk.metrics.v4.types import ContextChunk

    config = TrustwiseConfig(api_key="your-api-key")
    trustwise = TrustwiseSDK(config)

    # v4 context format
    context = [
        ContextChunk(chunk_text="Paris is the capital of France.", chunk_id="doc:idx:1")
    ]

    # v4 metric calls (now default)
    result = trustwise.metrics.faithfulness.evaluate(
        query="What is the capital of France?",
        response="The capital of France is Paris.",
        context=context
    )

Async example:

.. code-block:: python

    import asyncio

    from trustwise.sdk import TrustwiseSDKAsync
    from trustwise.sdk.config import TrustwiseConfig
    from trustwise.sdk.metrics.v4.types import ContextChunk

    async def main():
        config = TrustwiseConfig(api_key="your-api-key")
        trustwise = TrustwiseSDKAsync(config)

        context = [
            ContextChunk(chunk_text="Paris is the capital of France.", chunk_id="doc:idx:1")
        ]

        result = await trustwise.metrics.faithfulness.evaluate(
            query="What is the capital of France?",
            response="The capital of France is Paris.",
            context=context
        )

    asyncio.run(main())

Refer to the :doc:`api` for details on each metric's parameters.

.. note::
   Custom types such as ``ContextChunk`` are defined in :mod:`trustwise.sdk.metrics.v4.types`.

Adherence
~~~~~~~~~

.. function:: metrics.adherence.evaluate(policy: str, response: str) -> AdherenceResponse

   Evaluate how well the response follows a given policy or instruction.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.AdherenceRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.AdherenceResponse`

   Example response:

   .. code-block:: json

      {
        "score": 85.0
      }

   - ``score``: A float, 0-100, measuring how well the response follows the policy (higher is better adherence)

Answer Relevancy
~~~~~~~~~~~~~~~~

.. function:: metrics.answer_relevancy.evaluate(query: str, response: str) -> AnswerRelevancyResponse

   Evaluate the relevancy of a response to the query.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.AnswerRelevancyRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.AnswerRelevancyResponse`

   Example response:

   .. code-block:: json

      {
        "score": 92.0,
        "generated_question": "What is the capital city of France?"
      }

   - ``score``: A float, 0-100, measuring how relevant the response is to the query. Higher score indicates better relevancy.
   - ``generated_question``: The generated question for which the response would be relevant
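A minimal usage sketch for the query/response metrics above, reusing the ``trustwise`` client from the introductory example. Reading ``score`` and ``generated_question`` as attributes of the typed response object is an assumption based on the example JSON fields shown here.

.. code-block:: python

    # Sketch: assumes the ``trustwise`` client from the introductory example is in scope.
    result = trustwise.metrics.answer_relevancy.evaluate(
        query="What is the capital of France?",
        response="The capital of France is Paris.",
    )

    # Attribute names mirror the example JSON above (assumed for the typed response object).
    print(result.score)
    print(result.generated_question)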
Clarity
~~~~~~~

.. function:: metrics.clarity.evaluate(text: str) -> ClarityResponse

   The Trustwise Clarity metric measures how easy text is to read. It gives higher scores to writing that uses easier-to-read words and concise, self-contained sentences. It does not measure how well you understand the ideas in the text.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.ClarityRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.ClarityResponse`

   Example response:

   .. code-block:: json

      {
        "score": 92.5
      }

   - ``score``: A float, 0-100, measuring how clear and understandable the response is. Higher score indicates better clarity.

Completion
~~~~~~~~~~

.. function:: metrics.completion.evaluate(query: str, response: str) -> CompletionResponse

   Evaluate how well the response completes or follows the query's instruction.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.CompletionRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.CompletionResponse`

   Example response:

   .. code-block:: json

      {
        "score": 85.0
      }

   - ``score``: A float, 0-100, measuring how well the response completes the query. Higher score indicates better completion.

Context Relevancy
~~~~~~~~~~~~~~~~~

.. function:: metrics.context_relevancy.evaluate(query: str, context: Context, severity: float = None, include_chunk_scores: bool = None, metadata: dict = None) -> ContextRelevancyResponse

   Evaluate the relevancy of the context to the query.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.ContextRelevancyRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.ContextRelevancyResponse`

   Example response:

   .. code-block:: json

      {
        "score": 88.5,
        "scores": [
          {"label": "Circumstances", "score": 0.33},
          {"label": "Claim", "score": 46.34},
          {"label": "Policy", "score": 0.11}
        ]
      }

   - ``score``: A float, 0-100, measuring how relevant the context is to the query. Higher score indicates better relevancy.
   - ``scores``: List of :class:`~trustwise.sdk.metrics.v4.types.ObjectStyleScore` with detailed breakdown by relevancy aspect

Faithfulness
~~~~~~~~~~~~

.. function:: metrics.faithfulness.evaluate(query: str, response: str, context: Context) -> FaithfulnessResponse

   Evaluate the faithfulness of a response against its context.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.FaithfulnessRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.FaithfulnessResponse`

   Example response:

   .. code-block:: json

      {
        "score": 99.971924,
        "statements": [
          {
            "statement": "The capital of France is Paris.",
            "label": "Safe",
            "probability": 0.9997192,
            "sentence_span": [0, 30]
          }
        ]
      }

   - ``score``: A float, 0-100, measuring how faithful the response is to the context (100 is perfect faithfulness)
   - ``statements``: List of extracted atomic statements with their verification status

Formality
~~~~~~~~~

.. function:: metrics.formality.evaluate(text: str) -> FormalityResponse

   Evaluate the formality level of a response.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.FormalityRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.FormalityResponse`

   Example response:

   .. code-block:: json

      {
        "score": 75.0
      }

   - ``score``: A float, 0-100, measuring the overall formality level (100 is very formal)

Helpfulness
~~~~~~~~~~~

.. function:: metrics.helpfulness.evaluate(text: str) -> HelpfulnessResponse

   Evaluate the helpfulness of a response.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.HelpfulnessRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.HelpfulnessResponse`

   Example response:

   .. code-block:: json

      {
        "score": 88.0
      }

   - ``score``: A float, 0-100, measuring how helpful the response is (100 is very helpful)
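Text-only metrics such as clarity, formality, and helpfulness take a single ``text`` argument. A minimal sketch, again reusing the ``trustwise`` client from the introductory example; treating ``score`` as an attribute of the typed response object is an assumption based on the example JSON.

.. code-block:: python

    # Sketch: assumes the ``trustwise`` client from the introductory example is in scope.
    result = trustwise.metrics.helpfulness.evaluate(
        text="To reset your password, open Settings and choose 'Reset password'."
    )

    # ``score`` mirrors the example JSON above (assumed attribute on the typed response).
    print(result.score)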
PII Detection
~~~~~~~~~~~~~

.. function:: metrics.pii.evaluate(text: str, allowlist: list[str] = None, blocklist: list[str] = None, categories: list[str] = None) -> PIIResponse

   Detect personally identifiable information in text.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.PIIRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.PIIResponse`

   Example response:

   .. code-block:: json

      {
        "pii": [
          {
            "interval": [0, 5],
            "string": "Hello",
            "category": "blocklist"
          }
        ]
      }

   - ``pii``: List of detected PII entities with their locations and categories

Prompt Manipulation
~~~~~~~~~~~~~~~~~~~

.. function:: metrics.prompt_manipulation.evaluate(text: str, severity: int = None) -> PromptManipulationResponse

   Detect potential prompt manipulation attempts, including jailbreak, prompt injection, and role play.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.PromptManipulationRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.PromptManipulationResponse`

   Example response:

   .. code-block:: json

      {
        "score": 0.85,
        "scores": [
          {"label": "jailbreak", "score": 0.90},
          {"label": "prompt_injection", "score": 0.80},
          {"label": "role_play", "score": 0.85}
        ]
      }

   - ``score``: A float, 0-100, measuring overall prompt manipulation likelihood (higher is more likely)
   - ``scores``: List of :class:`~trustwise.sdk.metrics.v4.types.ObjectStyleScore` with detailed breakdown by manipulation type

Refusal
~~~~~~~

.. function:: metrics.refusal.evaluate(query: str, response: str) -> RefusalResponse

   Evaluate the likelihood that a response is a refusal to answer or comply with the query.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.RefusalRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.RefusalResponse`

   Example response:

   .. code-block:: json

      {
        "score": 5.0
      }

   - ``score``: A float, 0-100, measuring the degree of refusal (higher indicates stronger refusal)

Sensitivity
~~~~~~~~~~~

.. function:: metrics.sensitivity.evaluate(text: str, topics: list[str]) -> SensitivityResponse

   Evaluate the sensitivity of a response regarding specific topics.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.SensitivityRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.SensitivityResponse`

   Example response:

   .. code-block:: json

      {
        "scores": [
          {"label": "politics", "score": 70.0},
          {"label": "religion", "score": 60.0}
        ]
      }

   - ``scores``: List of :class:`~trustwise.sdk.metrics.v4.types.ObjectStyleScore` with sensitivity scores by topic (0-100, higher indicates stronger presence)

Stability
~~~~~~~~~

.. function:: metrics.stability.evaluate(responses: list[str]) -> StabilityResponse

   Measures how similar the responses are when given the same or similar inputs multiple times. It gives higher scores when responses stay consistent, even if asked by different personas or worded differently. This helps identify if an agent changes its answers unexpectedly.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.StabilityRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.StabilityResponse`

   Example response:

   .. code-block:: json

      {
        "min": 75,
        "avg": 85
      }

   - ``min``: An integer, 0-100, measuring the minimum stability between any pair of responses (100 is high similarity)
   - ``avg``: An integer, 0-100, measuring the average stability between all pairs of responses (100 is high similarity)
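Stability compares several responses to one another, so it takes a list of strings rather than a single text. A minimal sketch under the same assumptions as the earlier examples; ``min`` and ``avg`` as response attributes are assumed from the example JSON.

.. code-block:: python

    # Sketch: assumes the ``trustwise`` client from the introductory example is in scope.
    result = trustwise.metrics.stability.evaluate(
        responses=[
            "The capital of France is Paris.",
            "Paris is the capital city of France.",
            "France's capital is Paris.",
        ]
    )

    # ``min`` and ``avg`` mirror the example JSON above (assumed attributes on the typed response).
    print(result.min, result.avg)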
Simplicity
~~~~~~~~~~

.. function:: metrics.simplicity.evaluate(text: str) -> SimplicityResponse

   Measures how easy it is to understand the words in a text. It gives higher scores to writing that uses common, everyday words instead of special terms or complicated words. Simplicity looks at the words you choose, not how you put them together in sentences.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.SimplicityRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.SimplicityResponse`

   Example response:

   .. code-block:: json

      {
        "score": 82.0
      }

   - ``score``: A float, 0-100, measuring how simple the response is. Higher score indicates simpler text.

Tone
~~~~

.. function:: metrics.tone.evaluate(text: str, tones: list[str] = None) -> ToneResponse

   Evaluate the tone of a response.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.ToneRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.ToneResponse`

   Example response:

   .. code-block:: json

      {
        "scores": [
          {"label": "neutral", "score": 89.70},
          {"label": "happiness", "score": 6.68},
          {"label": "realization", "score": 2.99}
        ]
      }

   - ``scores``: List of :class:`~trustwise.sdk.metrics.v4.types.ObjectStyleScore` with tone confidence scores (0-100, higher indicates stronger presence)

Toxicity
~~~~~~~~

.. function:: metrics.toxicity.evaluate(text: str, severity: int = None) -> ToxicityResponse

   Evaluate the toxicity of a response.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.ToxicityRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.ToxicityResponse`

   Example response:

   .. code-block:: json

      {
        "score": 36.22,
        "scores": [
          {"label": "identity_hate", "score": 3.61},
          {"label": "insult", "score": 6.21},
          {"label": "threat", "score": 2.80},
          {"label": "obscene", "score": 10.55},
          {"label": "toxic", "score": 36.22}
        ]
      }

   - ``score``: A float, 0-100, measuring overall toxicity (higher is more toxic)
   - ``scores``: List of :class:`~trustwise.sdk.metrics.v4.types.ObjectStyleScore` with detailed breakdown by toxicity category

Carbon
~~~~~~

.. function:: metrics.carbon.evaluate(provider: str, region: str, instance_type: str, latency: float | int) -> CarbonResponse

   Evaluate the carbon footprint of AI operations based on provider, instance type, region, and latency.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.CarbonRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.CarbonResponse`

   Example response:

   .. code-block:: json

      {
        "carbon": {
          "value": 0.0011949989480068127,
          "unit": "kg_co2e"
        },
        "components": [
          {
            "component": "operational_gpu",
            "carbon": {
              "value": 0.0,
              "unit": "kg_co2e"
            }
          },
          {
            "component": "operational_cpu",
            "carbon": {
              "value": 0.00021294669026962343,
              "unit": "kg_co2e"
            }
          },
          {
            "component": "embodied_cpu",
            "carbon": {
              "value": 0.0009820522577371892,
              "unit": "kg_co2e"
            }
          }
        ]
      }

   - ``carbon``: :class:`~trustwise.sdk.metrics.v4.types.CarbonValue` with total carbon footprint
   - ``components``: List of :class:`~trustwise.sdk.metrics.v4.types.CarbonComponent` with breakdown by component
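Carbon is driven by deployment details rather than text. A minimal sketch under the same assumptions as the earlier examples; the provider, region, instance type, and latency values below are illustrative placeholders, not a list of supported inputs, and the expected latency unit is not documented here, so consult the API reference.

.. code-block:: python

    # Sketch: assumes the ``trustwise`` client from the introductory example is in scope.
    # provider/region/instance_type/latency are illustrative placeholders, not validated values.
    result = trustwise.metrics.carbon.evaluate(
        provider="aws",
        region="us-east-1",
        instance_type="m5.xlarge",
        latency=250,
    )

    # ``carbon`` and ``components`` mirror the example JSON above (assumed attributes).
    print(result.carbon)
    print(result.components)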