.. _metrics:

Metrics
=======

.. note::
   All metrics are available in both synchronous and asynchronous forms. For async, use ``TrustwiseSDKAsync`` and ``await`` the ``evaluate()`` methods.

The SDK provides access to all metrics through the unified ``metrics`` namespace. Each metric provides an ``evaluate()`` function. For more details on the metrics, please refer to the `Trustwise Metrics Documentation `_.

Example usage:

.. code-block:: python

   result = trustwise.metrics.faithfulness.evaluate(query="...", response="...", context=[...])
   clarity = trustwise.metrics.clarity.evaluate(query="...", response="...")
   cost = trustwise.metrics.cost.evaluate(model_name="...", model_type="LLM", ...)

Async example:

.. code-block:: python

   import asyncio

   from trustwise.sdk import TrustwiseSDKAsync
   from trustwise.sdk.config import TrustwiseConfig

   async def main():
       config = TrustwiseConfig()
       trustwise = TrustwiseSDKAsync(config)
       result = await trustwise.metrics.faithfulness.evaluate(
           query="What is the capital of France?",
           response="The capital of France is Paris.",
           context=[{"node_id": "1", "node_score": 1.0, "node_text": "Paris is the capital of France."}]
       )
       print(result)

   asyncio.run(main())

Refer to the :doc:`api` for details on each metric's parameters.

.. note::
   Custom types such as ``Context`` are defined in :mod:`trustwise.sdk.types`.

Adherence
~~~~~~~~~

.. function:: metrics.adherence.evaluate(policy: str, response: str) -> AdherenceResponse

   Evaluate how well the response adheres to a given policy or instruction.

   Request:
     - :class:`~trustwise.sdk.types.AdherenceRequest`

   Returns:
     - :class:`~trustwise.sdk.types.AdherenceResponse`

   Example response:

   .. code-block:: json

      {
        "score": 95
      }

   - ``score``: An integer, 0-100, measuring how well the response follows the policy (100 is perfect adherence)

Answer Relevancy
~~~~~~~~~~~~~~~~

.. function:: metrics.answer_relevancy.evaluate(query: str, response: str) -> AnswerRelevancyResponse

   Evaluate the relevancy of a response to the query.

   Request:
     - :class:`~trustwise.sdk.types.AnswerRelevancyRequest`

   Returns:
     - :class:`~trustwise.sdk.types.AnswerRelevancyResponse`

   Example response:

   .. code-block:: json

      {
        "score": 92.0,
        "generated_question": "What is the capital city of France?"
      }

Carbon
~~~~~~

.. function:: metrics.carbon.evaluate(processor_name: str, provider_name: str, provider_region: str, instance_type: str, average_latency: int) -> CarbonResponse

   Evaluate the carbon emissions based on hardware specifications and infrastructure details.

   Request:
     - :class:`~trustwise.sdk.types.CarbonRequest`

   Returns:
     - :class:`~trustwise.sdk.types.CarbonResponse`

   Example response:

   .. code-block:: json

      {
        "carbon_emitted": 0.00015,
        "sci_per_api_call": 0.00003,
        "sci_per_10k_calls": 0.3
      }

Clarity
~~~~~~~

.. function:: metrics.clarity.evaluate(response: str) -> ClarityResponse

   Evaluate the clarity of a response.

   Request:
     - :class:`~trustwise.sdk.types.ClarityRequest`

   Returns:
     - :class:`~trustwise.sdk.types.ClarityResponse`

   Example response:

   .. code-block:: json

      {
        "score": 92.5
      }

Completion
~~~~~~~~~~

.. function:: metrics.completion.evaluate(query: str, response: str) -> CompletionResponse

   Evaluate how well the response completes or follows the query's instruction.

   Request:
     - :class:`~trustwise.sdk.types.CompletionRequest`

   Returns:
     - :class:`~trustwise.sdk.types.CompletionResponse`

   Example response:

   .. code-block:: json

      {
        "score": 99
      }

   - ``score``: An integer, 0-100, measuring how well the response completes the query (100 is a perfect completion)
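For instance, a guardrail check might combine the adherence and completion metrics after a response has been generated. The snippet below is an illustrative sketch only: ``trustwise`` is assumed to be an already-configured ``TrustwiseSDK`` client, and the policy, query, and response strings are placeholders.

.. code-block:: python

   # Illustrative sketch -- assumes ``trustwise`` is an initialised TrustwiseSDK client.
   policy = "Answers must not recommend specific financial products."
   query = "How should I start saving for retirement?"
   response = "Consider setting a monthly budget and contributing regularly to a retirement account."

   # Check that the answer both respects the policy and actually addresses the query.
   adherence = trustwise.metrics.adherence.evaluate(policy=policy, response=response)
   completion = trustwise.metrics.completion.evaluate(query=query, response=response)

   print(adherence)   # 0-100 adherence score, as in the example JSON above
   print(completion)  # 0-100 completion score, as in the example JSON above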
Context Relevancy
~~~~~~~~~~~~~~~~~

.. function:: metrics.context_relevancy.evaluate(query: str, context: list[ContextNode]) -> ContextRelevancyResponse

   Evaluate the relevancy of the context to the query.

   Request:
     - :class:`~trustwise.sdk.types.ContextRelevancyRequest`

   Returns:
     - :class:`~trustwise.sdk.types.ContextRelevancyResponse`

   Example response:

   .. code-block:: json

      {
        "score": 88.5,
        "topics": ["geography", "capitals", "France"],
        "scores": [0.92, 0.85, 0.88]
      }

Cost
~~~~

.. function:: metrics.cost.evaluate(model_name: str, model_type: str, model_provider: str, number_of_queries: int, total_prompt_tokens: Optional[int] = None, total_completion_tokens: Optional[int] = None, total_tokens: Optional[int] = None, instance_type: Optional[str] = None, average_latency: Optional[float] = None) -> CostResponse

   Evaluates the cost of API usage based on token counts, model information, and infrastructure details.

   Request:
     - :class:`~trustwise.sdk.types.CostRequest`

   Returns:
     - :class:`~trustwise.sdk.types.CostResponse`

   Example response:

   .. code-block:: json

      {
        "cost_estimate_per_run": 0.0025,
        "total_project_cost_estimate": 0.0125
      }

Faithfulness
~~~~~~~~~~~~

.. function:: metrics.faithfulness.evaluate(query: str, response: str, context: list[ContextNode]) -> FaithfulnessResponse

   Evaluate the faithfulness of a response against its context.

   Request:
     - :class:`~trustwise.sdk.types.FaithfulnessRequest`

   Returns:
     - :class:`~trustwise.sdk.types.FaithfulnessResponse`

   Example response:

   .. code-block:: json

      {
        "score": 99.971924,
        "facts": [
          {
            "statement": "The capital of France is Paris.",
            "label": "Safe",
            "prob": 0.9997192,
            "sentence_span": [0, 30]
          }
        ]
      }

Formality
~~~~~~~~~

.. function:: metrics.formality.evaluate(response: str) -> FormalityResponse

   Evaluate the formality level of a response.

   Request:
     - :class:`~trustwise.sdk.types.FormalityRequest`

   Returns:
     - :class:`~trustwise.sdk.types.FormalityResponse`

   Example response:

   .. code-block:: json

      {
        "score": 75.0,
        "sentences": ["The capital of France is Paris."],
        "scores": [0.75]
      }

Helpfulness
~~~~~~~~~~~

.. function:: metrics.helpfulness.evaluate(response: str) -> HelpfulnessResponse

   Evaluate the helpfulness of a response.

   Request:
     - :class:`~trustwise.sdk.types.HelpfulnessRequest`

   Returns:
     - :class:`~trustwise.sdk.types.HelpfulnessResponse`

   Example response:

   .. code-block:: json

      {
        "score": 88.0
      }

PII
~~~

.. function:: metrics.pii.evaluate(text: str, blocklist: list[str] | None = None, allowlist: list[str] | None = None) -> PIIResponse

   Detect personally identifiable information in text.

   Request:
     - :class:`~trustwise.sdk.types.PIIRequest`

   Returns:
     - :class:`~trustwise.sdk.types.PIIResponse`

   Example response:

   .. code-block:: json

      {
        "identified_pii": [
          {
            "interval": [0, 5],
            "string": "Hello",
            "category": "blocklist"
          },
          {
            "interval": [94, 111],
            "string": "www.wikipedia.org",
            "category": "organization"
          }
        ]
      }

Prompt Injection
~~~~~~~~~~~~~~~~

.. function:: metrics.prompt_injection.evaluate(query: str) -> PromptInjectionResponse

   Detect potential prompt injection attempts.

   Request:
     - :class:`~trustwise.sdk.types.PromptInjectionRequest`

   Returns:
     - :class:`~trustwise.sdk.types.PromptInjectionResponse`

   Example response:

   .. code-block:: json

      {
        "score": 98.0
      }

Refusal
~~~~~~~

.. function:: metrics.refusal.evaluate(query: str, response: str) -> RefusalResponse

   Evaluate the likelihood that a response is a refusal to answer or comply with the query.

   Request:
     - :class:`~trustwise.sdk.types.RefusalRequest`

   Returns:
     - :class:`~trustwise.sdk.types.RefusalResponse`

   Example response:

   .. code-block:: json

      {
        "score": 5
      }

   - ``score``: An integer, 0-100, measuring the degree (firmness) of refusal (100 is a strong refusal)
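A common pre-processing step is to screen user-supplied text for PII and prompt injection before it is passed to a model. The snippet below is an illustrative sketch only: ``trustwise`` is assumed to be a configured ``TrustwiseSDK`` client, the blocklist/allowlist entries and input text are placeholders, and the ``identified_pii`` attribute access mirrors the example JSON shown above.

.. code-block:: python

   # Illustrative sketch -- assumes ``trustwise`` is an initialised TrustwiseSDK client.
   user_text = "Hi, my email is jane.doe@example.com and I live in Paris."

   pii = trustwise.metrics.pii.evaluate(
       text=user_text,
       blocklist=["jane.doe@example.com"],  # strings that should always be flagged
       allowlist=["Paris"],                 # strings that should never be flagged
   )
   injection = trustwise.metrics.prompt_injection.evaluate(query=user_text)

   # ``identified_pii`` is assumed to follow the example response shape above.
   if pii.identified_pii:
       print("PII detected:", pii.identified_pii)
   print("Prompt injection result:", injection)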
Sensitivity
~~~~~~~~~~~

.. function:: metrics.sensitivity.evaluate(response: str, topics: list[str]) -> SensitivityResponse

   Evaluate the sensitivity of a response regarding specific topics.

   Request:
     - :class:`~trustwise.sdk.types.SensitivityRequest`

   Returns:
     - :class:`~trustwise.sdk.types.SensitivityResponse`

   Example response:

   .. code-block:: json

      {
        "scores": {
          "politics": 0.70,
          "religion": 0.60
        }
      }

Simplicity
~~~~~~~~~~

.. function:: metrics.simplicity.evaluate(response: str) -> SimplicityResponse

   Evaluate the simplicity of a response.

   Request:
     - :class:`~trustwise.sdk.types.SimplicityRequest`

   Returns:
     - :class:`~trustwise.sdk.types.SimplicityResponse`

   Example response:

   .. code-block:: json

      {
        "score": 82.0
      }

Stability
~~~~~~~~~

.. function:: metrics.stability.evaluate(responses: list[str]) -> StabilityResponse

   Evaluate the stability (consistency) of multiple responses to the same prompt.

   Request:
     - :class:`~trustwise.sdk.types.StabilityRequest`

   Returns:
     - :class:`~trustwise.sdk.types.StabilityResponse`

   Example response:

   .. code-block:: json

      {
        "min": 80,
        "avg": 87
      }

   - ``min``: An integer, 0-100, measuring the minimum stability between any pair of responses (100 is high similarity)
   - ``avg``: An integer, 0-100, measuring the average stability across all pairs of responses (100 is high similarity)

Summarization
~~~~~~~~~~~~~

.. function:: metrics.summarization.evaluate(response: str, context: list[ContextNode]) -> SummarizationResponse

   Evaluate the quality of a summary.

   Request:
     - :class:`~trustwise.sdk.types.SummarizationRequest`

   Returns:
     - :class:`~trustwise.sdk.types.SummarizationResponse`

   Example response:

   .. code-block:: json

      {
        "score": 90.0
      }

Tone
~~~~

.. function:: metrics.tone.evaluate(response: str) -> ToneResponse

   Evaluate the tone of a response.

   Request:
     - :class:`~trustwise.sdk.types.ToneRequest`

   Returns:
     - :class:`~trustwise.sdk.types.ToneResponse`

   Example response:

   .. code-block:: json

      {
        "labels": ["neutral", "happiness", "realization"],
        "scores": [89.704185, 6.6798472, 2.9873204]
      }

Toxicity
~~~~~~~~

.. function:: metrics.toxicity.evaluate(response: str) -> ToxicityResponse

   Evaluate the toxicity of a response.

   Request:
     - :class:`~trustwise.sdk.types.ToxicityRequest`

   Returns:
     - :class:`~trustwise.sdk.types.ToxicityResponse`

   Example response:

   .. code-block:: json

      {
        "labels": ["identity_hate", "insult", "threat", "obscene", "toxic"],
        "scores": [0.036089644, 0.06207772, 0.027964465, 0.105483316, 0.3622106]
      }
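As a closing example, the stability metric takes a list of responses rather than a single string, which makes it useful for checking how consistently a model answers the same prompt across repeated generations. The snippet below is an illustrative sketch only: ``trustwise`` is assumed to be a configured ``TrustwiseSDK`` client, and the responses are hard-coded stand-ins for repeated model outputs.

.. code-block:: python

   # Illustrative sketch -- assumes ``trustwise`` is an initialised TrustwiseSDK client.
   # In practice these would be several generations for the same prompt.
   responses = [
       "The capital of France is Paris.",
       "Paris is the capital of France.",
       "France's capital city is Paris.",
   ]

   stability = trustwise.metrics.stability.evaluate(responses=responses)
   print(stability)  # min/avg stability scores, 0-100, as in the example JSON above

   # Content checks can also be run on any single response.
   toxicity = trustwise.metrics.toxicity.evaluate(response=responses[0])
   tone = trustwise.metrics.tone.evaluate(response=responses[0])
   print(toxicity, tone)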