.. _v4_metrics:

V4 Metrics
==========

.. note::
   **Version 4 is now the default** for Trustwise metrics. It provides enhanced accuracy, better type safety, and improved response structures compared to v3.

All metrics are available in both synchronous and asynchronous forms. For async, use ``TrustwiseSDKAsync`` and ``await`` the ``evaluate()`` methods.

The SDK provides access to all v4 metrics directly through the ``metrics`` namespace. Each metric provides an ``evaluate()`` function with enhanced type safety and improved response structures.

For more details on the metrics, please refer to the `Trustwise Metrics Documentation`_.

Example usage:

.. code-block:: python

    from trustwise.sdk import TrustwiseSDK
    from trustwise.sdk.config import TrustwiseConfig
    from trustwise.sdk.metrics.v4.types import ContextChunk

    config = TrustwiseConfig(api_key="your-api-key")
    trustwise = TrustwiseSDK(config)

    # v4 context format
    context = [
        ContextChunk(chunk_text="Paris is the capital of France.", chunk_id="doc:idx:1")
    ]

    # v4 metric calls (now default)
    result = trustwise.metrics.faithfulness.evaluate(
        query="What is the capital of France?",
        response="The capital of France is Paris.",
        context=context
    )

Async example:

.. code-block:: python

    import asyncio

    from trustwise.sdk import TrustwiseSDKAsync
    from trustwise.sdk.config import TrustwiseConfig
    from trustwise.sdk.metrics.v4.types import ContextChunk

    async def main():
        config = TrustwiseConfig(api_key="your-api-key")
        trustwise = TrustwiseSDKAsync(config)

        context = [
            ContextChunk(chunk_text="Paris is the capital of France.", chunk_id="doc:idx:1")
        ]

        result = await trustwise.metrics.faithfulness.evaluate(
            query="What is the capital of France?",
            response="The capital of France is Paris.",
            context=context
        )

    asyncio.run(main())

Refer to the :doc:`api` for details on each metric's parameters.

.. note::
   Custom types such as ``ContextChunk`` are defined in :mod:`trustwise.sdk.metrics.v4.types`.

Adherence
~~~~~~~~~

.. function:: metrics.adherence.evaluate(policy: str, response: str) -> AdherenceResponse

   Evaluate how well the response follows a given policy or instruction.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.AdherenceRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.AdherenceResponse`

   Example response:

   .. code-block:: json

      {
        "score": 85.0
      }

   - ``score``: A float, 0-100, measuring how well the response follows the policy (higher is better adherence)

Answer Relevancy
~~~~~~~~~~~~~~~~

.. function:: metrics.answer_relevancy.evaluate(query: str, response: str) -> AnswerRelevancyResponse

   Evaluate the relevancy of a response to the query.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.AnswerRelevancyRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.AnswerRelevancyResponse`

   Example response:

   .. code-block:: json

      {
        "score": 92.0,
        "generated_question": "What is the capital city of France?"
      }

   - ``score``: A float, 0-100, measuring how relevant the response is to the query. Higher score indicates better relevancy.
   - ``generated_question``: The generated question for which the response would be relevant
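A minimal usage sketch for the query/response metrics above, reusing the ``trustwise`` client from the introductory example. Reading ``score`` and ``generated_question`` as attributes of the typed response object is an assumption based on the example JSON fields shown here.

.. code-block:: python

    # Sketch: assumes the ``trustwise`` client from the introductory example is in scope.
    result = trustwise.metrics.answer_relevancy.evaluate(
        query="What is the capital of France?",
        response="The capital of France is Paris.",
    )

    # Attribute names mirror the example JSON above (assumed for the typed response object).
    print(result.score)
    print(result.generated_question)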
Clarity
~~~~~~~

.. function:: metrics.clarity.evaluate(text: str) -> ClarityResponse

   The Trustwise Clarity metric measures how easy text is to read. It gives higher scores to writing that uses easier-to-read words and concise, self-contained sentences. It does not measure how well you understand the ideas in the text.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.ClarityRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.ClarityResponse`

   Example response:

   .. code-block:: json

      {
        "score": 92.5
      }

   - ``score``: A float, 0-100, measuring how clear and understandable the response is. Higher score indicates better clarity.

Completion
~~~~~~~~~~

.. function:: metrics.completion.evaluate(query: str, response: str) -> CompletionResponse

   Evaluate how well the response completes or follows the query's instruction.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.CompletionRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.CompletionResponse`

   Example response:

   .. code-block:: json

      {
        "score": 85.0
      }

   - ``score``: A float, 0-100, measuring how well the response completes the query. Higher score indicates better completion.

Context Relevancy
~~~~~~~~~~~~~~~~~

.. function:: metrics.context_relevancy.evaluate(query: str, context: Context, severity: float = None, include_chunk_scores: bool = None, metadata: dict = None) -> ContextRelevancyResponse

   Evaluate the relevancy of the context to the query.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.ContextRelevancyRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.ContextRelevancyResponse`

   Example response:

   .. code-block:: json

      {
        "score": 88.5,
        "scores": [
          {"label": "Circumstances", "score": 0.33},
          {"label": "Claim", "score": 46.34},
          {"label": "Policy", "score": 0.11}
        ]
      }

   - ``score``: A float, 0-100, measuring how relevant the context is to the query. Higher score indicates better relevancy.
   - ``scores``: List of :class:`~trustwise.sdk.metrics.v4.types.ObjectStyleScore` with detailed breakdown by relevancy aspect

Faithfulness
~~~~~~~~~~~~

.. function:: metrics.faithfulness.evaluate(query: str, response: str, context: Context) -> FaithfulnessResponse

   Evaluate the faithfulness of a response against its context.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.FaithfulnessRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.FaithfulnessResponse`

   Example response:

   .. code-block:: json

      {
        "score": 99.971924,
        "statements": [
          {
            "statement": "The capital of France is Paris.",
            "label": "Safe",
            "probability": 0.9997192,
            "sentence_span": [0, 30]
          }
        ]
      }

   - ``score``: A float, 0-100, measuring how faithful the response is to the context (100 is perfect faithfulness)
   - ``statements``: List of extracted atomic statements with their verification status

Formality
~~~~~~~~~

.. function:: metrics.formality.evaluate(text: str) -> FormalityResponse

   Evaluate the formality level of a response.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.FormalityRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.FormalityResponse`

   Example response:

   .. code-block:: json

      {
        "score": 75.0
      }

   - ``score``: A float, 0-100, measuring the overall formality level (100 is very formal)

Helpfulness
~~~~~~~~~~~

.. function:: metrics.helpfulness.evaluate(text: str) -> HelpfulnessResponse

   Evaluate the helpfulness of a response.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.HelpfulnessRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.HelpfulnessResponse`

   Example response:

   .. code-block:: json

      {
        "score": 88.0
      }

   - ``score``: A float, 0-100, measuring how helpful the response is (100 is very helpful)
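Text-only metrics such as clarity, formality, and helpfulness take a single ``text`` argument. A minimal sketch, again reusing the ``trustwise`` client from the introductory example; treating ``score`` as an attribute of the typed response object is an assumption based on the example JSON.

.. code-block:: python

    # Sketch: assumes the ``trustwise`` client from the introductory example is in scope.
    result = trustwise.metrics.helpfulness.evaluate(
        text="To reset your password, open Settings and choose 'Reset password'."
    )

    # ``score`` mirrors the example JSON above (assumed attribute on the typed response).
    print(result.score)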
PII Detection
~~~~~~~~~~~~~

.. function:: metrics.pii.evaluate(text: str, allowlist: list[str] = None, blocklist: list[str] = None, categories: list[str] = None) -> PIIResponse

   Detect personally identifiable information in text.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.PIIRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.PIIResponse`

   Example response:

   .. code-block:: json

      {
        "pii": [
          {
            "interval": [0, 5],
            "string": "Hello",
            "category": "blocklist"
          }
        ]
      }

   - ``pii``: List of detected PII entities with their locations and categories

Prompt Manipulation
~~~~~~~~~~~~~~~~~~~

.. function:: metrics.prompt_manipulation.evaluate(text: str, severity: int = None) -> PromptManipulationResponse

   Detect potential prompt manipulation attempts, including jailbreak, prompt injection, and role play.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.PromptManipulationRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.PromptManipulationResponse`

   Example response:

   .. code-block:: json

      {
        "score": 0.85,
        "scores": [
          {"label": "jailbreak", "score": 0.90},
          {"label": "prompt_injection", "score": 0.80},
          {"label": "role_play", "score": 0.85}
        ]
      }

   - ``score``: A float, 0-100, measuring overall prompt manipulation likelihood (higher is more likely)
   - ``scores``: List of :class:`~trustwise.sdk.metrics.v4.types.ObjectStyleScore` with detailed breakdown by manipulation type

Refusal
~~~~~~~

.. function:: metrics.refusal.evaluate(query: str, response: str) -> RefusalResponse

   Evaluate the likelihood that a response is a refusal to answer or comply with the query.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.RefusalRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.RefusalResponse`

   Example response:

   .. code-block:: json

      {
        "score": 5.0
      }

   - ``score``: A float, 0-100, measuring the degree of refusal (higher indicates stronger refusal)

Sensitivity
~~~~~~~~~~~

.. function:: metrics.sensitivity.evaluate(text: str, topics: list[str]) -> SensitivityResponse

   Evaluate the sensitivity of a response regarding specific topics.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.SensitivityRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.SensitivityResponse`

   Example response:

   .. code-block:: json

      {
        "scores": [
          {"label": "politics", "score": 70.0},
          {"label": "religion", "score": 60.0}
        ]
      }

   - ``scores``: List of :class:`~trustwise.sdk.metrics.v4.types.ObjectStyleScore` with sensitivity scores by topic (0-100, higher indicates stronger presence)

Stability
~~~~~~~~~

.. function:: metrics.stability.evaluate(responses: list[str]) -> StabilityResponse

   Measures how similar the responses are when given the same or similar inputs multiple times. It gives higher scores when responses stay consistent, even if asked by different personas or worded differently. This helps identify if an agent changes its answers unexpectedly.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.StabilityRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.StabilityResponse`

   Example response:

   .. code-block:: json

      {
        "min": 75,
        "avg": 85
      }

   - ``min``: An integer, 0-100, measuring the minimum stability between any pair of responses (100 is high similarity)
   - ``avg``: An integer, 0-100, measuring the average stability between all pairs of responses (100 is high similarity)
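Stability compares several responses to one another, so it takes a list of strings rather than a single text. A minimal sketch under the same assumptions as the earlier examples; ``min`` and ``avg`` as response attributes are assumed from the example JSON.

.. code-block:: python

    # Sketch: assumes the ``trustwise`` client from the introductory example is in scope.
    result = trustwise.metrics.stability.evaluate(
        responses=[
            "The capital of France is Paris.",
            "Paris is the capital city of France.",
            "France's capital is Paris.",
        ]
    )

    # ``min`` and ``avg`` mirror the example JSON above (assumed attributes on the typed response).
    print(result.min, result.avg)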
Simplicity
~~~~~~~~~~

.. function:: metrics.simplicity.evaluate(text: str) -> SimplicityResponse

   Measures how easy it is to understand the words in a text. It gives higher scores to writing that uses common, everyday words instead of special terms or complicated words. Simplicity looks at the words you choose, not how you put them together in sentences.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.SimplicityRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.SimplicityResponse`

   Example response:

   .. code-block:: json

      {
        "score": 82.0
      }

   - ``score``: A float, 0-100, measuring how simple the response is. Higher score indicates simpler text.

Tone
~~~~

.. function:: metrics.tone.evaluate(text: str, tones: list[str] = None) -> ToneResponse

   Evaluate the tone of a response.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.ToneRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.ToneResponse`

   Example response:

   .. code-block:: json

      {
        "scores": [
          {"label": "neutral", "score": 89.70},
          {"label": "happiness", "score": 6.68},
          {"label": "realization", "score": 2.99}
        ]
      }

   - ``scores``: List of :class:`~trustwise.sdk.metrics.v4.types.ObjectStyleScore` with tone confidence scores (0-100, higher indicates stronger presence)

Toxicity
~~~~~~~~

.. function:: metrics.toxicity.evaluate(text: str, severity: int = None) -> ToxicityResponse

   Evaluate the toxicity of a response.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.ToxicityRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.ToxicityResponse`

   Example response:

   .. code-block:: json

      {
        "score": 36.22,
        "scores": [
          {"label": "identity_hate", "score": 3.61},
          {"label": "insult", "score": 6.21},
          {"label": "threat", "score": 2.80},
          {"label": "obscene", "score": 10.55},
          {"label": "toxic", "score": 36.22}
        ]
      }

   - ``score``: A float, 0-100, measuring overall toxicity (higher is more toxic)
   - ``scores``: List of :class:`~trustwise.sdk.metrics.v4.types.ObjectStyleScore` with detailed breakdown by toxicity category

Carbon
~~~~~~

.. function:: metrics.carbon.evaluate(provider: str, region: str, instance_type: str, latency: float | int) -> CarbonResponse

   Evaluate the carbon footprint of AI operations based on provider, instance type, region, and latency.

   Request:
       - :class:`~trustwise.sdk.metrics.v4.types.CarbonRequest`

   Returns:
       - :class:`~trustwise.sdk.metrics.v4.types.CarbonResponse`

   Example response:

   .. code-block:: json

      {
        "carbon": {
          "value": 0.0011949989480068127,
          "unit": "kg_co2e"
        },
        "components": [
          {
            "component": "operational_gpu",
            "carbon": {
              "value": 0.0,
              "unit": "kg_co2e"
            }
          },
          {
            "component": "operational_cpu",
            "carbon": {
              "value": 0.00021294669026962343,
              "unit": "kg_co2e"
            }
          },
          {
            "component": "embodied_cpu",
            "carbon": {
              "value": 0.0009820522577371892,
              "unit": "kg_co2e"
            }
          }
        ]
      }

   - ``carbon``: :class:`~trustwise.sdk.metrics.v4.types.CarbonValue` with total carbon footprint
   - ``components``: List of :class:`~trustwise.sdk.metrics.v4.types.CarbonComponent` with breakdown by component
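Carbon is driven by deployment details rather than text. A minimal sketch under the same assumptions as the earlier examples; the provider, region, instance type, and latency values below are illustrative placeholders, not a list of supported inputs, and the expected latency unit is not documented here, so consult the API reference.

.. code-block:: python

    # Sketch: assumes the ``trustwise`` client from the introductory example is in scope.
    # provider/region/instance_type/latency are illustrative placeholders, not validated values.
    result = trustwise.metrics.carbon.evaluate(
        provider="aws",
        region="us-east-1",
        instance_type="m5.xlarge",
        latency=250,
    )

    # ``carbon`` and ``components`` mirror the example JSON above (assumed attributes).
    print(result.carbon)
    print(result.components)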