Safety and Alignment Scan Results
Overview
The Trustwise Developer portal provides a comprehensive overview of LLM scans across various projects. This documentation is designed to guide users through each step of accessing and analyzing the results of their LLM scans, from the project level down to individual queries.
1. Projects List Screen
Purpose
The Projects List Screen, the home screen, serves as the gateway to all your LLM scan projects, providing a high-level summary and quick access to each project's detailed analytics.

Features
- Project Overview: Lists all projects, detailing project name, visibility, number of scans, last scan date, and project owner.
- Access Control: Enables users to efficiently search projects based on various criteria.
Navigation
- View Project: Clicking on any project name and then "More Details" accesses the Project Details Screen, offering in-depth details and aggregated scan metrics.

2. Project Details Screen
Purpose
This screen expands on a selected project, displaying detailed metrics and a list of all scans associated with the project.

Features
- Project Metrics: Shows total scans, visibility, ownership, and aggregated results data.
- Scan Summaries: Lists individual scans with their corresponding query counts and agregated Safety and Alignment metrics. These metrics include Summarization, Faithfulness, Context Relevancy, Answer Relevancy, Tone, Formality, Clarity, Simplicity, Helpfulness, and Toxicity.
Navigation
- Individual Scan Review: Selecting a scan navigates to the Scan Summary Screen for detailed results and deeper analysis.
3. Scan Summary Screen
Purpose
Offers a comprehensive summary of a specific scan, including aggregate metrics and detailed listings of all processed queries.

Details
- Aggregate Metrics: Displays values for key performance indicators such as latency, time elapsed, Trustwise score, and agregated Safety and Alignment metrics.
- Query Performance: Each query is listed with its details such as query response and LLM name, as well as specific metrics, allowing users to assess individual performance aspects. These include Trustwise score, Summarization, Faithfulness, Context Relevancy, Answer Relevancy, Tone, Formality. Clarity, Simplicity, Helpfulness, and Toxicity.
Navigation
- Detailed Query Analysis: Selecting a query leads to the Query Overview Screen, which displays detailed input and output data along with specific metric scores.
4. Query Overview Screen
Purpose
Delivers an exhaustive analysis of a single query, including the LLM's response and associated performance metrics.
Contents
- Query and Response: Displays the original query alongside the LLM’s response.
- Safety Metrics: Displays the scores for each of the Safety metrics: Faithfulness, Context Relevancy, Answer Relevancy, and Summarization.
- Alignment Metrics: Provides detailed scores for each of the Alignemnt metrics - Tone, Formality. Clarity, Simplicity, Helpfulness, and Toxicity. For Tone, the screen displays the three most relevant emotional tones identified in the response, each with its respective score. Similarly, all six Toxicity styles (severe toxicity, identity hate, threat, obscene, insult, and general toxicity) are displayed with their corresponding scores, providing a comprehensive view of the content's suitability and engagement style.

- Metadata: Provides the start and end times of the query execution, offering insights into the performance and efficiency of the model during the interaction.
- Model Details and Token Usage: Details about the model used, prompt tokens, completion tokens, and total tokens used during the query.
- Prompt Tokens refer to the segments of text fed into the model to initiate and guide the generation process. These tokens set the context and specify the content that the model needs to consider for generating a response.
- Completion Tokens refer to the segments of text generated by the model in response to the prompt. These tokens are the model's output and form the answer or continuation of the text based on the prompt provided.

- Query Trace: Provides the latencies for every step of the querying process.

More detailed information about significance of Safety and Alignment metrics can be found below.
Charts Page Overview
Overview
The Charts Page in the Trustwise AI portal serves as a powerful analytical tool designed to display detailed performance metrics across various projects and scans. It provides dynamic visualizations of Safety, Alignment, and non-functional requirements (NFR) metrics to facilitate in-depth analysis and optimization of LLM operations.

Key Features
Metric Visualizations
-
Safety and Alignment Metrics: Visualize performance across multiple scans within a project using box plots for metrics such as Summarization, Faithfulness, Context Relevancy, Clarity, Formality, Helpfulness, Simplicity and Toxicity. If a scan comprises multiple queries, these metrics are displayed as box plots. For scans containing only one query, the metric appears as a single line or point.
-
Non-Functional Requirements (NFR) Metrics: Track system efficiency with line graphs depicting Average Scan Time and Average Response Latency in ms, providing insights into the operational performance of the system.
Navigation and Filters
- Project and Scan Selection: Charts are automatically filtered by the last opened project. The page allows users to switch between different projects or specific scans within a project through dropdown menus.
- Date and LLM Filters: Charts can be tailored by specifying date ranges or selecting specific LLMs used within the project.
- Reset Filters: A 'Clear All Filters' option enables users to remove all applied filters and view the data without restrictions.
Hovering Over Visual Elements
Enhance your analytical experience by hovering over elements within the charts for detailed metric insights:
Box Plots:
- Scan Name: Identifies the scan corresponding to the data.
- Minimum Value: Displays the lowest metric score in the dataset.
- First Quartile (Q1): Shows the 25th percentile score.
- Median (Q2): Indicates the median of the dataset.
- Third Quartile (Q3): Shows the 75th percentile score.
- Maximum Value: Displays the highest metric score in the dataset.
Line Graphs (Average Scan Time and Response Latency):
- Scan Name: Identifies the scan corresponding to the data.
- Exact Value: Displays the specific measurement at the selected point.
Safety Metrics Details
Summarization
Summarization evaluates the response using the context and aims to determine whether the response is factually consistent with respect to the provided context. The response is split into individual sentences and all (context, sentence) pairs are passed through a fine-tuned model, determining whether the contents of the sentence are supported by the provided context. This process results in a Summarization metric, scored between 0 and 100, with higher scores indicating that more of the claims in the response are supported by the context.
Faithfulness
Faithfulness evaluates the response using the context and aims to determine whether the response is free from any false statements based on the provided context. The response is broken down into ‘atomic’ facts, which contain only one piece of information. Each of these facts is then verified with respect to the context. This process results in a Faithfulness metric, scored between 0 and 100, with higher scores indicating that a higher proportion of the claims in the response are supported by the context.
Context Relevancy
Context Relevancy evaluates the context using the query and aims to assess whether the context contains all the information necessary to answer the query. First, the key topics discussed in the query are identified. Then, each context chunk is evaluated to determine whether the topics are present. This is then aggregated into a score between 0 and 100, with higher scores indicating that more of the topics present in the query are discussed in the context.
Answer Relevancy
Answer Relevancy evaluates the response, using the query and aims to determine whether the response includes an attempt to answer the question, while being free from superfluous information. The response is used to generate a question for which the response would be suitable. The actual and generated questions are then compared, resulting in a similarity score between 0 and 100. A higher score means the response is more likely to have answered the correct query.
Hallucination Classes
Hallucination Classes: Industry Agnostic Healthcare Claims Financial Services Life Sciences Medical Industry
Alignment Metrics Details
Tone
Tone evaluates the emotional content of text by determining which three emotions are most strongly present. The assessment uses a fine-tuned model to analyze the text against a broad spectrum of emotional examples. Each of the three detected emotions is scored between 0 and 100, where a higher score indicates a stronger presence of the corresponding emotion. This helps in understanding the emotional impact of the text.
Clarity
Clarity focuses on how easy the text is to read, considering its grammar and structure. The clarity of the text is quantified into a score from 0 to 100, with higher scores indicating that the text is easier to read. This metric is crucial for ensuring that the text is understandable without ambiguity, making it accessible to a broader audience.
Formality
Formality measures the level of formality in text. The text is segmented into sentences, each of which is encoded and analyzed by a fine-tuned, topic-classifier model that compares it to a wide range of texts varying in formality. Each sentence receives a score, contributing to an overall formality score between 0 and 100, with higher scores indicating a more formal tone. This metric is vital for ensuring text suitability in professional or casual contexts.
Helpfulness
Helpfulness evaluates the effectiveness of a response in addressing the query posed. It considers the relevance, detail, and usefulness of the information provided in the response. Both the query and the response are analyzed using a fine-tuned model that benchmarks them against examples of helpful and unhelpful responses. Scores range from 0 to 100, with higher scores indicating a more helpful response.
Simplicity
Simplicity assesses the ease of understanding the text. It analyzes the complexity of the vocabulary used by breaking the text into tokens and evaluating the rarity of each token within common literature. The overall simplicity score ranges from 0 to 100, where higher scores signify that the text uses simpler, more common words and phrases, making it easier to comprehend.
Toxicity
Toxicity determines the presence of harmful or offensive content within the text. Utilizing a collection of fine-tuned models, each trained to recognize different forms of toxicity, the text is scored on its overall toxicity and on specific five toxic styles. Scores range from 0 to 100 for each style, with higher scores indicating more pronounced toxicity. This metric is essential for moderating content.
Additional Information
For further assistance or to report issues with the Trustwise AI Portal, please contact our support team at support@trustwise.ai
.