Version: v3.1.0

Running a Safety and Alignment Scan

The Trustwise Developer portal allows you to conduct detailed Safety and Alignment scans on your LLMs. Follow these steps to run a scan effectively:

Step 1: Select a Project

  1. Navigate to the Scan Page: Open the Scan page in the Trustwise Developer portal.
     [Screenshot: Project List]
  2. Choose a Project: Select the project under which you want to run the scan. If you have not created a project, a default project is automatically created for you.
     [Screenshot: Select Project]

Step 2: Add Scan Title

Enter Scan Title: Provide a title for your scan. This title will be used to save and reference the scan results within your selected project.

[Screenshot: Enter Scan Name]

Step 3: Select Your LLM

  1. Choose LLM Provider: Select your LLM provider from the dropdown menu. The available providers are based on your previous registrations.
     [Screenshot: Select LLM]
  2. Select LLM: After selecting the provider, a second dropdown is populated with the LLMs you have registered under that provider.
  3. API Key: The API key field is filled automatically with the registered API key for the selected LLM. You can edit this key if necessary.
     [Screenshot: Select LLM]

If you have not yet registered any LLMs, you will have to do so on the LLM Registration page prior to running a scan.

Step 4: Manage Documents

Select/Upload Document: Choose a document from previously uploaded documents for the scan context, or upload a new one. Newly uploaded documents will be saved for future use.

  • Click "Add Document" if you need to upload a new document, and follow the prompts to upload and save it.
[Screenshot: Select Document]

Uploaded documents are embedded and stored in a Trustwise vector database for fast access later. When a future scan needs a document you have already used, you can select it from the document list instead of uploading it again.

Step 5: Upload Questions

Add Questions: Enter up to three questions or queries that you want to test against the LLM for Safety and Alignment scanning.

  • Select "Add your questions" to type in the first question.
  • To add more than one question, select "Add question" for each additional question.
[Screenshot: Enter Questions]

Step 6: Set Safety Parameters

  1. Select Safety Metrics: Choose one, multiple, or all Safety metrics for the scan. Available options include Faithfulness, Summarization, Context Relevancy, and Answer Relevancy.
  2. Hallucination Classifications: If any Safety metrics are selected, you must also select a hallucination classification library from the dropdown menu. Available options include Industry Agnostic, Healthcare Claims, Financial Services, Life Sciences, and Medical Industry.

Step 7: Set Alignment Parameters

Select Alignment Metrics: Choose one, multiple, or all Alignment metrics to include in the scan. Available options include Tone, Clarity, Formality, Helpfulness, Simplicity, and Toxicity.

[Screenshot: Select Metrics]

Step 8: Run the Scan

Execute the Scan: Once all parameters are set, click the "Run" button to start the scan. The results will be processed and displayed under the scan title you provided.
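
The constraints described in Steps 2 to 7 can be collected into a small pre-flight check. The sketch below is illustrative only: `ScanConfig`, its field names, and `problems()` are hypothetical and are not part of any Trustwise API. The option names are copied from the steps above. Requiring a non-empty title and at least one question is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

# Option names come from Steps 5 to 7; everything else here is hypothetical.
SAFETY_METRICS = {"Faithfulness", "Summarization", "Context Relevancy", "Answer Relevancy"}
ALIGNMENT_METRICS = {"Tone", "Clarity", "Formality", "Helpfulness", "Simplicity", "Toxicity"}
HALLUCINATION_LIBRARIES = {
    "Industry Agnostic", "Healthcare Claims", "Financial Services",
    "Life Sciences", "Medical Industry",
}
MAX_QUESTIONS = 3  # Step 5: "up to three questions"


@dataclass
class ScanConfig:
    project: str
    title: str
    provider: str
    llm: str
    document: str
    questions: list[str]
    safety_metrics: set[str] = field(default_factory=set)
    hallucination_library: Optional[str] = None
    alignment_metrics: set[str] = field(default_factory=set)

    def problems(self) -> list[str]:
        """Return human-readable problems; an empty list means the form looks complete."""
        found = []
        if not self.title.strip():  # assumption: a title is required
            found.append("scan title is empty")
        if not 1 <= len(self.questions) <= MAX_QUESTIONS:  # lower bound is an assumption
            found.append(f"expected 1 to {MAX_QUESTIONS} questions, got {len(self.questions)}")
        unknown = (self.safety_metrics - SAFETY_METRICS) | (self.alignment_metrics - ALIGNMENT_METRICS)
        if unknown:
            found.append(f"unknown metrics: {sorted(unknown)}")
        # Step 6: selecting any Safety metric requires a hallucination classification library.
        if self.safety_metrics and self.hallucination_library not in HALLUCINATION_LIBRARIES:
            found.append("Safety metrics require a hallucination classification library")
        return found
```

For example, a configuration that selects Faithfulness but leaves the library unset reports exactly one problem, and setting `hallucination_library = "Industry Agnostic"` clears it.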

[Screenshot: Scan Results]

Troubleshooting and Support

If you encounter any issues or have questions during the scan setup or execution process, please refer to the help documentation or contact our support team at support@trustwise.ai for assistance.


Safety Metrics

Summarization

Summarization evaluates the response using the context and aims to determine whether the response is factually consistent with respect to the provided context. The response is split into individual sentences and all (context, sentence) pairs are passed through a fine-tuned model, determining whether the contents of the sentence are supported by the provided context. This process results in a Summarization metric, scored between 0 and 100, with higher scores indicating that more of the claims in the response are supported by the context.
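
The sentence-by-sentence procedure can be sketched as follows. This is a minimal illustration, not the Trustwise implementation: `is_supported` stands in for the fine-tuned model, `toy_is_supported` is a crude word-overlap placeholder, the sentence splitter is naive, and aggregating as the percentage of supported sentences is an assumption.

```python
import re
from typing import Callable


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def summarization_score(
    context: str,
    response: str,
    is_supported: Callable[[str, str], bool],
) -> float:
    """Percentage of response sentences judged as supported by the context."""
    sentences = split_sentences(response)
    if not sentences:
        return 0.0  # assumption: an empty response scores 0
    supported = sum(is_supported(context, s) for s in sentences)
    return 100.0 * supported / len(sentences)


def toy_is_supported(context: str, sentence: str) -> bool:
    """Placeholder for the fine-tuned model: every word must occur in the context."""
    context_words = set(re.findall(r"[a-z0-9]+", context.lower()))
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return bool(words) and all(w in context_words for w in words)


context = "The scan ran on Monday. It used three questions."
response = "The scan ran on Monday. It used five questions."
score = summarization_score(context, response, toy_is_supported)
```

Here the second response sentence contains a word ("five") that the context does not, so one of two sentences is supported.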

Faithfulness

Faithfulness evaluates the response using the context and aims to determine whether the response is free from any false statements based on the provided context. The response is broken down into ‘atomic’ facts, which contain only one piece of information. Each of these facts is then verified with respect to the context. This process results in a Faithfulness metric, scored between 0 and 100, with higher scores indicating that a higher proportion of the claims in the response are supported by the context.

Context Relevancy

Context Relevancy evaluates the context using the query and aims to assess whether the context contains all the information necessary to answer the query. First, the key topics discussed in the query are identified. Then, each context chunk is evaluated to determine whether the topics are present. This is then aggregated into a score between 0 and 100, with higher scores indicating that more of the topics present in the query are discussed in the context.
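
The aggregation step can be illustrated with a small sketch. It assumes the key topics have already been extracted from the query (passed in as `topics`), treats a topic as present when its text appears in a chunk as a case-insensitive substring, and scores the percentage of topics found in at least one chunk. The function name and all three choices are assumptions made for illustration.

```python
def context_relevancy_score(topics: list[str], chunks: list[str]) -> float:
    """Percentage of query topics that appear in at least one context chunk."""
    if not topics:
        return 0.0  # assumption: no topics means nothing to cover
    lowered_chunks = [chunk.lower() for chunk in chunks]
    covered = [
        topic for topic in topics
        if any(topic.lower() in chunk for chunk in lowered_chunks)
    ]
    return 100.0 * len(covered) / len(topics)


topics = ["refund policy", "shipping time"]
chunks = [
    "Our refund policy allows returns within 30 days.",
    "Contact support by email.",
]
score = context_relevancy_score(topics, chunks)
```

Only "refund policy" is discussed in the chunks, so half of the topics are covered.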

Answer Relevancy

Answer Relevancy evaluates the response using the query and aims to determine whether the response attempts to answer the question while being free from superfluous information. The response is used to generate a question for which the response would be suitable. The actual and generated questions are then compared, resulting in a similarity score between 0 and 100. A higher score means the response is more likely to have answered the correct query.
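
The comparison step can be sketched with a simple similarity measure. The text above does not say how the two questions are compared, so this example swaps in bag-of-words cosine similarity as a stand-in. The question-generation step is not implemented; the generated question is passed in directly. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def _bag_of_words(text: str) -> Counter:
    """Lowercased word counts; punctuation is ignored."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def question_similarity(actual: str, generated: str) -> float:
    """Cosine similarity of word counts, scaled to the range 0 to 100."""
    a, b = _bag_of_words(actual), _bag_of_words(generated)
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return 100.0 * dot / norm if norm else 0.0


actual = "What is the refund policy?"
on_topic = question_similarity(actual, "What is the refund policy?")
off_topic = question_similarity(actual, "Where can I park downtown")
```

A response that leads back to the original question scores near 100, while a generated question sharing no words with the original scores 0.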

Hallucination Classes

A Hallucination Class identifies the type of hallucination detected when the Trustwise Safety metrics fall below an optimal value.

Hallucination Classes are grouped into five categories:

  • Industry Agnostic
  • Healthcare Claims
  • Financial Services
  • Life Sciences
  • Medical Industry

Each category represents a specific industry vertical, except Industry Agnostic, which applies to all use cases.


Alignment Metrics

Tone

Tone evaluates the emotional content of text by determining which three emotions are most strongly present. The assessment uses a fine-tuned model to analyze the text against a broad spectrum of emotional examples. Each of the three detected emotions is scored between 0 and 100, where a higher score indicates a stronger presence of the corresponding emotion. This helps in understanding the emotional impact of the text.
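
Selecting the three strongest emotions from a set of per-emotion scores can be sketched as below. The emotion labels and score values are made up for illustration, and `top_three_emotions` is a hypothetical helper; the model that produces the scores is not shown.

```python
def top_three_emotions(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Return the three highest-scoring (emotion, score) pairs, strongest first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:3]


# Illustrative scores on the 0 to 100 scale; not real model output.
example_scores = {"joy": 72.0, "anger": 5.0, "surprise": 40.0, "sadness": 12.5}
strongest = top_three_emotions(example_scores)
```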

Clarity

Clarity focuses on how easy the text is to read, considering its grammar and structure. The clarity of the text is quantified into a score from 0 to 100, with higher scores indicating that the text is easier to read. This metric is crucial for ensuring that the text is understandable without ambiguity, making it accessible to a broader audience.

Formality

Formality measures the level of formality in text. The text is segmented into sentences, each of which is encoded and analyzed by a fine-tuned, topic-classifier model that compares it to a wide range of texts varying in formality. Each sentence receives a score, contributing to an overall formality score between 0 and 100, with higher scores indicating a more formal tone. This metric is vital for ensuring text suitability in professional or casual contexts.

Helpfulness

Helpfulness evaluates the effectiveness of a response in addressing the query posed. It considers the relevance, detail, and usefulness of the information provided in the response. Both the query and the response are analyzed using a fine-tuned model that benchmarks them against examples of helpful and unhelpful responses. Scores range from 0 to 100, with higher scores indicating a more helpful response.

Simplicity

Simplicity assesses the ease of understanding the text. It analyzes the complexity of the vocabulary used by breaking the text into tokens and evaluating the rarity of each token within common literature. The overall simplicity score ranges from 0 to 100, where higher scores signify that the text uses simpler, more common words and phrases, making it easier to comprehend.
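
A rough sketch of the token-rarity idea follows. It reduces "rarity within common literature" to membership in a set of common words and scores the percentage of tokens found in that set. The word list, the tokenizer, and the aggregation are all assumptions chosen to keep the example small.

```python
import re


def simplicity_score(text: str, common_words: set[str]) -> float:
    """Percentage of tokens that belong to the supplied set of common words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0  # assumption: empty text scores 0
    common = sum(token in common_words for token in tokens)
    return 100.0 * common / len(tokens)


# A tiny stand-in vocabulary; a real system would derive this from a large corpus.
COMMON = {"the", "cat", "sat", "on", "a", "mat"}

plain = simplicity_score("The cat sat on a mat.", COMMON)
ornate = simplicity_score("The feline reclined on a mat.", COMMON)
```

The second sentence replaces two common words with rarer ones, so it scores lower than the first.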

Toxicity

Toxicity determines the presence of harmful or offensive content within the text. Using a collection of fine-tuned models, each trained to recognize a different form of toxicity, the text is scored on its overall toxicity and on five specific toxic styles. Each score ranges from 0 to 100, with higher scores indicating more pronounced toxicity. This metric is essential for moderating content.


Document Management Guide

The Document Management section of your portal allows you to store and manage all the files needed as contexts for future scans. Uploaded documents are embedded and stored in the Trustwise vector database for efficient and quick future access. You can upload, edit, and manage your documents efficiently using this interface.

Accessing Document Management

Navigate to the Document Management page from your account dashboard or through the main menu under "Account Settings". Here, you will find all your saved documents listed with options for management.

[Screenshot: Document Management Page]

Uploading New Documents

To add new documents to your account, follow these steps:

Single Document Upload

  1. Open the Add Document Interface: Click the "Add Document" button at the top right of the Document Management page.
     [Screenshot: Add Document]
  2. Enter Document Name: In the "Document Name" field, type a name for your document. This name will be used to reference the document in future operations.
  3. Select Your Files:
    • Drag & Drop: You can drag and drop files directly into the upload area.
    • Browse Files: Click on "Browse files" to select files from your computer.
     [Screenshot: Enter Doc Name and Attach Doc]
  4. Upload: After selecting your files, click the "Upload" button to save the document under the entered name.
     [Screenshot: Registered Document]

Multiple Documents Upload

  • When uploading multiple files, each document will automatically be saved under its original file name.
  • Follow the same steps as for single document upload, but skip entering a document name. The system will use the file names as document names.
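
The naming rule for single and multiple uploads can be expressed as a short sketch. `document_names` is a hypothetical helper, not a portal API. Keeping the file extension in the saved name, and falling back to the file name when a single upload has no entered name, are assumptions.

```python
from pathlib import PurePath
from typing import Optional


def document_names(file_names: list[str], entered_name: Optional[str] = None) -> list[str]:
    """Names under which uploaded files would be saved, following the rules above."""
    if len(file_names) == 1 and entered_name:
        # Single upload: the name typed into "Document Name" is used.
        return [entered_name]
    # Multiple uploads: each file keeps its original file name.
    return [PurePath(name).name for name in file_names]


single = document_names(["reports/q1_policy.pdf"], entered_name="Q1 Policy")
batch = document_names(["reports/q1_policy.pdf", "faq.txt"])
```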

Managing Existing Documents

Each document in the list has an "Action" button on the right side:

  • Edit Name: Click this to change the document's name.
  • Delete: Remove the document from your storage.

Document Usage in Scans

Documents uploaded and managed here will appear as options in a dropdown menu on the Scan page:

  • When running a scan, you can select any of the stored documents as the context for the scan.

Adding Documents Through the Scan Page

You can also add new documents directly on the Scan page during the setup of a new scan. These documents will be saved in the Document Management system for future use.

Data Storage Details

Documents are not only stored but also embedded, and their vector indices are maintained in the Trustwise vector database. This system utilizes Qdrant technology to manage document vectors, facilitating efficient retrieval and usage in future scans. This approach enhances the accessibility of stored documents, enabling precise and quick context retrieval for LLM scans.
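
To illustrate what vector-based retrieval means in general, the sketch below ranks stored document vectors by cosine similarity to a query vector. It is plain Python over a dictionary and does not use or represent the Qdrant API. The document names and two-dimensional vectors are made up; real embeddings have many more dimensions and come from an embedding model.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query: list[float], index: dict[str, list[float]], k: int = 1) -> list[str]:
    """Names of the k stored vectors most similar to the query, best match first."""
    ranked = sorted(index, key=lambda name: cosine(query, index[name]), reverse=True)
    return ranked[:k]


# Toy index: document name -> made-up embedding.
index = {"policy.pdf": [1.0, 0.0], "faq.txt": [0.0, 1.0]}
best = nearest([0.9, 0.1], index)
```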

Note

Ensure that the documents you upload are correctly formatted and relevant to the scans you intend to run. Proper naming and organization will help streamline your scanning processes.

For any assistance with document management, please contact our support team at support@trustwise.ai.