
Known Issues and Limitations

This page outlines current known issues and limitations in our system to ensure transparency and help users better understand potential constraints.

1. Model Input Token Limitations

Some models, such as meta-llama/Llama-2-13b-chat-hf, have maximum input token limits smaller than the chunk sizes used during some of our optimization iterations. When these limits are exceeded, the model may fail with an error. This is a known limitation of models with lower token limits or stricter input validation.

The following models might fail due to exceeding their input token limits:

| Model Name | Token Limit |
| --- | --- |
| bigcode/santacoder | 2048 |
| EleutherAI/gpt-neo-1.3B | 2048 |
| Gryphe/MythoMax-L2-13b | 4097 |
| Gryphe/MythoMax-L2-13b-Lite | 4097 |
| meta-llama/Llama-2-13b-chat-hf | 4097 |
| upstage/SOLAR-10.7B-Instruct-v1.0 | 4097 |
| meta-llama/Llama-3.2-1B-Instruct | 4096 |
| microsoft/Phi-3-mini-4k-instruct | 4096 |
| scb10x/scb10x-llama3-1-typhoon2-70b-instruct | 8096 |
| scb10x/scb10x-llama3-typhoon-v1-5x-4f316 | 8097 |
| scb10x/scb10x-llama3-1-typhoon2-8b-instruct | 8192 |
| meta-llama/Meta-Llama-3-8B-Instruct-Turbo | 8192 |
| deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free | 8193 |
| google/gemma-2-27b-it | 8193 |
| google/gemma-2-9b-it | 8193 |
| google/gemma-2b-it | 8193 |
| meta-llama/Llama-3-8b-chat-hf | 8193 |
| meta-llama/Llama-3.3-70B-Instruct-Turbo-Free | 8193 |
| meta-llama/Meta-Llama-3-70B-Instruct-Turbo | 8193 |
| meta-llama/Meta-Llama-3-8B-Instruct-Lite | 8193 |
| scb10x/scb10x-llama3-typhoon-v1-5-8b-instruct | 8193 |

⚠️ To avoid these errors, ensure that you are using models with higher token limits.
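As a quick client-side guard, you can check a chunk's token count against a model's known limit before sending it. The sketch below is illustrative: the limits are taken from the table above, but the helper name and the default fallback limit are assumptions, not part of our API.

```python
# Known input token limits for a few models (values from the table above).
MODEL_TOKEN_LIMITS = {
    "bigcode/santacoder": 2048,
    "meta-llama/Llama-2-13b-chat-hf": 4097,
    "google/gemma-2-9b-it": 8193,
}

def fits_within_limit(model_name: str, prompt_tokens: int, default_limit: int = 8192) -> bool:
    """Return True if a prompt of `prompt_tokens` tokens fits the model's input limit.

    `default_limit` is a conservative assumption for models not in the table.
    """
    limit = MODEL_TOKEN_LIMITS.get(model_name, default_limit)
    return prompt_tokens <= limit

# A 3000-token chunk exceeds santacoder's 2048-token limit but fits Llama-2-13b.
print(fits_within_limit("bigcode/santacoder", 3000))              # False
print(fits_within_limit("meta-llama/Llama-2-13b-chat-hf", 3000))  # True
```

Running this check before a scan lets you substitute a higher-limit model rather than letting the request error out mid-run.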


2. Hugging Face Models: Token Cost Limitation

For models hosted on Hugging Face (HF), prompt and response token costs are marked as Not Applicable (NA). Cost for HF models is determined by infrastructure usage (e.g., per-hour charges) rather than token-based charges, so token-based cost reporting does not apply to these models.
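If you aggregate costs across providers yourself, NA values need explicit handling so they are not silently treated as zero. This is a minimal sketch; the function name, the `"huggingface"` provider label, and the rate parameters are illustrative assumptions:

```python
def token_cost(provider, prompt_tokens, completion_tokens,
               prompt_rate=None, completion_rate=None):
    """Return a token-based cost, or "NA" where billing is infrastructure-based."""
    if provider == "huggingface":
        return "NA"  # HF-hosted models bill per hour of infrastructure, not per token
    if prompt_rate is None or completion_rate is None:
        raise ValueError("per-token rates are required for token-billed providers")
    return prompt_tokens * prompt_rate + completion_tokens * completion_rate

print(token_cost("huggingface", 1000, 200))  # NA
print(token_cost("openai", 1000, 200, 0.00001, 0.00003))
```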


3. NVIDIA Models: Cost Information Limitation

Cost information for NVIDIA models is unavailable as NVIDIA does not currently provide per-hour or per-token pricing in public records. This limitation may affect your ability to estimate operational costs for NVIDIA-hosted models.


4. Fine-Tuned User Models: Limited Support

We currently lack full support for fine-tuned user models due to the unavailability of tokenizers and specific cost information for these models. While users can utilize their fine-tuned OpenAI models, we may be unable to provide accurate cost estimates, as detailed pricing information for such models is not available.
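One rough workaround for estimation purposes is to fall back to the base model's pricing, which can be recovered from OpenAI's fine-tuned model naming scheme (`ft:<base-model>:<org>:<suffix>:<id>`). Note this is only an approximation, since fine-tuned models are typically priced differently from their base models; the helper below is a hypothetical sketch, not part of our API:

```python
def base_model_of(fine_tuned_id):
    """Extract the base model name from an OpenAI fine-tuned model id, or None.

    Fine-tuned OpenAI model ids follow the pattern ft:<base>:<org>:<suffix>:<id>.
    """
    if fine_tuned_id.startswith("ft:"):
        return fine_tuned_id.split(":")[1]
    return None

# "acme" and "abc123" are placeholder org/id values for illustration.
print(base_model_of("ft:gpt-4o-mini:acme::abc123"))  # gpt-4o-mini
print(base_model_of("gpt-4o-mini"))                  # None
```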


5. Audio and Vision Model Limitations

Audio and vision models are not supported at this time, regardless of provider. While some of these models may appear in dropdowns and can be registered, any model that generates a non-text output (e.g., audio or image) will cause the scan to fail.

This includes models such as:

  • gpt-4o-audio-preview
  • gpt-4o-realtime-preview
  • meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo
  • meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo
  • And other similar audio/vision models
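Because such models can still appear in dropdowns, you may want to filter them out before launching a scan. The sketch below uses a name-based heuristic (the marker substrings are an assumption drawn from common naming conventions, not an official or exhaustive list):

```python
# Substrings that commonly mark audio/vision models (heuristic assumption only).
NON_TEXT_MARKERS = ("audio", "realtime", "vision")

def is_text_only(model_name: str) -> bool:
    """Heuristically exclude models whose names suggest audio or vision output."""
    lowered = model_name.lower()
    return not any(marker in lowered for marker in NON_TEXT_MARKERS)

models = [
    "gpt-4o-audio-preview",
    "meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",
    "meta-llama/Llama-3-8b-chat-hf",
]
print([m for m in models if is_text_only(m)])  # ['meta-llama/Llama-3-8b-chat-hf']
```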

6. Query Generation: Dependency on Document and Model Quality

The quality of query generation is highly dependent on the quality of the uploaded document and the selected LLM. Documents with incomplete or unclear information may result in suboptimal queries. Similarly, the choice of LLM directly impacts the relevance and accuracy of the generated queries.


7. Scan Failures Due to Unavailable Cost Data (Cost = NA)

When Cost is selected as part of a scan—regardless of the model provider—scans may fail if cost information is not yet available for the selected model. This occurs because our model provider lists are automatically updated to include the latest models, but cost data may lag behind availability.

As a result, if a model’s cost is currently unavailable (NA), the system may treat the scan as failed, especially if cost evaluation is a required part of the scan.

The following models are known to return NA for cost at this time:

| Model Name |
| --- |
| deepseek-ai/DeepSeek-R1-Distill-Llama-70B |
| deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-14B |
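A pre-flight check can catch this combination before the scan runs. This is a minimal sketch; the helper name and the metric label `"Cost"` spelling are illustrative, and the set below lists only the models from the table above:

```python
# Models currently known to return NA for cost (from the table above).
MODELS_WITH_NA_COST = {
    "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    "deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free",
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
}

def can_run_cost_scan(model_name: str, metrics) -> bool:
    """Return False when a scan requires Cost but the model's cost data is NA."""
    wants_cost = any(m.lower() == "cost" for m in metrics)
    return not (wants_cost and model_name in MODELS_WITH_NA_COST)

print(can_run_cost_scan("deepseek-ai/DeepSeek-R1-Distill-Llama-70B", ["Cost"]))          # False
print(can_run_cost_scan("deepseek-ai/DeepSeek-R1-Distill-Llama-70B", ["Faithfulness"]))  # True
```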

8. Embedding Model Chunk Size Compatibility

When uploading a document, we currently require embedding to succeed at all three chunk sizes: 256, 512, and 1024. However, if the selected embedding model does not support context windows large enough for 1024 tokens, the document upload may fail.

This limitation occurs because some embedding models are unable to process longer chunks due to their internal context length restrictions. If the model fails to embed even one of the required chunk sizes, the entire document upload process will be considered unsuccessful.

⚠️ To avoid this issue, ensure the selected embedding model supports at least a 1024-token context window.
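The requirement above reduces to a simple check: the embedding model's context window must cover the largest required chunk size. A minimal sketch (the helper name is illustrative):

```python
REQUIRED_CHUNK_SIZES = (256, 512, 1024)  # all three must embed successfully

def supports_required_chunks(max_context_tokens: int) -> bool:
    """Check an embedding model's context window against every required chunk size."""
    return all(size <= max_context_tokens for size in REQUIRED_CHUNK_SIZES)

print(supports_required_chunks(512))   # False: the 1024-token chunks would fail
print(supports_required_chunks(8192))  # True
```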


We are actively working to address these limitations where possible and will update this page as new information becomes available.

For any issues that were not addressed, please reach out to our support team at help@trustwise.ai for assistance.