Known Issues and Limitations
This page outlines current known issues and limitations in our system to ensure transparency and help users better understand potential constraints.
1. Model Input Token Limitations
Some models, such as meta-llama/Llama-2-13b-chat-hf, have maximum input token limits smaller than the chunk sizes we use during certain optimization iterations. When these limits are exceeded, the model returns an error. This is a known limitation of models with lower token limits or stricter input validation.
The following models might fail due to exceeding their input token limits:
| Model Name | Token Limit |
|---|---|
| bigcode/santacoder | 2048 |
| EleutherAI/gpt-neo-1.3B | 2048 |
| Gryphe/MythoMax-L2-13b | 4097 |
| Gryphe/MythoMax-L2-13b-Lite | 4097 |
| meta-llama/Llama-2-13b-chat-hf | 4097 |
| upstage/SOLAR-10.7B-Instruct-v1.0 | 4097 |
| meta-llama/Llama-3.2-1B-Instruct | 4096 |
| microsoft/Phi-3-mini-4k-instruct | 4096 |
| scb10x/scb10x-llama3-1-typhoon2-70b-instruct | 8096 |
| scb10x/scb10x-llama3-typhoon-v1-5x-4f316 | 8097 |
| scb10x/scb10x-llama3-1-typhoon2-8b-instruct | 8192 |
| meta-llama/Meta-Llama-3-8B-Instruct-Turbo | 8192 |
| deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free | 8193 |
| google/gemma-2-27b-it | 8193 |
| google/gemma-2-9b-it | 8193 |
| google/gemma-2b-it | 8193 |
| meta-llama/Llama-3-8b-chat-hf | 8193 |
| meta-llama/Llama-3.3-70B-Instruct-Turbo-Free | 8193 |
| meta-llama/Meta-Llama-3-70B-Instruct-Turbo | 8193 |
| meta-llama/Meta-Llama-3-8B-Instruct-Lite | 8193 |
| scb10x/scb10x-llama3-typhoon-v1-5-8b-instruct | 8193 |
⚠️ To avoid these errors, ensure that you are using models with higher token limits.
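If you are unsure whether a prompt will fit, you can count its tokens before submitting it. The sketch below is illustrative only, assuming the `transformers` library is installed; the limits map is a hand-maintained excerpt of the table above, not part of any published API.

```python
# Illustrative pre-check: count a prompt's tokens with the model's own
# tokenizer and compare against its known input limit. The limits map
# is an excerpt of the table above, maintained by hand.
from transformers import AutoTokenizer

MODEL_TOKEN_LIMITS = {
    "bigcode/santacoder": 2048,
    "meta-llama/Llama-2-13b-chat-hf": 4097,
    "google/gemma-2-9b-it": 8193,
}

def fits_input_limit(model_name: str, prompt: str) -> bool:
    """Return True if the prompt fits within the model's input token limit."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    n_tokens = len(tokenizer.encode(prompt))
    return n_tokens <= MODEL_TOKEN_LIMITS[model_name]
```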
2. Hugging Face Models: Token Cost Limitation
For models hosted on Hugging Face (HF), prompt and response token costs are marked as Not Applicable (NA). This is because the cost for HF models is determined by infrastructure usage (e.g., per hour charges) rather than token-based charges. Token-based cost reporting is not applicable for these models.
3. NVIDIA Models: Cost Information Limitation
Cost information for NVIDIA models is unavailable because NVIDIA does not currently publish per-hour or per-token pricing. This limitation may affect your ability to estimate operational costs for NVIDIA-hosted models.
4. Fine-Tuned User Models: Limited Support
We currently lack full support for fine-tuned user models because tokenizers and cost information are not available for them. Users can still use their fine-tuned OpenAI models, but we may be unable to provide accurate cost estimates, since detailed pricing for such models is not published.
5. Audio and Vision Model Limitations
Audio and vision models are not supported at this time, regardless of provider. While some of these models may appear in dropdowns and can be registered, any model that generates a non-text output (e.g., audio or image) will cause the scan to fail.
This includes models such as:
- gpt-4o-audio-preview
- gpt-4o-realtime-preview
- meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo
- meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo
- and other similar audio/vision models
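A simple way to avoid these failures is to screen the selected model against a denylist before starting a scan. The sketch below is illustrative; the denylist mirrors the examples above and is not exhaustive.

```python
# Illustrative guard: refuse to scan models known to produce non-text
# output. The set below mirrors the examples listed above; extend it
# with any other audio/vision models you encounter.
AUDIO_VISION_MODELS = {
    "gpt-4o-audio-preview",
    "gpt-4o-realtime-preview",
    "meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",
    "meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo",
}

def is_text_only(model_name: str) -> bool:
    """Return True if the model is not on the known audio/vision denylist."""
    return model_name not in AUDIO_VISION_MODELS
```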
6. Query Generation: Dependency on Document and Model Quality
The quality of query generation is highly dependent on the quality of the uploaded document and the selected LLM. Documents with incomplete or unclear information may result in suboptimal queries. Similarly, the choice of LLM directly impacts the relevance and accuracy of the generated queries.
7. Scan Failures Due to Unavailable Cost Data (Cost = NA)
When Cost is selected as part of a scan—regardless of the model provider—scans may fail if cost information is not yet available for the selected model. This occurs because our model provider lists are automatically updated to include the latest models, but cost data may lag behind availability.
As a result, if a model's cost is currently unavailable (`NA`), the system may treat the scan as failed, especially if cost evaluation is a required part of the scan.
The following models are known to return `NA` for cost at this time:
| Model Name |
|---|
| deepseek-ai/DeepSeek-R1-Distill-Llama-70B |
| deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-14B |
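Until cost data catches up, a defensive pattern is to check whether pricing is known before including Cost in a scan. The sketch below is purely illustrative: `KNOWN_PRICING` and the helper functions are hypothetical stand-ins for however you track pricing, not part of any published API.

```python
# Purely illustrative: skip the Cost metric when pricing is unknown,
# rather than letting the scan fail. KNOWN_PRICING and the helpers are
# hypothetical stand-ins, not part of any published API.
KNOWN_PRICING: dict[str, tuple[float, float]] = {
    # "model-name": (prompt_cost_per_token, completion_cost_per_token)
}

def has_cost_data(model_name: str) -> bool:
    """Return True only if pricing is recorded for the model."""
    return model_name in KNOWN_PRICING

def metrics_for_scan(model_name: str, requested: list[str]) -> list[str]:
    """Drop 'Cost' from the requested metrics when pricing is unavailable."""
    if not has_cost_data(model_name):
        return [m for m in requested if m != "Cost"]
    return requested
```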
8. Embedding Model Chunk Size Compatibility
When uploading a document, we currently require embedding to succeed at all three chunk sizes: `256`, `512`, and `1024`. However, if the selected embedding model does not support context windows large enough for 1024 tokens, the document upload may fail.
This limitation occurs because some embedding models are unable to process longer chunks due to their internal context length restrictions. If the model fails to embed even one of the required chunk sizes, the entire document upload process will be considered unsuccessful.
⚠️ To avoid this issue, ensure the selected embedding model supports at least a 1024-token context window.
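Before uploading, you can verify that an embedding model's sequence limit covers the largest required chunk size. The sketch below assumes a sentence-transformers model; other embedding backends expose their limits differently.

```python
# Illustrative check, assuming a sentence-transformers embedding model:
# verify the model's sequence limit covers the largest required chunk
# size (1024 tokens) before attempting a document upload.
from sentence_transformers import SentenceTransformer

REQUIRED_CHUNK_SIZES = (256, 512, 1024)

def supports_required_chunks(model_name: str) -> bool:
    model = SentenceTransformer(model_name)
    # max_seq_length is the maximum number of tokens the model will
    # encode before truncating input.
    return model.max_seq_length >= max(REQUIRED_CHUNK_SIZES)
```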
We are actively working to address these limitations where possible and will update this page as new information becomes available.
For any issues that were not addressed, please reach out to our support team at help@trustwise.ai for assistance.