Known Issues and Limitations
This page outlines current known issues and limitations in our system, for transparency and to help users understand potential constraints.
1. Model Input Token Limitations
Some models, such as meta-llama/Llama-2-13b-chat-hf, have maximum input token limits that are smaller than the chunk sizes we use during certain iterations of our optimizations. When these limits are exceeded, the model may return an error. This is a known limitation of models with lower token limits or stricter input validation.
The following models might fail due to exceeding their input token limits:
| Model Name | Token Limit |
|---|---|
| bigcode/santacoder | 2048 |
| EleutherAI/gpt-neo-1.3B | 2048 |
| Gryphe/MythoMax-L2-13b | 4097 |
| Gryphe/MythoMax-L2-13b-Lite | 4097 |
| meta-llama/Llama-2-13b-chat-hf | 4097 |
| upstage/SOLAR-10.7B-Instruct-v1.0 | 4097 |
| meta-llama/Llama-3.2-1B-Instruct | 4096 |
| microsoft/Phi-3-mini-4k-instruct | 4096 |
| scb10x/scb10x-llama3-1-typhoon2-70b-instruct | 8096 |
| scb10x/scb10x-llama3-typhoon-v1-5x-4f316 | 8097 |
| scb10x/scb10x-llama3-1-typhoon2-8b-instruct | 8192 |
| meta-llama/Meta-Llama-3-8B-Instruct-Turbo | 8192 |
| deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free | 8193 |
| google/gemma-2-27b-it | 8193 |
| google/gemma-2-9b-it | 8193 |
| google/gemma-2b-it | 8193 |
| meta-llama/Llama-3-8b-chat-hf | 8193 |
| meta-llama/Llama-3.3-70B-Instruct-Turbo-Free | 8193 |
| meta-llama/Meta-Llama-3-70B-Instruct-Turbo | 8193 |
| meta-llama/Meta-Llama-3-8B-Instruct-Lite | 8193 |
| scb10x/scb10x-llama3-typhoon-v1-5-8b-instruct | 8193 |
⚠️ To avoid these errors, ensure that you are using models with higher token limits.
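One way to avoid these errors is to check a chunk's estimated token count against each model's limit before submitting it. The sketch below is illustrative only: `MODEL_TOKEN_LIMITS` and `count_tokens` are hypothetical stand-ins, and the character-based estimate should be replaced with your provider's actual tokenizer for accurate counts.

```python
# Hypothetical pre-flight check: skip models whose input token limit is
# too small for a given chunk. Limits below are taken from the table above.

MODEL_TOKEN_LIMITS = {
    "bigcode/santacoder": 2048,
    "meta-llama/Llama-2-13b-chat-hf": 4097,
    "meta-llama/Meta-Llama-3-8B-Instruct-Turbo": 8192,
}

def count_tokens(text: str) -> int:
    # Rough estimate (~4 characters per token for English text).
    # Replace with the model's real tokenizer for production use.
    return max(1, len(text) // 4)

def models_that_fit(chunk: str, margin: int = 256) -> list[str]:
    """Return models whose input limit can hold the chunk plus a safety margin."""
    needed = count_tokens(chunk) + margin
    return [m for m, limit in MODEL_TOKEN_LIMITS.items() if limit >= needed]

chunk = "word " * 4000  # ~20,000 characters, ~5,000 estimated tokens
print(models_that_fit(chunk))
```

With a chunk of roughly 5,000 estimated tokens, only the 8,192-token model passes the check; the `margin` parameter leaves headroom for the prompt template and generation.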
2. Hugging Face Models: Token Cost Limitation
For models hosted on Hugging Face (HF), prompt and response token costs are marked as Not Applicable (NA). The cost for HF models is determined by infrastructure usage (e.g., per-hour charges) rather than per-token charges, so token-based cost reporting does not apply to these models.
3. NVIDIA Models: Cost Information Limitation
Cost information for NVIDIA models is unavailable because NVIDIA does not currently publish per-hour or per-token pricing. This limitation may affect your ability to estimate operational costs for NVIDIA-hosted models.