AI Models
Supported LLM, STT, and TTS providers — pricing, model IDs, and per-agent configuration.
AI Models
Voisnap supports multiple AI providers for each stage of the voice pipeline: large language models (LLM), speech-to-text (STT), and text-to-speech (TTS). You choose providers and models per agent.
Large Language Models (LLM)
Configure via agent.llm.provider and agent.llm.model.
OpenAI
| Model | ID | Context | Input $/1M tokens | Output $/1M tokens |
|---|---|---|---|---|
| GPT-4o | gpt-4o | 128K | $5.00 | $15.00 |
| GPT-4o Mini | gpt-4o-mini | 128K | $0.15 | $0.60 |
| GPT-4 Turbo | gpt-4-turbo | 128K | $10.00 | $30.00 |
| GPT-3.5 Turbo | gpt-3.5-turbo | 16K | $0.50 | $1.50 |
Anthropic
| Model | ID | Context | Input $/1M tokens | Output $/1M tokens |
|---|---|---|---|---|
| Claude 3.5 Sonnet | claude-3-5-sonnet-20241022 | 200K | $3.00 | $15.00 |
| Claude 3.5 Haiku | claude-3-5-haiku-20241022 | 200K | $0.80 | $4.00 |
| Claude 3 Opus | claude-3-opus-20240229 | 200K | $15.00 | $75.00 |
| Claude 3 Sonnet | claude-3-sonnet-20240229 | 200K | $3.00 | $15.00 |
| Claude 3 Haiku | claude-3-haiku-20240307 | 200K | $0.25 | $1.25 |
| Model | ID | Context | Input $/1M tokens | Output $/1M tokens |
|---|---|---|---|---|
| Gemini 1.5 Pro | gemini-1.5-pro | 1M | $3.50 | $10.50 |
| Gemini 1.5 Flash | gemini-1.5-flash | 1M | $0.075 | $0.30 |
| Gemini 1.0 Pro | gemini-1.0-pro | 32K | $0.50 | $1.50 |
Cohere
| Model | ID | Context | Input $/1M tokens | Output $/1M tokens |
|---|---|---|---|---|
| Command R+ | command-r-plus | 128K | $3.00 | $15.00 |
| Command R | command-r | 128K | $0.50 | $1.50 |
HuggingFace (Bring Your Own)
Configure any HuggingFace Inference Endpoint:
{
"llm": {
"provider": "huggingface",
"endpointUrl": "https://xyz.endpoints.huggingface.cloud/v1",
"model": "meta-llama/Llama-3-8B-Instruct",
"apiKey": "hf_..."
}
}
Speech-to-Text (STT)
Configure via agent.transcription.provider and agent.transcription.model.
Deepgram
| Model | ID | Languages | $/minute |
|---|---|---|---|
| Nova-2 (recommended) | nova-2 | 30+ | $0.0043 |
| Nova-2 Medical | nova-2-medical | en-US | $0.0086 |
| Enhanced | enhanced | 30+ | $0.0145 |
| Base | base | 30+ | $0.0125 |
AssemblyAI
| Model | ID | Languages | $/minute |
|---|---|---|---|
| Best | best | 17 | $0.0062 |
| Nano | nano | 17 | $0.0020 |
Google Speech-to-Text
| Model | ID | Languages | $/minute |
|---|---|---|---|
| Latest Long | latest_long | 125+ | $0.016 |
| Latest Short | latest_short | 125+ | $0.016 |
| Medical Conversation | medical_conversation | en-US | $0.078 |
AWS Transcribe
| Model | ID | Languages | $/minute |
|---|---|---|---|
| Standard | standard | 30+ | $0.024 |
| Medical | medical | en-US | $0.0786 |
Azure Speech
| Model | ID | Languages | $/hour |
|---|---|---|---|
| Standard | standard | 100+ | $1.00 |
| Custom | custom | varies | $1.40 |
Text-to-Speech (TTS)
Configure via agent.voice.provider and agent.voice.voiceId.
ElevenLabs (recommended for quality)
| Tier | $/1K characters |
|---|---|
| Standard voices | $0.012 |
| Professional clones | $0.024 |
Popular voice IDs:
EXAVITQu4vr4xnSDxMaL— Bella (female, American)ErXwobaYiN019PkySvjV— Antoni (male, American)MF3mGyEYCl7XYWbV9V6O— Elli (female, American)TxGEqnHWrfWFTfGW9XjX— Josh (male, American)VR6AewLTigWG4xSOukaG— Arnold (male, American)
Google TTS
| Tier | $/1M characters |
|---|---|
| Standard | $4.00 |
| WaveNet | $16.00 |
| Neural2 | $16.00 |
| Studio | $160.00 |
AWS Polly
| Tier | $/1M characters |
|---|---|
| Standard | $4.00 |
| Neural (NTTS) | $16.00 |
| Long-form | $100.00 |
Azure Cognitive Speech
| Tier | $/1M characters |
|---|---|
| Standard | $4.00 |
| Neural | $16.00 |
| Custom Neural | $24.00 |
Configure providers per agent
curl -X PATCH https://api.voisnap.ai/api/v1/agents/agt_01HXK8Z3MNPQRS \
-H "Authorization: Bearer vsnp_live_..." \
-H "Content-Type: application/json" \
-d '{
"llm": {
"provider": "anthropic",
"model": "claude-3-5-sonnet-20241022",
"temperature": 0.6,
"maxTokens": 400
},
"transcription": {
"provider": "deepgram",
"model": "nova-2",
"language": "en-US",
"smartFormat": true
},
"voice": {
"provider": "elevenlabs",
"voiceId": "EXAVITQu4vr4xnSDxMaL",
"stability": 0.5,
"similarityBoost": 0.75
}
}'
client.agents.update(
"agt_01HXK8Z3MNPQRS",
llm={
"provider": "anthropic",
"model": "claude-3-5-sonnet-20241022",
"temperature": 0.6,
"max_tokens": 400,
},
transcription={
"provider": "deepgram",
"model": "nova-2",
"language": "en-US",
},
voice={
"provider": "elevenlabs",
"voice_id": "EXAVITQu4vr4xnSDxMaL",
"stability": 0.5,
"similarity_boost": 0.75,
}
)
:::tip For lowest latency, use GPT-4o Mini or Gemini 1.5 Flash as the LLM, Deepgram Nova-2 for STT, and ElevenLabs standard voices for TTS. This combination typically achieves under 700ms end-to-end response latency. :::
List available models
GET /api/v1/ai-models
Returns all models currently available on your plan, including real-time pricing.
curl https://api.voisnap.ai/api/v1/ai-models \
-H "Authorization: Bearer vsnp_live_..."