Supported LLM, STT, and TTS providers — pricing, model IDs, and per-agent configuration.

AI Models

Voisnap supports multiple AI providers for each stage of the voice pipeline: large language models (LLM), speech-to-text (STT), and text-to-speech (TTS). You choose providers and models per agent.

Large Language Models (LLM)

Configure via `agent.llm.provider` and `agent.llm.model`.

OpenAI

Model	ID	Context	Input $/1M tokens	Output $/1M tokens
GPT-4o	`gpt-4o`	128K	$5.00	$15.00
GPT-4o Mini	`gpt-4o-mini`	128K	$0.15	$0.60
GPT-4 Turbo	`gpt-4-turbo`	128K	$10.00	$30.00
GPT-3.5 Turbo	`gpt-3.5-turbo`	16K	$0.50	$1.50

Anthropic

Model	ID	Context	Input $/1M tokens	Output $/1M tokens
Claude 3.5 Sonnet	`claude-3-5-sonnet-20241022`	200K	$3.00	$15.00
Claude 3.5 Haiku	`claude-3-5-haiku-20241022`	200K	$0.80	$4.00
Claude 3 Opus	`claude-3-opus-20240229`	200K	$15.00	$75.00
Claude 3 Sonnet	`claude-3-sonnet-20240229`	200K	$3.00	$15.00
Claude 3 Haiku	`claude-3-haiku-20240307`	200K	$0.25	$1.25

Google

Model	ID	Context	Input $/1M tokens	Output $/1M tokens
Gemini 1.5 Pro	`gemini-1.5-pro`	1M	$3.50	$10.50
Gemini 1.5 Flash	`gemini-1.5-flash`	1M	$0.075	$0.30
Gemini 1.0 Pro	`gemini-1.0-pro`	32K	$0.50	$1.50

Cohere

Model	ID	Context	Input $/1M tokens	Output $/1M tokens
Command R+	`command-r-plus`	128K	$3.00	$15.00
Command R	`command-r`	128K	$0.50	$1.50
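
To compare LLM costs across the tables above, the per-1M-token prices can be applied to an expected token count per call. The following is a minimal sketch: the prices are copied from the tables, while the helper name `llm_cost_usd` and the workload of 2,000 input and 400 output tokens are illustrative assumptions, not part of the Voisnap API.

```python
# Prices in USD per 1M tokens, copied from the tables above: (input, output).
LLM_PRICES = {
    "gpt-4o": (5.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-5-sonnet-20241022": (3.00, 15.00),
    "gemini-1.5-flash": (0.075, 0.30),
    "command-r": (0.50, 1.50),
}


def llm_cost_usd(model_id, input_tokens, output_tokens):
    """Estimate the LLM cost of one call in USD (hypothetical helper)."""
    input_price, output_price = LLM_PRICES[model_id]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Assumed workload: 2,000 input tokens and 400 output tokens per call.
for model_id in LLM_PRICES:
    print(f"{model_id}: ${llm_cost_usd(model_id, 2000, 400):.6f}")
```

Input and output tokens are priced separately, so a model with cheap input but expensive output can still dominate the bill for agents that give long replies.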

HuggingFace (Bring Your Own)

Configure any HuggingFace Inference Endpoint:

{
  "llm": {
    "provider": "huggingface",
    "endpointUrl": "https://xyz.endpoints.huggingface.cloud/v1",
    "model": "meta-llama/Llama-3-8B-Instruct",
    "apiKey": "hf_..."
  }
}
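
The configuration above embeds the API key directly. One possible way to avoid hard-coding it is to build the same structure in code and read the key from the environment. This is only a sketch: the function name and the `HF_API_KEY` environment variable are assumptions chosen for illustration, and only the field names (`provider`, `endpointUrl`, `model`, `apiKey`) come from the example above.

```python
import os


def build_huggingface_llm_config(endpoint_url, model, env_var="HF_API_KEY"):
    """Build the HuggingFace LLM config shown above (hypothetical helper).

    The API key is read from the environment variable named by `env_var`,
    which is an assumed name, not one defined by Voisnap.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return {
        "llm": {
            "provider": "huggingface",
            "endpointUrl": endpoint_url,
            "model": model,
            "apiKey": api_key,
        }
    }
```

The returned dictionary has the same shape as the JSON example, so it can be serialized with `json.dumps` and sent as the request body.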

Speech-to-Text (STT)

Configure via `agent.transcription.provider` and `agent.transcription.model`.

Deepgram

Model	ID	Languages	$/minute
Nova-2 (recommended)	`nova-2`	30+	$0.0043
Nova-2 Medical	`nova-2-medical`	en-US	$0.0086
Enhanced	`enhanced`	30+	$0.0145
Base	`base`	30+	$0.0125

AssemblyAI

Model	ID	Languages	$/minute
Best	`best`	17	$0.0062
Nano	`nano`	17	$0.0020

Google Speech-to-Text

Model	ID	Languages	$/minute
Latest Long	`latest_long`	125+	$0.016
Latest Short	`latest_short`	125+	$0.016
Medical Conversation	`medical_conversation`	en-US	$0.078

AWS Transcribe

Model	ID	Languages	$/minute
Standard	`standard`	30+	$0.024
Medical	`medical`	en-US	$0.0786

Azure Speech

Model	ID	Languages	$/hour
Standard	`standard`	100+	$1.00
Custom	`custom`	varies	$1.40
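
Note that the Azure Speech table is priced per hour, while the other STT tables are priced per minute. A small sketch of normalizing to a common unit before comparing follows; the prices are taken from the tables above, and the display labels and the helper name `price_per_minute` are illustrative assumptions.

```python
# (price in USD, billing unit) copied from the STT tables above.
STT_PRICES = {
    "Deepgram Nova-2": (0.0043, "minute"),
    "AssemblyAI Best": (0.0062, "minute"),
    "Google Latest Long": (0.016, "minute"),
    "AWS Transcribe Standard": (0.024, "minute"),
    "Azure Speech Standard": (1.00, "hour"),
}


def price_per_minute(price, unit):
    """Convert a listed price to USD per minute (hypothetical helper)."""
    if unit == "minute":
        return price
    if unit == "hour":
        return price / 60
    raise ValueError(f"Unknown billing unit: {unit}")


# Print all models in ascending order of normalized price.
for name, (price, unit) in sorted(
    STT_PRICES.items(), key=lambda item: price_per_minute(*item[1])
):
    print(f"{name}: ${price_per_minute(price, unit):.4f}/minute")
```

On this basis, Azure Speech Standard at $1.00 per hour works out to roughly $0.0167 per minute.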

Text-to-Speech (TTS)

Configure via `agent.voice.provider` and `agent.voice.voiceId`.

ElevenLabs (recommended for quality)

Tier	$/1K characters
Standard voices	$0.012
Professional clones	$0.024

Popular voice IDs:

EXAVITQu4vr4xnSDxMaL — Bella (female, American)
ErXwobaYiN019PkySvjV — Antoni (male, American)
MF3mGyEYCl7XYWbV9V6O — Elli (female, American)
TxGEqnHWrfWFTfGW9XjX — Josh (male, American)
VR6AewLTigWG4xSOukaG — Arnold (male, American)

Google TTS

Tier	$/1M characters
Standard	$4.00
WaveNet	$16.00
Neural2	$16.00
Studio	$160.00

AWS Polly

Tier	$/1M characters
Standard	$4.00
Neural (NTTS)	$16.00
Long-form	$100.00

Azure Cognitive Speech

Tier	$/1M characters
Standard	$4.00
Neural	$16.00
Custom Neural	$24.00
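
The TTS tables mix billing units: ElevenLabs is listed per 1K characters, while the other providers are listed per 1M characters. The sketch below puts them on a common scale and estimates the cost of a single spoken reply. Prices come from the tables above; the helper names and the 500-character reply length are assumptions for illustration.

```python
# (price in USD, characters covered by that price) from the TTS tables above.
TTS_PRICES = {
    "ElevenLabs Standard voices": (0.012, 1_000),
    "Google TTS Standard": (4.00, 1_000_000),
    "Google TTS Neural2": (16.00, 1_000_000),
    "AWS Polly Neural (NTTS)": (16.00, 1_000_000),
    "Azure Neural": (16.00, 1_000_000),
}


def price_per_million_chars(price, per_chars):
    """Normalize a listed price to USD per 1M characters (hypothetical helper)."""
    return price * (1_000_000 / per_chars)


def tts_cost_usd(price, per_chars, characters):
    """Estimate the cost in USD of synthesizing `characters` characters."""
    return characters / per_chars * price


# Assumed reply length: 500 characters.
for name, (price, per_chars) in TTS_PRICES.items():
    normalized = price_per_million_chars(price, per_chars)
    reply_cost = tts_cost_usd(price, per_chars, 500)
    print(f"{name}: ${normalized:.2f}/1M chars, ${reply_cost:.4f} per reply")
```

Normalized this way, ElevenLabs standard voices at $0.012 per 1K characters correspond to $12.00 per 1M characters.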

Configure providers per agent

curl -X PATCH https://api.voisnap.ai/api/v1/agents/agt_01HXK8Z3MNPQRS \
  -H "Authorization: Bearer vsnp_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "llm": {
      "provider": "anthropic",
      "model": "claude-3-5-sonnet-20241022",
      "temperature": 0.6,
      "maxTokens": 400
    },
    "transcription": {
      "provider": "deepgram",
      "model": "nova-2",
      "language": "en-US",
      "smartFormat": true
    },
    "voice": {
      "provider": "elevenlabs",
      "voiceId": "EXAVITQu4vr4xnSDxMaL",
      "stability": 0.5,
      "similarityBoost": 0.75
    }
  }'

# Assumes `client` is an already-initialized Voisnap SDK client.
client.agents.update(
    "agt_01HXK8Z3MNPQRS",
    llm={
        "provider": "anthropic",
        "model": "claude-3-5-sonnet-20241022",
        "temperature": 0.6,
        "max_tokens": 400,
    },
    transcription={
        "provider": "deepgram",
        "model": "nova-2",
        "language": "en-US",
    },
    voice={
        "provider": "elevenlabs",
        "voice_id": "EXAVITQu4vr4xnSDxMaL",
        "stability": 0.5,
        "similarity_boost": 0.75,
    }
)

:::tip
For lowest latency, use GPT-4o Mini or Gemini 1.5 Flash as the LLM, Deepgram Nova-2 for STT, and ElevenLabs standard voices for TTS. This combination typically achieves under 700 ms end-to-end response latency.
:::

List available models

GET /api/v1/ai-models

Returns all models currently available on your plan, including real-time pricing.

curl https://api.voisnap.ai/api/v1/ai-models \
  -H "Authorization: Bearer vsnp_live_..."
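
The same request can be made from Python using only the standard library. This is a sketch rather than an official client: the URL and `Authorization` header mirror the curl call above, the function names are hypothetical, and the response is assumed to be JSON because its schema is not documented here.

```python
import json
import urllib.request

API_BASE = "https://api.voisnap.ai/api/v1"


def build_models_request(api_key):
    """Build the GET /ai-models request without sending it (hypothetical helper)."""
    return urllib.request.Request(
        f"{API_BASE}/ai-models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_models(api_key, timeout=10.0):
    """Send the request and decode the body, assuming it is JSON."""
    request = build_models_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_models` with a live API key returns the decoded response body. Building the request separately from sending it makes the URL and headers easy to inspect or test without network access.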
