AI Models

Supported LLM, STT, and TTS providers — pricing, model IDs, and per-agent configuration.

Voisnap supports multiple AI providers for each stage of the voice pipeline: large language models (LLM), speech-to-text (STT), and text-to-speech (TTS). You choose providers and models per agent.


Large Language Models (LLM)

Configure via agent.llm.provider and agent.llm.model.

OpenAI

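| Model | ID | Context | Input $/1M tokens | Output $/1M tokens |
| --- | --- | --- | --- | --- |
| GPT-4o | `gpt-4o` | 128K | $5.00 | $15.00 |
| GPT-4o Mini | `gpt-4o-mini` | 128K | $0.15 | $0.60 |
| GPT-4 Turbo | `gpt-4-turbo` | 128K | $10.00 | $30.00 |
| GPT-3.5 Turbo | `gpt-3.5-turbo` | 16K | $0.50 | $1.50 |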

Anthropic

| Model | ID | Context | Input $/1M tokens | Output $/1M tokens |
| --- | --- | --- | --- | --- |
| Claude 3.5 Sonnet | `claude-3-5-sonnet-20241022` | 200K | $3.00 | $15.00 |
| Claude 3.5 Haiku | `claude-3-5-haiku-20241022` | 200K | $0.80 | $4.00 |
| Claude 3 Opus | `claude-3-opus-20240229` | 200K | $15.00 | $75.00 |
| Claude 3 Sonnet | `claude-3-sonnet-20240229` | 200K | $3.00 | $15.00 |
| Claude 3 Haiku | `claude-3-haiku-20240307` | 200K | $0.25 | $1.25 |

Google

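| Model | ID | Context | Input $/1M tokens | Output $/1M tokens |
| --- | --- | --- | --- | --- |
| Gemini 1.5 Pro | `gemini-1.5-pro` | 1M | $3.50 | $10.50 |
| Gemini 1.5 Flash | `gemini-1.5-flash` | 1M | $0.075 | $0.30 |
| Gemini 1.0 Pro | `gemini-1.0-pro` | 32K | $0.50 | $1.50 |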

Cohere

| Model | ID | Context | Input $/1M tokens | Output $/1M tokens |
| --- | --- | --- | --- | --- |
| Command R+ | `command-r-plus` | 128K | $3.00 | $15.00 |
| Command R | `command-r` | 128K | $0.50 | $1.50 |
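
To show how the per-million-token prices in the LLM tables translate into a per-request cost, here is a minimal sketch. The `LLM_PRICES` dictionary and the `llm_cost` function are illustrative names, not part of the Voisnap API, and only a few models are copied in.

```python
# Per-million-token prices in USD (input, output), copied from the tables above.
# LLM_PRICES and llm_cost are hypothetical helpers for illustration only.
LLM_PRICES = {
    "gpt-4o": (5.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-5-sonnet-20241022": (3.00, 15.00),
    "gemini-1.5-flash": (0.075, 0.30),
}


def llm_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request for the given token counts."""
    input_price, output_price = LLM_PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Compare the same 2,000-token prompt and 400-token reply across models.
    for model_id in LLM_PRICES:
        print(f"{model_id}: ${llm_cost(model_id, 2_000, 400):.5f}")
```

Input and output tokens are priced separately, so a long prompt with a short reply is dominated by the input price and a short prompt with a long reply by the output price.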

HuggingFace (Bring Your Own)

Configure any HuggingFace Inference Endpoint:

```json
{
  "llm": {
    "provider": "huggingface",
    "endpointUrl": "https://xyz.endpoints.huggingface.cloud/v1",
    "model": "meta-llama/Llama-3-8B-Instruct",
    "apiKey": "hf_..."
  }
}
```
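
If you build this configuration programmatically, the following sketch assembles the same JSON shape while reading the key from the environment instead of hard-coding it. The helper name `huggingface_llm_config`, the `HF_API_KEY` variable name, and the HTTPS check are assumptions made for illustration; the documentation does not state that Voisnap enforces any of them.

```python
import json
import os
from urllib.parse import urlparse


def huggingface_llm_config(endpoint_url: str, model: str, api_key: str) -> dict:
    """Build the "llm" block shown above for a HuggingFace Inference Endpoint."""
    parsed = urlparse(endpoint_url)
    # Assumption: the endpoint should be a full HTTPS URL, as in the example above.
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"expected an https endpoint URL, got {endpoint_url!r}")
    return {
        "llm": {
            "provider": "huggingface",
            "endpointUrl": endpoint_url,
            "model": model,
            "apiKey": api_key,
        }
    }


if __name__ == "__main__":
    # HF_API_KEY is a hypothetical environment variable name.
    config = huggingface_llm_config(
        "https://xyz.endpoints.huggingface.cloud/v1",
        "meta-llama/Llama-3-8B-Instruct",
        os.environ.get("HF_API_KEY", "hf_..."),
    )
    print(json.dumps(config, indent=2))
```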

Speech-to-Text (STT)

Configure via agent.transcription.provider and agent.transcription.model.

Deepgram

| Model | ID | Languages | $/minute |
| --- | --- | --- | --- |
| Nova-2 (recommended) | `nova-2` | 30+ | $0.0043 |
| Nova-2 Medical | `nova-2-medical` | en-US | $0.0086 |
| Enhanced | `enhanced` | 30+ | $0.0145 |
| Base | `base` | 30+ | $0.0125 |

AssemblyAI

| Model | ID | Languages | $/minute |
| --- | --- | --- | --- |
| Best | `best` | 17 | $0.0062 |
| Nano | `nano` | 17 | $0.0020 |

Google Speech-to-Text

| Model | ID | Languages | $/minute |
| --- | --- | --- | --- |
| Latest Long | `latest_long` | 125+ | $0.016 |
| Latest Short | `latest_short` | 125+ | $0.016 |
| Medical Conversation | `medical_conversation` | en-US | $0.078 |

AWS Transcribe

| Model | ID | Languages | $/minute |
| --- | --- | --- | --- |
| Standard | `standard` | 30+ | $0.024 |
| Medical | `medical` | en-US | $0.0786 |

Azure Speech

| Model | ID | Languages | $/hour |
| --- | --- | --- | --- |
| Standard | `standard` | 100+ | $1.00 |
| Custom | `custom` | varies | $1.40 |
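
The STT tables mix billing units: Azure Speech is listed per hour and the other providers per minute. The sketch below normalizes everything to a per-minute rate so the providers can be compared for a call of a given length. The dictionary keys are display labels chosen for this example, not Voisnap provider IDs, and the helper names are hypothetical.

```python
# (price in USD, billing unit) copied from the STT tables above.
# Keys are human-readable labels for this example, not API identifiers.
STT_PRICES = {
    "Deepgram Nova-2": (0.0043, "minute"),
    "AssemblyAI Best": (0.0062, "minute"),
    "Google Latest Long": (0.016, "minute"),
    "AWS Transcribe Standard": (0.024, "minute"),
    "Azure Speech Standard": (1.00, "hour"),
}


def price_per_minute(price: float, unit: str) -> float:
    """Convert a listed price to USD per minute of audio."""
    if unit == "minute":
        return price
    if unit == "hour":
        return price / 60
    raise ValueError(f"unknown billing unit: {unit}")


def transcription_cost(name: str, minutes: float) -> float:
    """Estimated USD cost of transcribing `minutes` of audio with one model."""
    price, unit = STT_PRICES[name]
    return price_per_minute(price, unit) * minutes


def rank_by_cost(minutes: float) -> list[tuple[str, float]]:
    """Return (label, cost) pairs sorted from cheapest to most expensive."""
    costs = [(name, transcription_cost(name, minutes)) for name in STT_PRICES]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for label, cost in rank_by_cost(10):
        print(f"{label}: ${cost:.4f} for a 10-minute call")
```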

Text-to-Speech (TTS)

Configure via agent.voice.provider and agent.voice.voiceId.

ElevenLabs

| Tier | $/1K characters |
| --- | --- |
| Standard voices | $0.012 |
| Professional clones | $0.024 |

Popular voice IDs:

  • EXAVITQu4vr4xnSDxMaL — Bella (female, American)
  • ErXwobaYiN019PkySvjV — Antoni (male, American)
  • MF3mGyEYCl7XYWbV9V6O — Elli (female, American)
  • TxGEqnHWrfWFTfGW9XjX — Josh (male, American)
  • VR6AewLTigWG4xSOukaG — Arnold (male, American)

Google TTS

| Tier | $/1M characters |
| --- | --- |
| Standard | $4.00 |
| WaveNet | $16.00 |
| Neural2 | $16.00 |
| Studio | $160.00 |

AWS Polly

| Tier | $/1M characters |
| --- | --- |
| Standard | $4.00 |
| Neural (NTTS) | $16.00 |
| Long-form | $100.00 |

Azure Cognitive Speech

| Tier | $/1M characters |
| --- | --- |
| Standard | $4.00 |
| Neural | $16.00 |
| Custom Neural | $24.00 |
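
The TTS prices in this section are quoted per 1K characters for the first tier table and per 1M characters for Google TTS, AWS Polly, and Azure. As a rough sketch, the helper below stores each price together with the number of characters it covers, so costs can be compared on the same basis. The ElevenLabs entry uses the per-1K-character standard-voice price listed at the top of this section; the labels and function names are hypothetical.

```python
# (price in USD, number of characters that price covers), from the tables above.
# Keys are display labels for this example, not API identifiers.
TTS_PRICES = {
    "ElevenLabs Standard voices": (0.012, 1_000),
    "Google TTS Neural2": (16.00, 1_000_000),
    "AWS Polly Neural (NTTS)": (16.00, 1_000_000),
    "Azure Neural": (16.00, 1_000_000),
}


def price_per_million(name: str) -> float:
    """Listed price rescaled to USD per one million characters."""
    price, per_characters = TTS_PRICES[name]
    return price * 1_000_000 / per_characters


def synthesis_cost(name: str, characters: int) -> float:
    """Estimated USD cost of synthesizing `characters` characters of text."""
    price, per_characters = TTS_PRICES[name]
    return characters * price / per_characters


if __name__ == "__main__":
    reply = "Thanks for calling. Your appointment is confirmed for Tuesday at 3 PM."
    for label in TTS_PRICES:
        print(
            f"{label}: ${price_per_million(label):.2f} per 1M characters, "
            f"${synthesis_cost(label, len(reply)):.6f} for this reply"
        )
```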

Configure providers per agent

```bash
curl -X PATCH https://api.voisnap.ai/api/v1/agents/agt_01HXK8Z3MNPQRS \
  -H "Authorization: Bearer vsnp_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "llm": {
      "provider": "anthropic",
      "model": "claude-3-5-sonnet-20241022",
      "temperature": 0.6,
      "maxTokens": 400
    },
    "transcription": {
      "provider": "deepgram",
      "model": "nova-2",
      "language": "en-US",
      "smartFormat": true
    },
    "voice": {
      "provider": "elevenlabs",
      "voiceId": "EXAVITQu4vr4xnSDxMaL",
      "stability": 0.5,
      "similarityBoost": 0.75
    }
  }'
```

```python
# `client` is an already-authenticated Voisnap SDK client.
client.agents.update(
    "agt_01HXK8Z3MNPQRS",
    # LLM stage: provider, model ID, and sampling settings
    llm={
        "provider": "anthropic",
        "model": "claude-3-5-sonnet-20241022",
        "temperature": 0.6,
        "max_tokens": 400,
    },
    # STT stage
    transcription={
        "provider": "deepgram",
        "model": "nova-2",
        "language": "en-US",
    },
    # TTS stage: voice ID plus voice tuning parameters
    voice={
        "provider": "elevenlabs",
        "voice_id": "EXAVITQu4vr4xnSDxMaL",
        "stability": 0.5,
        "similarity_boost": 0.75,
    },
)
```
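
Note that the REST body above uses camelCase keys (`maxTokens`, `voiceId`, `similarityBoost`) while the Python call uses snake_case (`max_tokens`, `voice_id`, `similarity_boost`). This page does not say how the SDK maps between the two. If you need to send SDK-style dictionaries to the REST endpoint yourself, a conversion along the following lines may help. The `to_camel` and `camelize` helpers are hypothetical and not part of any Voisnap package.

```python
def to_camel(key: str) -> str:
    """Convert a snake_case key such as "max_tokens" to camelCase ("maxTokens")."""
    head, *rest = key.split("_")
    return head + "".join(part.capitalize() for part in rest)


def camelize(value):
    """Recursively convert all dictionary keys in a nested structure to camelCase."""
    if isinstance(value, dict):
        return {to_camel(k): camelize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [camelize(item) for item in value]
    return value


if __name__ == "__main__":
    sdk_style = {
        "llm": {"provider": "anthropic", "max_tokens": 400},
        "voice": {"voice_id": "EXAVITQu4vr4xnSDxMaL", "similarity_boost": 0.75},
    }
    print(camelize(sdk_style))
```

Only keys are rewritten. Values such as model IDs and voice IDs pass through unchanged.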

:::tip
For lowest latency, use GPT-4o Mini or Gemini 1.5 Flash as the LLM, Deepgram Nova-2 for STT, and ElevenLabs standard voices for TTS. This combination typically achieves under 700 ms end-to-end response latency.
:::

List available models

GET /api/v1/ai-models

Returns all models currently available on your plan, including current pricing.

```bash
curl https://api.voisnap.ai/api/v1/ai-models \
  -H "Authorization: Bearer vsnp_live_..."
```
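
The same request can be made from Python using only the standard library. This is a minimal sketch: the URL and bearer-token header come from the curl example above, while the function names are hypothetical and the response is returned as parsed JSON without assuming any particular schema, since this page does not document the response shape.

```python
import json
import urllib.request

API_BASE = "https://api.voisnap.ai/api/v1"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build the GET /ai-models request without sending it."""
    return urllib.request.Request(
        f"{API_BASE}/ai-models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_models(api_key: str, timeout: float = 10.0):
    """Send the request and return the decoded JSON body as-is."""
    request = build_models_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_models("vsnp_live_...")` with a real key performs the network request. Building the request in a separate function makes it possible to inspect or test the URL and headers offline.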