This repository documents how to connect a SambaNova-backed LLM server to Vapi as a Custom LLM, using SambaNova's Meta-Llama-3.3-70B-Instruct model. The guide walks you through setting up a local Flask server, exposing it with Ngrok, configuring a Vapi Custom LLM, and understanding the end-to-end request flow.
This setup is useful for:
- Testing custom LLM logic locally
- Adding middleware, logging, or prompt control
- Running your own inference or proxy layer behind Vapi
Before starting, make sure you have the following:
- SambaNova API Key – Access to SambaNova's LLMs. To get one, visit the SambaNova Cloud page.
- Vapi Account – Access to the Vapi Dashboard. To get one, create a Vapi account.
- Python 3.11+ – Local development environment
- Python dependencies:
pip install flask sambanova
- Ngrok – To expose your local server to the internet. To install it on macOS, run the following (see the ngrok documentation for other platforms):
brew install ngrok
Then get your ngrok auth token from the ngrok dashboard and add it with:
ngrok config add-authtoken $YOUR_NGROK_AUTHTOKEN
- Flask App Code – The Vapi server-side example (app.py).
Use the app.py file from that example. It forwards incoming chat requests to a SambaNova-hosted LLM through SambaNova's SDK: it accepts standard chat parameters, strips Vapi-specific fields from the request, and then either streams tokens back to the client as Server-Sent Events or returns the full JSON response at once.
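To make the "strip Vapi-specific fields" step concrete, here is a minimal sketch of how a request body could be reduced to plain chat-completion parameters. It is not the code in app.py. The allowlist `ALLOWED_KEYS` and the function name `clean_vapi_request` are hypothetical. The sketch also assumes that per-message extras, such as tool-call data, are not needed downstream.

```python
from typing import Any

# Hypothetical allowlist of standard chat-completion parameters to forward.
# Anything else (e.g. Vapi's "call" and "metadata" fields) is dropped.
ALLOWED_KEYS = {"model", "messages", "temperature", "max_tokens", "top_p", "stream", "stop"}
DEFAULT_MODEL = "Meta-Llama-3.3-70B-Instruct"


def clean_vapi_request(body: dict[str, Any]) -> dict[str, Any]:
    """Keep standard chat parameters and reduce messages to role/content pairs."""
    params = {key: value for key, value in body.items() if key in ALLOWED_KEYS}
    params.setdefault("model", DEFAULT_MODEL)
    params["stream"] = bool(params.get("stream", False))
    # Assumption: only role and content are needed; extra per-message fields are discarded.
    params["messages"] = [
        {"role": m["role"], "content": m.get("content") or ""}
        for m in body.get("messages", [])
        if "role" in m
    ]
    return params
```

The resulting dictionary is meant to be passed as keyword arguments to the SambaNova SDK's OpenAI-style `chat.completions.create` call. Verify the accepted parameter names against the SDK version you install.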
Start the server:
python app.py
The server will start on:
http://localhost:5000
In a separate terminal, expose port 5000 with Ngrok:
ngrok http 5000
Ngrok will generate a public URL similar to:
https://abcd-1234.ngrok-free.dev
This is the endpoint Vapi will call.
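The free Ngrok URL changes whenever the agent restarts, so it can help to read the current one programmatically instead of copying it by hand. The sketch below queries the ngrok agent's local API, which listens on `http://127.0.0.1:4040/api/tunnels` by default. It then appends the `/chat/completions` path used in this guide. The helper names are hypothetical, and the sketch assumes the agent runs with its default web address.

```python
import json
import urllib.request

# ngrok agent's local API (default web_addr); adjust if you changed it.
NGROK_API = "http://127.0.0.1:4040/api/tunnels"


def chat_completions_url(tunnels_payload: dict) -> str:
    """Pick the first HTTPS tunnel and append the /chat/completions path."""
    for tunnel in tunnels_payload.get("tunnels", []):
        url = tunnel.get("public_url", "")
        if url.startswith("https://"):
            return url.rstrip("/") + "/chat/completions"
    raise LookupError("no HTTPS ngrok tunnel found; is `ngrok http 5000` running?")


def fetch_chat_completions_url(api_url: str = NGROK_API) -> str:
    """Query the running ngrok agent and return the endpoint to paste into Vapi."""
    with urllib.request.urlopen(api_url, timeout=5) as resp:
        return chat_completions_url(json.load(resp))
```

With `ngrok http 5000` running, `print(fetch_chat_completions_url())` prints the URL to use in the Vapi endpoint field.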
Test your endpoint with a cURL request like the following:
curl -X POST https://abcd-1234.ngrok-free.dev/chat/completions \
-H "Content-Type: application/json" \
-d '{
"call": "chat.completions",
"metadata": {
"request_id": "example-123"
},
"model": "Meta-Llama-3.3-70B-Instruct",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Hello! Explain what an LLM is in one sentence."
}
],
"temperature": 0.7,
"max_tokens": 150,
"stream": true
}'
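If you prefer to test from Python, the sketch below sends the same request body as the cURL command using only the standard library. It then reassembles the streamed reply. It assumes the server emits OpenAI-style Server-Sent Events: each event is a `data:` line carrying a chunk with `choices[].delta.content`, and the stream ends with `data: [DONE]`. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable


def build_request(endpoint: str, prompt: str) -> urllib.request.Request:
    """Build the same streaming POST request as the cURL example."""
    body = {
        "call": "chat.completions",
        "metadata": {"request_id": "example-123"},
        "model": "Meta-Llama-3.3-70B-Instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.7,
        "max_tokens": 150,
        "stream": True,
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from OpenAI-style SSE lines until [DONE]."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            parts.append((choice.get("delta") or {}).get("content") or "")
    return "".join(parts)


def stream_reply(endpoint: str, prompt: str) -> str:
    """Send the request and decode the streamed reply line by line."""
    with urllib.request.urlopen(build_request(endpoint, prompt), timeout=60) as resp:
        return collect_stream_text(line.decode("utf-8") for line in resp)
```

For example: `print(stream_reply("https://abcd-1234.ngrok-free.dev/chat/completions", "Hello! Explain what an LLM is in one sentence."))`.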
- Log in to the Vapi Dashboard
- Create an Assistant with a Blank Template
- Navigate to Model → Provider → Custom LLM
- Enter the model name you'll use (Meta-Llama-3.3-70B-Instruct)
- Paste your Ngrok URL into the endpoint URL field:
https://abcd-1234.ngrok-free.dev/chat/completions
- Save the configuration
- Send a test message using the Chat or Talk to Assistant options from Vapi
- Confirm the request reaches your local Flask server
- Verify the response is returned and displayed correctly in Vapi
- User sends a message in Vapi
- Vapi sends a POST request to your Ngrok endpoint
- Flask server receives the request
- Conversation data is parsed and transformed
- SambaNova API is called (Meta-Llama-3.3-70B-Instruct)
- Response is formatted for Vapi
- Vapi displays the response to the user
- Ngrok URLs change on restart (unless using a paid plan)
- Use environment variables for secrets
- Validate request payloads from Vapi
- Add logging for debugging and observability
- Follow the official Vapi response schema strictly
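The first three tips can be combined in a few lines. The sketch reads the API key from an environment variable, rejects malformed payloads before calling SambaNova, and logs the rejections. The variable name `SAMBANOVA_API_KEY` is a hypothetical choice. The accepted roles and the 0 to 2 temperature range are also assumptions borrowed from the OpenAI convention; check them against SambaNova's documented limits.

```python
import logging
import os
from typing import Any

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("vapi-sambanova")

# Assumed set of accepted chat roles.
VALID_ROLES = {"system", "user", "assistant", "tool"}


def load_api_key(var: str = "SAMBANOVA_API_KEY") -> str:
    """Read the SambaNova key from the environment instead of hard-coding it."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set")
    return key


def validate_payload(body: Any) -> list[str]:
    """Return a list of problems with an incoming request body (empty if valid)."""
    if not isinstance(body, dict):
        return ["body must be a JSON object"]
    errors = []
    messages = body.get("messages")
    if not isinstance(messages, list) or not messages:
        errors.append("messages must be a non-empty list")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or msg.get("role") not in VALID_ROLES:
                errors.append(f"messages[{i}] has a missing or unknown role")
    temperature = body.get("temperature", 0.7)
    if not isinstance(temperature, (int, float)) or not 0 <= temperature <= 2:
        errors.append("temperature must be a number between 0 and 2")
    max_tokens = body.get("max_tokens", 1)
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        errors.append("max_tokens must be a positive integer")
    if errors:
        # Log the request metadata (not the message contents) for debugging.
        log.warning("rejected request %s: %s", body.get("metadata", {}), errors)
    return errors
```

A Flask handler could call `validate_payload(request.get_json(silent=True))` and return an HTTP 400 with the error list whenever it is non-empty.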
Happy building 🚀

