This tutorial shows how to deploy DeepSeek with Ollama, LiteLLM, and OpenWebUI on Ubuntu 24.10. Optional steps are provided to set up with Ngrok and Cline.
This setup provides a beautiful chat UI, API access, third-party API management, and spend tracking.
Many thanks to my Ph.D. student Chenchuan He for testing and debugging our server deployment.
Check Ubuntu version (this tutorial was tested on Ubuntu 24.10):
lsb_release -a
Install Python dependencies:
sudo apt install python3 python3-pip python3-venv
Create a virtual environment:
python3 -m venv venv
source venv/bin/activate
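To confirm the virtual environment is active, check which interpreter is on your PATH; it should point inside the venv directory you just created:
which python3  # expect something like .../venv/bin/python3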
Install LiteLLM, open-webui, and prisma (used by open-webui):
pip install 'litellm[proxy]'
pip install open-webui
pip install prisma
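As a quick sanity check, you can confirm that the three packages installed into the venv and see their versions:
pip show litellm open-webui prisma | grep -E '^(Name|Version)'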
Install Ollama:
sudo apt update
curl -fsSL https://ollama.com/install.sh | sh
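To verify the installation, check the CLI version and confirm the installer registered Ollama as a systemd service (the install script does this on Ubuntu):
# Confirm the Ollama CLI is on the PATH
ollama --version
# The installer sets up a systemd service; confirm it is active
sudo systemctl status ollama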
You can search for models at https://ollama.com/search:
Then, you can pull and run deepseek-r1:7b:
ollama run deepseek-r1:7b
deepseek-r1 has no default temperature, so the Ollama default temperature of 0.8 is used:
ollama show --parameters deepseek-r1:7b
stop "<|begin▁of▁sentence|>"
stop "<|end▁of▁sentence|>"
stop "<|User|>"
stop "<|Assistant|>"
If you always want DeepSeek-R1 7B to use the recommended temperature of 0.6, create a new Modelfile:
FROM deepseek-r1:7b
PARAMETER temperature 0.6
Create and run a custom model:
ollama create deepseek-custom -f Modelfile
ollama run deepseek-custom
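You can verify that the custom model picked up the new setting; the output should now list temperature 0.6 along with the inherited stop tokens:
ollama show --parameters deepseek-custom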
You can use any model in a similar way, e.g., run the Mistral 7B model with ollama run mistral.
We use LiteLLM to authenticate users and manage API keys. A PostgreSQL database is needed (local or remote).
Set up the environment variables. Note that the master key is the default password for the admin user.
# Set master key for admin authentication and API key management
echo 'export LITELLM_MASTER_KEY="sk-your-master-key"' >> ~/.bashrc
# Set salt key for hashing and security purposes
echo 'export LITELLM_SALT_KEY="sk-your-salt-key"' >> ~/.bashrc
# Set the port for the LiteLLM proxy server (LiteLLM defaults to port 4000)
echo 'export PORT=4000' >> ~/.bashrc # optional
# Reload the shell configuration to apply the new environment variables
source ~/.bashrc
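You can confirm the variables are visible in the current shell:
echo "master: $LITELLM_MASTER_KEY"
echo "salt:   $LITELLM_SALT_KEY"
echo "port:   $PORT"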
Set up PostgreSQL:
# Install PostgreSQL and additional utilities
sudo apt install postgresql postgresql-contrib -y
# Check if PostgreSQL service is running
sudo systemctl status postgresql
# Configure PostgreSQL to start automatically on system boot
sudo systemctl enable postgresql
Change the default postgres user password to postgres (you can use any password you like):
# Connect to PostgreSQL as the postgres superuser
sudo -u postgres psql
# Set a new password for the postgres user (change 'postgres' to a secure password)
ALTER USER postgres PASSWORD 'postgres';
# Exit the PostgreSQL prompt
\q
Create a database for LiteLLM:
sudo -u postgres psql
CREATE DATABASE litellm;
\q
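To confirm the database exists and that the connection URL config.yaml will use below actually works, connect with that same URL (assuming you kept the postgres/postgres credentials from above):
psql "postgresql://postgres:postgres@localhost:5432/litellm" -c '\conninfo'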
Create and configure config.yaml. Open it with vim config.yaml and paste the following content:
model_list:
  - model_name: ollama/deepseek-r1:7b
    litellm_params:
      model: ollama/deepseek-r1:7b
      api_base: http://localhost:11434 # Points to your local Ollama
general_settings:
  default_model: ollama/deepseek-r1:7b
  master_key: sk-your-master-key # Your master key (matches LITELLM_MASTER_KEY)
  # Enable authentication
  auth_strategy: master_key
  database_url: "postgresql://postgres:postgres@localhost:5432/litellm"
Start LiteLLM in the virtual environment:
litellm --config config.yaml
If you want to run LiteLLM in the background:
# Start LiteLLM in the background, redirect both stdout and stderr to litellm.log
nohup litellm --config config.yaml > litellm.log 2>&1 &
# Find the process ID (PID) of the running LiteLLM server
ps aux | grep litellm
# Stop the LiteLLM server by killing its process (replace PID with actual process ID)
kill -9 PID
Now, LiteLLM is running at http://localhost:4000, and if your server has a public IP address, you can access it at http://your-server-ip:4000:
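Before creating per-user keys, you can sanity-check the proxy by listing the models it serves, authenticating with the master key:
curl http://localhost:4000/v1/models \
  -H "Authorization: Bearer sk-your-master-key"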
Go to the LiteLLM admin panel (served at /ui on the proxy) and create an API key:
Now, test the API (change the API key and server IP address):
curl http://your-server-ip:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-your-api-key" \
-d '{
"model": "ollama/deepseek-r1:7b",
"messages": [{"role": "user", "content": "tell me a joke with less than 10 words"}]
}'
You should get a response like this (showing deepseek-r1 thinking and its response):
{
"id":"chatcmpl-997fcca1-89cf-42ce-9844-70a47e3b2886",
"created":1740686328,"model":"ollama/deepseek-r1:7b",
"object":"chat.completion",
"system_fingerprint":null,
"choices":[
{
"finish_reason":"stop",
"index":0,
"message":{
"content":"<think>\nAlright, so the user asked for a joke with less than 10 words. I need to come up with something short and sweet.\n\nHmm, maybe a simple pun or wordplay would work well here. Short jokes are tricky because they have to be punchy and clear in just a few words.\n\nLet me think of some common phrases that can be twisted into a joke. \"Why don't skeletons fight?\" is a classic setup. How about adding \"they don’t know how to 'rot'\"? That keeps it under 10 words and adds a bit of humor with the unexpected twist.\n</think>\n\nWhy don’t skeletons fight? \nThey don’t know how to \"rot.\"",
"role":"assistant",
"tool_calls":null,
"function_call":null
}
}
],
"usage":{
"completion_tokens":142,
"prompt_tokens":18,
"total_tokens":160,
"completion_tokens_details":null,
"prompt_tokens_details":null
}
}
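The proxy also supports streaming in the standard OpenAI style: pass "stream": true and the response should come back as server-sent events (the -N flag tells curl not to buffer the output):
curl -N http://your-server-ip:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-api-key" \
  -d '{
    "model": "ollama/deepseek-r1:7b",
    "messages": [{"role": "user", "content": "tell me a joke with less than 10 words"}],
    "stream": true
  }'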
Here’s how to make the same API call using Python and LiteLLM:
from litellm import completion
# Configure the API endpoint and key
api_key = "sk-your-api-key"
api_base = "http://your-server-ip:4000"
# Make the API call
response = completion(
model="ollama/deepseek-r1:7b",
messages=[{"role": "user", "content": "tell me a joke with less than 10 words"}],
api_key=api_key,
api_base=api_base
)
# Print the response
print(response.choices[0].message.content)
Start open-webui with:
open-webui serve
or run it in the background:
# Start Open-WebUI in the background and redirect all output to open-webui.log
nohup open-webui serve &> open-webui.log &
# Find the process ID of the running Open-WebUI server
ps aux | grep open-webui
# Stop the Open-WebUI server (replace PID with the actual process ID)
kill -9 PID
Now open-webui is running on http://localhost:8080. If your server has a public IP address, you can access it on http://your-server-ip:8080. Once you set up the initial admin account, you can chat with the model:
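If the page does not load, you can confirm the server is listening locally:
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080  # should print 200 once startup finishes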
If your server is behind a firewall or doesn’t have a public IP address, you can use Ngrok to create secure tunnels to your local services.
# Add Ngrok's package signing key and repository
curl -s https://ngrok-agent.s3.amazonaws.com/ngrok.asc | sudo tee /etc/apt/trusted.gpg.d/ngrok.asc >/dev/null && \
echo "deb https://ngrok-agent.s3.amazonaws.com buster main" | sudo tee /etc/apt/sources.list.d/ngrok.list && \
sudo apt update && sudo apt install ngrok
# Verify Ngrok installation
ngrok --version
# Add your Ngrok authentication token (get this from your Ngrok dashboard)
ngrok config add-authtoken YOUR_NGROK_AUTH_TOKEN
# Create a tunnel to expose LiteLLM API (port 4000) to the internet
# This will provide a public URL that forwards to your local service
# 4000 for LiteLLM, 8080 for Open-WebUI
ngrok http 4000
ngrok http 8080
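Once a tunnel is up, you can call the LiteLLM API through the public URL Ngrok prints (the hostname below is a placeholder; substitute your own tunnel URL and a valid API key):
curl https://your-subdomain.ngrok-free.app/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-api-key" \
  -d '{
    "model": "ollama/deepseek-r1:7b",
    "messages": [{"role": "user", "content": "hello"}]
  }'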
Now you can use DeepSeek in Cline. Note that you have to include the trailing / in the base URL, such as http://your-server-ip:4000/ or https://xxx.ngrok-free.app/: