This tutorial shows how to deploy DeepSeek with Ollama, LiteLLM, and OpenWebUI on Ubuntu 24.10. Optional steps are provided to set up with Ngrok and Cline.
This setup provides a beautiful chat UI, API access, third-party API management, and spend tracking.
Many thanks to my Ph.D. student Chenchuan He for testing and debugging our server deployment.
Check Ubuntu version (this tutorial was tested on Ubuntu 24.10):
lsb_release -a
Install Python dependencies:
sudo apt install python3 python3-pip python3-venv
Create a virtual environment:
python3 -m venv venv
source venv/bin/activate
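To confirm the virtual environment is active, check which interpreter is on your PATH; it should point inside the venv directory you just created:
which python3  # expect something like .../venv/bin/python3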
Install LiteLLM, open-webui, and prisma (used by open-webui):
pip install 'litellm[proxy]'
pip install open-webui
pip install prisma
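As a quick sanity check, you can confirm that the three packages installed into the venv and see their versions:
pip show litellm open-webui prisma | grep -E '^(Name|Version)'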
Install Ollama:
sudo apt update
curl -fsSL https://ollama.com/install.sh | sh
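To verify the installation, check the CLI version and confirm the installer registered Ollama as a systemd service (the install script does this on Ubuntu):
# Confirm the Ollama CLI is on the PATH
ollama --version
# The installer sets up a systemd service; confirm it is active
sudo systemctl status ollama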
You can search for models at https://ollama.com/search:
Then, you can pull and run deepseek-r1:7b:
ollama run deepseek-r1:7b
deepseek-r1 has no default temperature, so the Ollama default temperature of 0.8 is used:
ollama show --parameters deepseek-r1:7b
stop "<|begin▁of▁sentence|>"
stop "<|end▁of▁sentence|>"
stop "<|User|>"
stop "<|Assistant|>"
If you always want DeepSeek-R1 7B to use the recommended temperature of 0.6, create a new Modelfile:
FROM deepseek-r1:7b
PARAMETER temperature 0.6
Create and run a custom model:
ollama create deepseek-custom -f Modelfile
ollama run deepseek-custom
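You can verify that the custom model picked up the new setting; the output should now list temperature 0.6 along with the inherited stop tokens:
ollama show --parameters deepseek-custom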
You can use any model in a similar way, e.g., run the Mistral 7B model with ollama run mistral.
We use LiteLLM to authenticate users and manage API keys. A PostgreSQL database is needed (local or remote).
Set up the environment variables. Note that the master key is the default password for the admin user.
# Set master key for admin authentication and API key management
echo 'export LITELLM_MASTER_KEY="sk-your-master-key"' >> ~/.bashrc
# Set salt key for hashing and security purposes
echo 'export LITELLM_SALT_KEY="sk-your-salt-key"' >> ~/.bashrc
# Set the port for the LiteLLM proxy server (LiteLLM defaults to port 4000)
echo 'export PORT=4000' >> ~/.bashrc # optional
# Reload the shell configuration to apply the new environment variables
source ~/.bashrc
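You can confirm the variables are visible in the current shell:
echo "master: $LITELLM_MASTER_KEY"
echo "salt:   $LITELLM_SALT_KEY"
echo "port:   $PORT"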
Set up PostgreSQL:
# Install PostgreSQL and additional utilities
sudo apt install postgresql postgresql-contrib -y
# Check if PostgreSQL service is running
sudo systemctl status postgresql
# Configure PostgreSQL to start automatically on system boot
sudo systemctl enable postgresql
Change the default postgres user password to postgres (you can use any password you like):
# Connect to PostgreSQL as the postgres superuser
sudo -u postgres psql
# Set a new password for the postgres user (change 'postgres' to a secure password)
ALTER USER postgres PASSWORD 'postgres';
# Exit the PostgreSQL prompt
\q
Create a database for LiteLLM:
sudo -u postgres psql
CREATE DATABASE litellm;
\q
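To confirm the database exists and that the connection URL config.yaml will use below actually works, connect with that same URL (assuming you kept the postgres/postgres credentials from above):
psql "postgresql://postgres:postgres@localhost:5432/litellm" -c '\conninfo'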
Create and configure config.yaml. Open it with vim config.yaml and paste the following content:
model_list:
  - model_name: ollama/deepseek-r1:7b
    litellm_params:
      model: ollama/deepseek-r1:7b
      api_base: http://localhost:11434 # Points to your local Ollama
general_settings:
  default_model: ollama/deepseek-r1:7b
  master_key: sk-your-master-key # Your master key (matches LITELLM_MASTER_KEY)
  # Enable authentication
  auth_strategy: master_key
  database_url: "postgresql://postgres:postgres@localhost:5432/litellm"
Start LiteLLM in the virtual environment:
litellm --config config.yaml
If you want to run LiteLLM in the background:
# Start LiteLLM in the background, redirect both stdout and stderr to litellm.log
nohup litellm --config config.yaml > litellm.log 2>&1 &
# Find the process ID (PID) of the running LiteLLM server
ps aux | grep litellm
# Stop the LiteLLM server by killing its process (replace PID with actual process ID)
kill -9 PID
Now, LiteLLM is running at http://localhost:4000, and if your server has a public IP address, you can access it at http://your-server-ip:4000:
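Before creating per-user keys, you can sanity-check the proxy by listing the models it serves, authenticating with the master key:
curl http://localhost:4000/v1/models \
  -H "Authorization: Bearer sk-your-master-key"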
Go to the LiteLLM admin panel (served at /ui on the proxy) and create an API key:
Now, test the API (change the API key and server IP address):
curl http://your-server-ip:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-your-api-key" \
-d '{
"model": "ollama/deepseek-r1:7b",
"messages": [{"role": "user", "content": "tell me a joke with less than 10 words"}]
}'
You should get a response like this (showing deepseek-r1 thinking and its response):
{
"id":"chatcmpl-997fcca1-89cf-42ce-9844-70a47e3b2886",
"created":1740686328,"model":"ollama/deepseek-r1:7b",
"object":"chat.completion",
"system_fingerprint":null,
"choices":[
{
"finish_reason":"stop",
"index":0,
"message":{
"content":"<think>\nAlright, so the user asked for a joke with less than 10 words. I need to come up with something short and sweet.\n\nHmm, maybe a simple pun or wordplay would work well here. Short jokes are tricky because they have to be punchy and clear in just a few words.\n\nLet me think of some common phrases that can be twisted into a joke. \"Why don't skeletons fight?\" is a classic setup. How about adding \"they don’t know how to 'rot'\"? That keeps it under 10 words and adds a bit of humor with the unexpected twist.\n</think>\n\nWhy don’t skeletons fight? \nThey don’t know how to \"rot.\"",
"role":"assistant",
"tool_calls":null,
"function_call":null
}
}
],
"usage":{
"completion_tokens":142,
"prompt_tokens":18,
"total_tokens":160,
"completion_tokens_details":null,
"prompt_tokens_details":null
}
}
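The proxy also supports streaming in the standard OpenAI style: pass "stream": true and the response should come back as server-sent events (the -N flag tells curl not to buffer the output):
curl -N http://your-server-ip:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-api-key" \
  -d '{
    "model": "ollama/deepseek-r1:7b",
    "messages": [{"role": "user", "content": "tell me a joke with less than 10 words"}],
    "stream": true
  }'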
Here’s how to make the same API call using Python and LiteLLM:
from litellm import completion
# Configure the API endpoint and key
api_key = "sk-your-api-key"
api_base = "http://your-server-ip:4000"
# Make the API call
response = completion(
model="ollama/deepseek-r1:7b",
messages=[{"role": "user", "content": "tell me a joke with less than 10 words"}],
api_key=api_key,
api_base=api_base
)
# Print the response
print(response.choices[0].message.content)
Start open-webui with:
open-webui serve
or run it in the background:
# Start Open-WebUI in the background and redirect all output to open-webui.log
nohup open-webui serve &> open-webui.log &
# Find the process ID of the running Open-WebUI server
ps aux | grep open-webui
# Stop the Open-WebUI server (replace PID with the actual process ID)
kill -9 PID
Now open-webui is running on http://localhost:8080. If your server has a public IP address, you can access it on http://your-server-ip:8080. Once you set up the initial admin account, you can chat with the model:
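If the page does not load, you can confirm the server is listening locally:
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080  # should print 200 once startup finishes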
If your server is behind a firewall or doesn’t have a public IP address, you can use Ngrok to create secure tunnels to your local services.
# Add Ngrok's package signing key and repository
curl -s https://ngrok-agent.s3.amazonaws.com/ngrok.asc | sudo tee /etc/apt/trusted.gpg.d/ngrok.asc >/dev/null && \
echo "deb https://ngrok-agent.s3.amazonaws.com buster main" | sudo tee /etc/apt/sources.list.d/ngrok.list && \
sudo apt update && sudo apt install ngrok
# Verify Ngrok installation
ngrok --version
# Add your Ngrok authentication token (get this from your Ngrok dashboard)
ngrok config add-authtoken YOUR_NGROK_AUTH_TOKEN
# Create a tunnel to expose LiteLLM API (port 4000) to the internet
# This will provide a public URL that forwards to your local service
# 4000 for LiteLLM, 8080 for Open-WebUI
ngrok http 4000
ngrok http 8080
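Once a tunnel is up, you can call the LiteLLM API through the public URL Ngrok prints (the hostname below is a placeholder; substitute your own tunnel URL and a valid API key):
curl https://your-subdomain.ngrok-free.app/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-api-key" \
  -d '{
    "model": "ollama/deepseek-r1:7b",
    "messages": [{"role": "user", "content": "hello"}]
  }'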
Now you can use DeepSeek in Cline. Note that you have to include the trailing / in the base URL, such as http://your-server-ip:4000/ or https://xxx.ngrok-free.app/: