Running ollama run llama3.1:8b in a terminal is fine for testing. It falls apart the moment you want to come back to a conversation from yesterday, ask the model about a document you uploaded, or let a teammate use the same machine without learning the Ollama CLI. Open WebUI solves all of that. It is the most widely adopted open-source frontend for Ollama, with over 290 million container pulls and a current stable release of v0.8.12 (March 27, 2026), and it ships as a Docker container that installs in a single command.

This guide covers the complete stack: Docker Engine installed the right way for GPU access, the NVIDIA Container Toolkit for GPU passthrough, Ollama running natively as a systemd service, and Open WebUI deployed via Docker Compose. The result is a private local AI stack on Linux that starts automatically on boot, survives updates without data loss, and behaves like a ChatGPT-style interface -- except everything stays on your machine. For security-conscious users and anyone handling sensitive data, that distinction has practical weight: cloud AI endpoints are a data exfiltration surface, and the host running them is an attack target. Understanding what a compromised Linux host looks like from the inside is directly relevant to any self-hosted service -- the Linux trojans guide covers how persistent threats operate on the same OS layer this stack runs on.

Commands are shown for Ubuntu 24.04 LTS. The Docker and NVIDIA Container Toolkit steps apply unchanged to any Debian-based distro. Fedora and Arch substitutions are noted inline where package manager commands differ. If this is a fresh Ubuntu install, working through the post-install hardening checklist before running any network-accessible service is worth the 30 minutes.

Security Note

A high-severity vulnerability in Open WebUI (CVE-2025-64496, CVSS 8.0 per NVD) was discovered by Vitaly Simonovich, senior security researcher at Cato Networks, and publicly disclosed in November 2025. It was patched in version 0.6.35. The flaw affected the Direct Connections feature -- which is disabled by default and must be manually enabled by an admin. When enabled, a malicious external model server could execute arbitrary JavaScript in the user's browser via crafted Server-Sent Events, enabling account takeover and -- in some configurations -- remote code execution on the host. The current stable release is v0.8.12. Always run the latest Open WebUI image and treat any externally-hosted model endpoint as untrusted. See the FAQ section for details. MITRE ATT&CK: T1059.007 (Command and Scripting Interpreter: JavaScript); MITRE ATLAS: AML.T0051 (Prompt Injection). For a parallel example of how a missing validation layer in a self-hosted web UI becomes a remote code execution path, see the analysis of CVE-2026-27944 in Nginx UI.

Stack Architecture

Before installing anything, it helps to understand how the pieces fit together -- particularly the networking relationship between Ollama on the host and Open WebUI in a container:

Component                  Runs on                        Port                           Role
Ollama                     Host (systemd service)         11434                          Model inference engine and REST API
Open WebUI                 Docker container               Host: 3000 / Container: 8080   Browser UI, chat history, user accounts, RAG
NVIDIA Container Toolkit   Host (Docker runtime plugin)   --                             Passes host GPU into Docker containers at runtime

The key point is that Ollama runs on the host rather than inside Docker. This gives it direct GPU access without the additional configuration layer that containerized Ollama requires. Open WebUI reaches Ollama via the Docker host bridge using the special hostname host.docker.internal, which resolves to the host machine's IP from inside a container. If you want to understand exactly why Docker containers cannot reach localhost on the host, the Docker networking guide covers the bridge and host network models in detail.

host.docker.internal is not an automatic feature on Linux the way it is on Docker Desktop for macOS and Windows. On Linux, you must explicitly add it using extra_hosts: - "host.docker.internal:host-gateway" in the Compose file. The host-gateway value is a special Docker Compose keyword that resolves to the host's bridge interface IP at container startup -- typically 172.17.0.1 or similar, depending on your Docker network configuration. Without this line, the hostname does not exist inside the container at all, which is why Open WebUI shows a blank model list even when Ollama is running perfectly on the host.
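Once the container is up, you can sanity-check this path from either side of the bridge by hitting Ollama's tags endpoint. A minimal Python sketch -- the parsing helper and the sample below are my own, but the /api/tags response shape is Ollama's documented format:

```python
import json
from urllib.request import urlopen

def model_names(tags_json: str) -> list[str]:
    # Ollama's GET /api/tags returns {"models": [{"name": "...", ...}, ...]}
    return [m["name"] for m in json.loads(tags_json).get("models", [])]

# From inside the container this hostname resolves only when extra_hosts is
# set as described above; on the host itself, use localhost instead:
# body = urlopen("http://host.docker.internal:11434/api/tags").read().decode()
# print(model_names(body))
```

If model_names returns an empty list but the request itself succeeds, Ollama is reachable and you simply have no models pulled yet -- a different failure mode than the blank model list caused by a missing extra_hosts entry.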

"An extensible, feature-rich, and user-friendly self-hosted AI platform designed to operate entirely offline."

-- Open WebUI GitHub README, github.com/open-webui/open-webui

Note

If you prefer to run both Ollama and Open WebUI inside containers, Open WebUI provides a bundled image tagged :ollama that packages both together. The tradeoff is a larger container and less flexibility in configuring Ollama separately. The approach in this guide -- native Ollama, containerized Open WebUI -- is more maintainable long-term and is what the Open WebUI documentation recommends for dedicated Linux machines.

Threat Model: Attack Surface of This Stack

A self-hosted AI stack is not a security island. Every component -- the Docker daemon, the Ollama API, the Open WebUI session layer, the GPU driver surface, and the host OS -- has a documented adversary technique mapped to it in MITRE ATT&CK and MITRE ATLAS. Understanding which techniques apply to which layer of this specific stack is what separates a working setup from a hardened one.

The two relevant frameworks are MITRE ATT&CK (Enterprise/Containers matrix) and MITRE ATLAS (Adversarial Threat Landscape for AI Systems). ATT&CK covers the infrastructure layer -- Docker, the host OS, credentials, persistence, and lateral movement. ATLAS, which reached v5.1.0 in November 2025 with 84 techniques across 16 tactics, covers AI-specific attack vectors such as prompt injection, RAG poisoning, and model exfiltration. Together they give you a complete picture of the attack surface you are deploying.

The applicable NIST guidance documents for this stack are NIST SP 800-190 (Application Container Security, the primary reference for Docker-based deployments), NIST SP 800-204 (Security Strategies for Microservices-based Application Systems, covering the API boundary between Ollama and Open WebUI), NIST SP 800-204D (Software Supply Chain Security in DevSecOps CI/CD Pipelines, relevant to image provenance and update hygiene), and NIST AI RMF (AI Risk Management Framework, which frames the governance posture for operating an AI inference system).


Framework References

The techniques above reference the following authoritative sources: MITRE ATT&CK for Containers (attack.mitre.org/matrices/enterprise/containers); MITRE ATLAS v5.1.0 (atlas.mitre.org); NIST SP 800-190 Application Container Security Guide (csrc.nist.gov/pubs/sp/800/190/final); NIST SP 800-204 Security Strategies for Microservices-based Application Systems (csrc.nist.gov/pubs/sp/800/204/final); NIST SP 800-204D Software Supply Chain Security in DevSecOps CI/CD Pipelines (csrc.nist.gov/News/2024/nist-publishes-sp-800204d).

Step 1: Install Docker Engine

Install Docker CE from the official Docker APT repository. Do not use the Snap version of Docker -- Snap packages are sandboxed in a way that blocks access to the host's /dev/nvidia* device files, which means the NVIDIA Container Toolkit will fail silently even after correct installation.

"Docker Engine does not come pre-installed on every Linux distribution."

-- Docker Engine documentation, docs.docker.com/engine/install/ubuntu

The Snap confinement issue is worth understanding fully. Snap packages run in an AppArmor-enforced sandbox that restricts access to host devices by default. The /dev/nvidia0, /dev/nvidiactl, and /dev/nvidia-uvm device files that the NVIDIA Container Toolkit needs to expose to containers are outside that sandbox boundary. Even with the classic Snap confinement mode, the interaction between the Snap-managed Docker socket at /var/snap/docker/ and the system-level /etc/docker/daemon.json written by nvidia-ctk creates a split-brain situation -- each expects to own Docker's runtime configuration. The APT-installed Docker at /usr/bin/docker and /etc/docker/daemon.json is a single, coherent system with no such split.

Caution

If you installed Docker via sudo snap install docker or through the Ubuntu App Center, remove it first: sudo snap remove --purge docker. Then install the APT version below. Mixing Snap and APT Docker installations causes persistent permission errors with GPU passthrough.
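Before running the install commands, it can help to confirm which Docker (if any) is currently on the PATH. A small sketch -- the classifier function is my own, and it assumes Snap's standard /snap/bin binary location:

```shell
#!/bin/sh
# Classify a docker binary path by its install origin.
# Snap-managed binaries live under /snap; the APT package installs /usr/bin/docker.
docker_origin() {
  case "$1" in
    /snap/*) echo "snap" ;;          # remove with: sudo snap remove --purge docker
    "")      echo "not-installed" ;;
    *)       echo "apt-or-other" ;;
  esac
}

docker_origin "$(command -v docker || true)"
```

If this prints "snap", do the purge step above before continuing -- the APT install will otherwise coexist with the Snap one and produce the split-brain runtime configuration described earlier.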

terminal
# Remove any old Docker packages
$ sudo apt-get remove -y docker docker-engine docker.io containerd runc 2>/dev/null

# Add Docker's official GPG key and repository
$ sudo apt-get update
$ sudo apt-get install -y ca-certificates curl gnupg
$ sudo install -m 0755 -d /etc/apt/keyrings
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
$ sudo chmod a+r /etc/apt/keyrings/docker.gpg

$ echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

$ sudo apt-get update
$ sudo apt-get install -y docker-ce docker-ce-cli containerd.io \
  docker-buildx-plugin docker-compose-plugin

# Add your user to the docker group (no sudo needed for docker commands)
$ sudo usermod -aG docker $USER
# Log out and back in for group membership to take effect

# Verify
$ docker run hello-world

Step 2: Install the NVIDIA Container Toolkit

The NVIDIA Container Toolkit is the bridge between Docker and your GPU. It does not install GPU drivers -- it passes the host driver through to containers at runtime. As stated in the NVIDIA Container Toolkit documentation, you do not need to install the CUDA Toolkit on the host system, but the NVIDIA driver must be present and functional. Your NVIDIA driver must already be installed and nvidia-smi must work before proceeding. If it does not, see the NVIDIA Linux drivers guide first.

terminal
# Add NVIDIA Container Toolkit repository
$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

$ curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

# Configure Docker to use the NVIDIA runtime
$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker

# Verify GPU is accessible inside containers
$ docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi
# Should print your GPU name and driver version from inside the container
Tip

The nvidia-ctk runtime configure command modifies /etc/docker/daemon.json to add the NVIDIA runtime. If that file already exists with custom content (a registry mirror, log driver settings, etc.), review it after running the command to confirm your existing settings were preserved. The current stable version of the NVIDIA Container Toolkit is 1.19.0 as of April 2026. Source: NVIDIA Container Toolkit docs.

"You do not need to install the CUDA Toolkit on the host system."

-- NVIDIA Container Toolkit documentation, github.com/NVIDIA/nvidia-container-toolkit

What this means in practice: the container image you pull -- whether it is nvidia/cuda:12.8.0-base-ubuntu24.04 or ghcr.io/open-webui/open-webui:cuda -- carries its own CUDA runtime libraries. The host only needs to expose the GPU driver. This is why you can run CUDA 12.8 workloads inside a container even if your host was installed without CUDA at all. The toolkit bridges the driver into the container's namespace by exposing the /dev/nvidia* character devices and the matching driver libraries.

Starting with NVIDIA Container Toolkit v1.18.0, the default mode switched from legacy injection to CDI (Container Device Interface) just-in-time spec generation. CDI is an OCI-standard mechanism for device access that works across container runtimes -- not just Docker. If you see CDI-related log output after upgrading the toolkit, that is expected and not an error. The legacy --gpus all flag still works and triggers the CDI path transparently in v1.19.0.
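When GPU passthrough fails, the first thing to check is whether the /dev/nvidia* character devices exist at all -- if the driver never created them, no amount of toolkit configuration helps. A small sketch; the helper function is my own, and the directory is parameterized purely so it can be exercised against a test directory:

```shell
#!/bin/sh
# Print the NVIDIA device nodes present in a directory (default /dev).
# An empty result means the host driver is not loaded -- fix that first.
check_nvidia_devs() {
  dir="${1:-/dev}"
  ls "$dir"/nvidia* 2>/dev/null || echo "no NVIDIA device nodes in $dir"
}

check_nvidia_devs
```

On a working host this prints /dev/nvidia0, /dev/nvidiactl, and /dev/nvidia-uvm (among others); if it prints the fallback message, return to the driver installation step before touching Docker.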

Step 3: Install Ollama and Configure Host Binding

Install Ollama using the official script, which sets up the binary, systemd service, and storage directory automatically:

terminal
$ curl -fsSL https://ollama.com/install.sh | sh

# Verify it is running
$ systemctl status ollama

# Pull a model to test with
$ ollama pull llama3.1:8b

By default, Ollama listens only on 127.0.0.1:11434, which means Docker containers cannot reach it. You need to change the bind address to 0.0.0.0 so it accepts connections from the container network. Do this via the systemd service override rather than editing the unit file directly, so the setting survives Ollama updates:

"Ollama binds 127.0.0.1 port 11434 by default. Change the bind address with OLLAMA_HOST."

-- Ollama FAQ, docs.ollama.com/faq

A systemd drop-in override at /etc/systemd/system/ollama.service.d/override.conf is the correct method here for a specific reason: Ollama's install script places the main unit file at /lib/systemd/system/ollama.service. Files in /lib are owned by the package and will be overwritten on the next ollama update. Files you create in /etc/systemd/system/ollama.service.d/ are yours and are merged with the base unit by systemd at runtime -- the package never touches them. This is true for any systemd-managed service, not just Ollama. It is also worth knowing that systemd units and drop-in directories are a documented attacker persistence mechanism (MITRE ATT&CK T1543.002: Create or Modify System Process: Systemd Service; NIST SP 800-190 Section 3.3 recommends monitoring service unit changes) -- understanding the attack surface you are configuring is worthwhile context. CVE-2026-3888 is a recent example of how a misconfigured systemd unit on Ubuntu became a root escalation path.

terminal
$ sudo mkdir -p /etc/systemd/system/ollama.service.d/
$ sudo tee /etc/systemd/system/ollama.service.d/override.conf <<EOF
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
EOF

$ sudo systemctl daemon-reload
$ sudo systemctl restart ollama

# Confirm it is now listening on all interfaces
$ ss -tlnp | grep 11434
# Should show 0.0.0.0:11434, not 127.0.0.1:11434
Warning: Bind Address vs. Firewall Rules

Binding Ollama to 0.0.0.0 makes it reachable from your entire local network, not just Docker containers on the same machine. Per NIST SP 800-204 Section 3.4, inter-service communication should be scoped to the minimum necessary network boundary -- here, only the Docker bridge subnet.

There are two approaches, and they differ architecturally in an important way. The first is to keep OLLAMA_HOST=0.0.0.0:11434 and restrict port 11434 at the firewall layer using nftables or iptables. The second -- and more defensible -- approach is to scope Ollama's bind address directly to the Docker bridge interface IP rather than relying on a separate enforcement layer. On a default Docker install, the bridge IP is typically 172.17.0.1. Set it in the override:

[Service]
Environment="OLLAMA_HOST=172.17.0.1:11434"

The reason this matters: firewall rules are mutable runtime state. An attacker -- or a misconfigured script -- that flushes the nftables ruleset leaves port 11434 open to the entire network. A bind address is baked into the service at startup: nothing can widen it without restarting the service with a different override. Run docker network inspect bridge | grep '"Gateway"' to confirm the actual bridge IP on your system before substituting it; it can differ from the default on hosts with custom Docker network configurations. If you are running nftables (the default on Ubuntu 22.04+), the nftables guide covers the ruleset syntax you need as a defense-in-depth layer on top of the bind address. If you want to understand exactly how Docker interacts with your host firewall -- including the nat table PREROUTING rewrite that can bypass INPUT rules -- see Docker's iptables compatibility layer. For a broader architectural view, see Docker daemon networking architecture.
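As that defense-in-depth layer, a minimal nftables fragment might look like the following. This is a sketch, not a complete ruleset: it assumes the default 172.17.0.0/16 Docker bridge subnet (confirm yours with docker network inspect first) and an otherwise-permissive input policy.

```
# /etc/nftables.conf (fragment) -- restrict Ollama's port to the Docker bridge
table inet ollama_guard {
  chain input {
    type filter hook input priority 0; policy accept;
    tcp dport 11434 ip saddr 172.17.0.0/16 accept
    tcp dport 11434 ip saddr 127.0.0.0/8 accept
    tcp dport 11434 drop
  }
}
```

The loopback allow rule keeps ollama CLI commands on the host working when Ollama is bound to 0.0.0.0; if you scoped the bind address to the bridge IP instead, loopback access is already gone and that rule is moot.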

Step 4: Deploy Open WebUI with Docker Compose

Create a project directory and the Compose file. The named volume is what persists your conversations, settings, uploaded files, and user accounts across container restarts and image updates -- never skip it.

A Docker named volume stores data at /var/lib/docker/volumes/ai-stack_open-webui-data/_data/ on the host (the prefix ai-stack_ comes from your project directory name). You can inspect its contents directly with sudo ls /var/lib/docker/volumes/ai-stack_open-webui-data/_data/ even while the container is running. If you ever need to back up your Open WebUI data, that is the directory to copy. A bind mount like ./data:/app/backend/data achieves the same persistence but puts you in charge of the path and permissions -- named volumes are simpler for most single-host setups.

terminal
$ mkdir -p ~/ai-stack && cd ~/ai-stack
~/ai-stack/docker-compose.yml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"
    volumes:
      - open-webui-data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      - WEBUI_SECRET_KEY=your-secret-key-here  # Required -- see note below
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped

volumes:
  open-webui-data:
WEBUI_SECRET_KEY is Required

Replace your-secret-key-here with a real secret before starting the container. Without a persistent value, Open WebUI generates a random key on every container restart. When that happens, all active sessions are invalidated (you are logged out), any configured MCP Tools break with decryption errors, and OAuth sessions become invalid. Generate a key once and keep it: openssl rand -hex 32. Paste the output into the Compose file and never change it unless you intentionally want to invalidate all sessions.
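The substitution can be scripted so the placeholder never survives into a running deployment. A small sketch, assuming the literal your-secret-key-here placeholder from the Compose file above:

```shell
#!/bin/sh
# Generate a persistent secret and splice it into the Compose file in place.
set_secret_key() {
  compose_file="$1"
  key=$(openssl rand -hex 32)
  sed -i "s/your-secret-key-here/${key}/" "$compose_file"
}

# set_secret_key ~/ai-stack/docker-compose.yml
```

Run it once; running it again is a no-op because the placeholder is gone, which is exactly the behavior you want -- the key must never change accidentally.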

terminal
# Generate a persistent secret key and insert it into the Compose file
$ openssl rand -hex 32
# Copy the output and replace "your-secret-key-here" in docker-compose.yml

# Pull the image and start the container
$ docker compose up -d

# Watch startup logs -- first run takes 30-60 seconds for DB init
$ docker compose logs -f open-webui
# Wait until you see: "Application startup complete"
Image Tag Note

The Compose file above uses ghcr.io/open-webui/open-webui:main, which always pulls the latest development build. For production or shared systems, pin to a specific release tag such as :v0.8.12 (the current stable release as of April 2026, per the Open WebUI releases page). Pinned versions let you control exactly when you take updates and make it easier to roll back if a release introduces a regression. The :latest tag also tracks stable releases and is a reasonable middle ground.

Once startup completes, open http://localhost:3000 in a browser. The first visitor creates the admin account -- enter an email and password to complete setup. After that, any further accounts require admin approval unless you disable authentication (covered in the configuration section below).

Step 5: Verify the Connection to Ollama

After logging in, Open WebUI should immediately show your available Ollama models in the model selector at the top of the chat area. If the selector is empty or shows a connection error, check the following:

terminal
# Confirm Ollama is reachable from inside the container
$ docker exec open-webui curl -s http://host.docker.internal:11434/api/tags | head -c 200
# Should return JSON listing your pulled models
# If this fails: Ollama is not binding on 0.0.0.0 -- recheck the override.conf

# Check Open WebUI logs for connection errors
$ docker compose logs open-webui | grep -i "ollama\|error\|failed"

# Verify Ollama is using the GPU when a model is loaded
$ ollama ps
# Processor column should show GPU, not CPU

Useful Configuration Options

Disable authentication for a single-user setup

If this is a personal workstation and you do not need user accounts, you can disable the login requirement entirely. Add the environment variable to your Compose file and restart. This setting removes all authentication and all authorization -- it is only appropriate for a machine you control exclusively and that is not reachable from a shared network:

~/ai-stack/docker-compose.yml (environment section)
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      - WEBUI_SECRET_KEY=your-secret-key-here
      - WEBUI_AUTH=False  # Removes all auth -- single-user local workstation only
Caution

WEBUI_AUTH=False disables all authentication and authorization in Open WebUI. Anyone who can reach port 3000 -- on your machine or your local network -- has unrestricted access to every model, every conversation, all uploaded files, and all admin functions. Never use this setting on a machine accessible to others or exposed beyond your own workstation. The Open WebUI documentation classifies this configuration as a production risk. Disabling all authentication maps to MITRE ATT&CK T1078 (Valid Accounts -- Initial Access): an unauthenticated port is an implicit valid session for any reachable client. NIST SP 800-190 Section 3.3 requires that container runtime security never rely solely on network isolation as an authentication substitute. The underlying principle -- that removing authentication collapses all privilege boundaries simultaneously -- is the same reason running any network service as root is discouraged. For a framework on designing access boundaries for self-hosted Linux services, see zero-trust security on Linux.

Extend the Ollama context window

Ollama's default context window depends on available VRAM: 4096 tokens when no VRAM tier information is available, 32k on systems with 24-48 GiB of VRAM, and 256k on systems with 48 GiB or more. Regardless of tier, a model only gets the allocated window -- a model trained with a 128k context will still run at the tier default unless you explicitly override it. The consequence is that long documents, RAG pipelines, and multi-turn conversations silently truncate beyond the context limit, with no error message. The OLLAMA_CONTEXT_LENGTH environment variable sets a global override for all models. For per-model overrides, create a Modelfile with PARAMETER num_ctx and use ollama create to register it as a named variant. Note that the KV cache scales linearly with context length -- a 7B model at 32k context consumes roughly 6 GB of VRAM versus approximately 400 MB at 4k. Sources: official Ollama FAQ; Ollama context length docs.

/etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_CONTEXT_LENGTH=16384"
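The VRAM cost of raising the context is easy to estimate. A back-of-envelope sketch -- the model shape below (32 layers, 8 KV heads, head dim 128, fp16) is an assumption for a Llama-style 7B with grouped-query attention, and it counts only the K/V tensors, so real usage runs somewhat higher:

```python
def kv_cache_bytes(context_tokens: int,
                   layers: int = 32,      # assumed Llama-style 7B shape
                   kv_heads: int = 8,     # grouped-query attention
                   head_dim: int = 128,
                   bytes_per_value: int = 2) -> int:  # fp16
    # One K and one V vector per token, per layer.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens

print(kv_cache_bytes(4096) / 2**30)    # 0.5 GiB at the 4k default
print(kv_cache_bytes(32768) / 2**30)   # 4.0 GiB at 32k -- an 8x jump
```

The linear scaling is the point: every doubling of OLLAMA_CONTEXT_LENGTH doubles the cache, which is why a context bump that looks free on paper can evict the model weights from VRAM and silently fall back to CPU inference.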

Pull and manage models from within Open WebUI

You do not have to use the CLI to pull models. In Open WebUI, go to Admin Panel > Settings > Models. Enter a model name from the Ollama library (such as qwen2.5:7b or mistral:7b-instruct) and click the download button. Open WebUI triggers the pull via the Ollama API and shows download progress in the UI.

Add OpenAI or other API endpoints alongside local models

Open WebUI supports any OpenAI-compatible API alongside Ollama. Go to Settings > Connections and add an endpoint URL and API key. Once added, the external provider's models appear in the same dropdown as your local Ollama models, and you can switch between them per conversation.
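Everything in that dropdown speaks the same wire protocol. A minimal Python sketch of an OpenAI-style chat request -- the helper is my own; note that Ollama itself also exposes this API under /v1, which is handy for testing the format locally:

```python
import json
from urllib.request import Request, urlopen

def chat_request(base_url: str, model: str, prompt: str,
                 api_key: str = "none") -> Request:
    # The /chat/completions body shape is shared by OpenAI-compatible servers.
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return Request(f"{base_url}/chat/completions", data=body, headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # Ollama ignores the key
    })

# req = chat_request("http://localhost:11434/v1", "llama3.1:8b", "Hello")
# print(json.load(urlopen(req))["choices"][0]["message"]["content"])
```

The same request with a real base URL and API key is exactly what Open WebUI sends to an external provider you add under Settings > Connections.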

Access the stack remotely without exposing port 3000

If you want to use Open WebUI from another machine on your network -- or from outside your home network entirely -- do not bind port 3000 to a public interface. The better option is a VPN that terminates on your Linux host, so only authenticated tunnel traffic reaches the container network. WireGuard on Linux is the minimal, kernel-native approach: a peer with a valid key can reach your stack; everything else sees a closed port. If the host is also accessible via SSH, harden the SSH configuration before putting it on any network you do not fully control -- the SSH audit and hardening guide covers key-only auth, algorithm restrictions, and rate limiting.

Security Hardening Checklist

The following checklist maps each hardening action to the specific MITRE ATT&CK or ATLAS technique it mitigates and the relevant NIST SP guidance.


What Changed Between v0.6.x and v0.8.x

If you set this stack up based on older guides, your running instance has almost certainly drifted from current defaults and capabilities. The jump from the v0.6.x series (where CVE-2025-64496 was patched) to the current v0.8.x series brought functional changes that affect how you configure and operate the stack today. Here is what matters operationally.

v0.8.0 introduced a destructive database migration

This is the single most important operational fact about upgrading to the v0.8.x series. v0.8.0 (released February 12, 2026) restructured the chat storage layer into a new chat_message table. On instances with large chat histories, this migration can run for several minutes and consume significant RAM -- the migration briefly duplicates data in memory, and at least one production deployment with 10 GB of chat history hit an OOM kill during the process. The Open WebUI team explicitly flagged this in the v0.8.0 release notes: the migration must not be interrupted, and multi-worker or load-balanced deployments cannot use rolling updates -- all instances must be updated simultaneously, as running mixed schema versions causes application failures. Before upgrading from any v0.6.x or v0.7.x instance to v0.8.x, take a full backup first. The backup command in the Updates section is not optional here.

To estimate whether you have sufficient memory headroom before upgrading: check your current webui.db file size with docker exec open-webui du -sh /app/backend/data/webui.db. The migration briefly holds a working copy of the chat data in memory alongside the source, so budget roughly 2x the webui.db size in free RAM, plus the container's baseline footprint (typically 400–600 MB). A 2 GB webui.db needs approximately 4.5 GB of free RAM for the migration to complete safely. Check available memory on the host with free -h before pulling the v0.8.x image. If you do not have the headroom, spin up the migration on a separate host with more RAM using a copy of the volume before deploying on the production machine.
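That rule of thumb is simple enough to sketch. A hedged estimate in Python -- the 2x multiplier and the 0.5 GB baseline come from the paragraph above, not from any official sizing formula:

```python
def migration_headroom_gb(webui_db_gb: float, baseline_gb: float = 0.5) -> float:
    # ~2x the webui.db size (source data + in-memory working copy)
    # plus the container's typical baseline footprint.
    return 2 * webui_db_gb + baseline_gb

print(migration_headroom_gb(2.0))  # 4.5 -- the 2 GB example from above
```

Compare the result against the "available" column of free -h, not "free" -- reclaimable page cache counts as headroom for this purpose.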

v0.8.0 also introduced the analytics dashboard for administrators, showing model usage statistics, token consumption by model and user, user activity, and time-series charts. This is relevant operationally: the dashboard gives you a baseline of normal usage patterns, which makes anomalous inference activity (AML.T0034: Cost Harvesting; AML.T0040: ML Model Inference API Access) visible without external monitoring tooling. If Ollama port 11434 is ever reached by an unauthorized client, the usage patterns in the analytics dashboard will show the discrepancy before ollama ps catches it.

For multi-worker Docker Compose deployments (those using multiple open-webui container replicas), the ENABLE_DB_MIGRATIONS=false and UVICORN_WORKERS controls let you run the database migration on a single instance first. Single-container deployments -- which is the architecture this guide builds -- are unaffected, but if you ever scale horizontally, set UVICORN_WORKERS=1 on all but one instance before updating to prevent concurrent migration attempts.

Rolling Back After v0.8.x Is Not Safe

Pinning an older image tag and restarting does not undo a database schema migration. If you update to v0.8.x and it runs its migration successfully, rolling back the container to a v0.7.x image will likely fail because the older code does not understand the new schema. The only safe rollback path is restoring from a backup taken before the migration ran. This is why backing up before every major update is not optional -- it is your only recovery option.

Configurable tool server timeout

Open WebUI v0.8.x adds a configurable HTTP timeout for tool server requests via the AIOHTTP_CLIENT_TIMEOUT_TOOL_SERVER environment variable. The default may be too short for heavy tool calls. Add it to your Compose environment block if you run external tool servers alongside Open WebUI:

~/ai-stack/docker-compose.yml (environment section)
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      - AIOHTTP_CLIENT_TIMEOUT_TOOL_SERVER=60  # seconds; default may be too short for heavy tools

Kubernetes readiness probe endpoint

A /ready endpoint was added in v0.8.x, returning HTTP 200 only after startup completes and the database is reachable. This has no effect on a Docker Compose deployment, but if you ever migrate to Kubernetes, use /ready as your readiness probe path -- not the root path -- to prevent traffic routing before the app fully initializes. Source: Open WebUI CHANGELOG.
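For reference, the probe stanza would look roughly like this in a Kubernetes Deployment -- the timing values are illustrative, and port 8080 is the container-side port from the Compose mapping above:

```yaml
# Sketch of a readiness probe for a future Kubernetes migration.
readinessProbe:
  httpGet:
    path: /ready   # returns 200 only after startup and DB init complete
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
```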

Offline mode embedding fix

Earlier v0.6.x and v0.7.x builds attempted to download embedding models even when running with OFFLINE_MODE=True, generating noisy error logs. This is resolved in v0.8.x. Remove any workarounds you added for that behavior -- they are no longer needed.

Frontend initialization resilience

The app layout in v0.8.x no longer blocks the entire page load if any individual API call fails during startup (models, banners, tools, user settings, tool servers). In v0.6.x, a single failed startup call could leave a blank or broken UI that required a hard page reload. The v0.8.x behavior fails gracefully per-component, which is particularly noticeable on slow or unreliable network paths between Open WebUI and Ollama.

Admin model visibility fix

Administrators can now see all available models even before any access control configuration has been applied. A regression in earlier releases caused the /api/models endpoint to return a 500 error when models had incomplete metadata missing a user_id field (common with globally-defaulted models). This is resolved in v0.8.x. If you experienced unexplained model visibility gaps after an update, this was likely the cause.

v0.8.6 additions worth knowing for this stack

v0.8.6 (released March 1, 2026) added several features that change the operational picture for this stack. SBOM (Software Bill of Materials) attestations are now shipped with each release image, which enables the cosign verify-attestation workflow described in the threat model section. The supply chain mitigation for T1195.002 is now fully actionable without reaching for third-party tooling. v0.8.6 also added streaming performance improvements and security headers at the application layer -- if your reverse proxy or Compose setup adds custom security headers, review for conflicts with the application-level headers shipped in v0.8.6+.

Full Changelog

The complete version-by-version changelog is maintained at github.com/open-webui/open-webui/blob/main/CHANGELOG.md. Check it before every major update -- Open WebUI ships frequently and occasionally includes behavior changes that do not surface in release titles.

Updating Open WebUI Without Data Loss

Open WebUI releases updates frequently. The update process is a single command that pulls the new image and recreates the container, leaving the named volume -- and therefore all your data -- untouched. Back up your data volume before every update, then pull and restart:

terminal
# Back up the data volume before updating (recommended before every update)
$ docker run --rm -v ai-stack_open-webui-data:/source -v "$(pwd)":/backup \
  alpine tar czf /backup/openwebui-backup-$(date +%Y%m%d).tar.gz -C /source .

# Pull the latest image and recreate the container
$ cd ~/ai-stack && docker compose pull && docker compose up -d

# Verify the new version is running
$ docker compose ps
$ docker compose logs open-webui | tail -20

# Clean up old images to reclaim disk space
$ docker image prune -f
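
A backup you have never listed is a hope, not a backup. The sketch below exercises the same tar round-trip on a scratch directory (the /tmp paths and placeholder files are illustrative) so you can see what a healthy archive listing looks like; against the real archive, run the tar tzf line with your openwebui-backup-YYYYMMDD.tar.gz path instead:

```shell
# Simulate the data-volume layout on a scratch directory, then verify
# the archive the same way you would verify a real volume backup.
mkdir -p /tmp/owui-demo/source/uploads
echo "placeholder" > /tmp/owui-demo/source/webui.db

# Create the archive exactly as the backup command does: -C <dir> .
tar czf /tmp/owui-demo/backup.tar.gz -C /tmp/owui-demo/source .

# Verify: list contents and confirm the database file made it in.
tar tzf /tmp/owui-demo/backup.tar.gz
tar tzf /tmp/owui-demo/backup.tar.gz | grep -q 'webui.db' \
  && echo "archive contains webui.db"
```

To restore, reverse the operation with a helper container after docker compose down: docker run --rm -v ai-stack_open-webui-data:/target -v "$(pwd)":/backup alpine tar xzf /backup/openwebui-backup-YYYYMMDD.tar.gz -C /target, then docker compose up -d.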

Secret Key Across Updates

If your Compose file has a persistent WEBUI_SECRET_KEY value (as it should), all sessions survive the container recreation automatically. If you omitted the key and Open WebUI is generating a random one on each start, adding it now will invalidate all current sessions once -- but after that, sessions will survive every future update. Generate one with openssl rand -hex 32, add it to the Compose file, and run docker compose up -d.

When to Rotate WEBUI_SECRET_KEY

The key is designed to stay the same across the life of the instance -- but there are specific circumstances that warrant replacing it. Rotate it when: a user with Docker group access on the host is removed or their access is revoked; you suspect the host has been compromised; the .env file was committed to version control and then removed; or any other event that could have exposed the key value.

Understand the blast radius before rotating. Changing WEBUI_SECRET_KEY and running docker compose up -d immediately invalidates every active JWT session across the entire instance -- all users are logged out simultaneously, all MCP Tool sessions break with decryption errors, and any OAuth sessions that are backed by Open WebUI's token store become invalid. In a single-user setup this is a minor inconvenience. In a multi-user deployment, treat it as a planned maintenance event and notify users in advance.

The rotation procedure is: generate the new key with openssl rand -hex 32, update the value in your .env file, and run docker compose up -d. The old key is immediately dead. There is no grace period.
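
The three steps can be sketched end to end. This runs against a scratch .env copy so it is safe to execute as-is; point ENV_FILE at your real ~/ai-stack/.env when doing it for real (the scratch path and old key value here are illustrative):

```shell
# Rotate WEBUI_SECRET_KEY in an .env file. Scratch copy for the demo;
# use ENV_FILE=~/ai-stack/.env against the real deployment.
ENV_FILE=/tmp/owui-rotate-demo.env
echo 'WEBUI_SECRET_KEY=old-compromised-value' > "$ENV_FILE"

# Generate a fresh 256-bit key and swap it in place.
NEW_KEY=$(openssl rand -hex 32)
sed -i "s/^WEBUI_SECRET_KEY=.*/WEBUI_SECRET_KEY=${NEW_KEY}/" "$ENV_FILE"

grep '^WEBUI_SECRET_KEY=' "$ENV_FILE"   # confirm the new value landed

# Apply it -- this is the moment every session dies:
#   cd ~/ai-stack && docker compose up -d
```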

Clear Your Browser Cache After Every Update

Open WebUI's frontend is a compiled SvelteKit app. After a container update, your browser may cache stale JavaScript or CSS assets from the previous version, causing layout breakage, blank panels, or failed API calls that look like backend errors. Always do a hard reload with Ctrl+F5 (or Cmd+Shift+R on macOS) after every update before concluding something is broken. This is documented in the official update guide and catches a significant fraction of post-update complaints. Source: Open WebUI docs.

Direct Database Backup Alternative

The volume archive backup above copies everything in the data volume. If you only want the database itself -- conversations, users, settings, but not cached embeddings or the vector store -- you can copy it directly: docker cp open-webui:/app/backend/data/webui.db ./webui.db. Stop the container first (docker compose stop open-webui) -- copying a SQLite file while the application is mid-write can capture an inconsistent snapshot, and docker cp works fine on a stopped container. The webui.db file at /app/backend/data/webui.db inside the container is the SQLite database that stores all Open WebUI application state. This is the file to restore if you need to migrate to a different host or recover from a failed update. The surrounding cache/, uploads/, and vector_db/ directories in the same volume are regenerable -- webui.db is not.

Caution

Never run docker rm -v open-webui. The -v flag removes volumes attached to the container, which will delete your entire conversation history, uploaded files, settings, and user accounts. To remove the container while preserving data, use docker rm open-webui without the -v flag, or simply use docker compose down without the --volumes flag.

Keeping Open WebUI current also means keeping its host OS current. CVE-2025-64496 was patched in a point release; if you are running an unpatched version because you stopped checking for updates, the mitigation of having Direct Connections disabled by default is your only protection. Security fixes in actively developed self-hosted software arrive in minor releases, not just majors. Subscribe to the Open WebUI releases feed to catch them. At the host level, automated package updates combined with audit logging give you the patch coverage and detection baseline you need -- the unauthorized crontab modification guide explains one of the persistence techniques attackers layer onto compromised hosts when patching is delayed.

Troubleshooting

Logged out after every container restart or update

This is the most common symptom of a missing WEBUI_SECRET_KEY. Open WebUI signs JWT session tokens with this key. If no persistent key is set, a new random one is generated each time the container starts, invalidating all existing tokens immediately. Fix it by generating a key once with openssl rand -hex 32, adding WEBUI_SECRET_KEY=<that value> to the environment section of your Compose file, and running docker compose up -d. You will be logged out once more as the new key takes effect, but sessions will survive every future restart and update from that point on.

Open WebUI shows no models / cannot connect to Ollama

Run docker exec open-webui curl -s http://host.docker.internal:11434/api/tags. If this returns an empty response or connection refused, Ollama is not reachable from inside the container. Confirm that OLLAMA_HOST=0.0.0.0:11434 is set in the systemd override and that you ran sudo systemctl restart ollama after the change. Also confirm extra_hosts: - "host.docker.internal:host-gateway" is present in the Compose file -- without it, the hostname does not resolve.

GPU not being used during inference

Open WebUI does not perform inference directly -- it sends requests to Ollama on the host, which handles the GPU. Check ollama ps while a model is responding to see the processor column. If it shows CPU, the issue is with Ollama's GPU access, not with Docker or Open WebUI. Review the GPU setup steps in the Ollama GPU setup guide.

Open WebUI starts but the page does not load

The container needs 30 to 60 seconds to complete database initialization and migration on first start. Check docker compose logs -f open-webui and wait for the "Application startup complete" line. If Python errors appear repeatedly, the container may be running out of memory -- check docker stats open-webui and ensure the host has enough free RAM.

docker run with --gpus all fails after correct toolkit installation

The most common cause after a correct install is that Docker was installed via Snap before the APT version. Even after removing the Snap version and installing APT Docker, residual configuration can interfere. Verify with which docker (should show /usr/bin/docker, not a Snap path) and check cat /etc/docker/daemon.json to confirm the NVIDIA runtime is registered. If the runtime section is missing, run sudo nvidia-ctk runtime configure --runtime=docker again and restart Docker.
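
For reference, after a successful nvidia-ctk runtime configure run, /etc/docker/daemon.json should contain a runtimes entry resembling the fragment below. Your file may carry additional keys (default-runtime, logging options); only the nvidia runtime block matters for this check:

```json
{
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}
```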

Container crashes during upgrade to v0.8.x (OOM or migration failure)

The v0.8.0 database migration restructures the internal chat storage and briefly duplicates data in memory. On instances with large chat histories, this can exceed available RAM and be killed by the kernel before the migration completes. The symptom is the container exiting during its first startup after an update, with either an OOM message in docker compose logs open-webui or a startup crash with a schema-related Python traceback. If this happens, do not attempt to restart the container repeatedly -- a partially completed schema migration can leave the database in an inconsistent state where each restart fails at a different migration step. Restore from a backup taken before the update, then either add a memory limit override to your Compose file or run the migration on a host with more available RAM. If you do not have a pre-update backup and the container is stuck, consult the manual database migration guide in the Open WebUI documentation before touching the database file.
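
If you take the memory-limit route, a Compose override file keeps the change separate from your main configuration. A minimal sketch -- the filename follows the standard Compose override convention, the 8g/16g values are illustrative and should be sized to your host, and the memswap_limit headroom is one way to let the one-time migration spill to swap instead of being OOM-killed, at the cost of a slower migration:

```yaml
# docker-compose.override.yml -- temporary resource bounds for the
# v0.8.0 migration; remove the file after the migration completes.
services:
  open-webui:
    mem_limit: 8g        # hard RAM ceiling for the container
    memswap_limit: 16g   # RAM + swap; the extra 8g absorbs migration spikes
```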

How to Set Up a Local AI Stack on Linux with Ollama, Open WebUI, and Docker

Step 1: Install Docker Engine and the NVIDIA Container Toolkit

Install Docker CE from the official Docker APT repository -- not the Snap package, which blocks GPU access. After installation, add your user to the docker group. Then install the NVIDIA Container Toolkit, run sudo nvidia-ctk runtime configure --runtime=docker, and restart Docker. Verify end-to-end GPU access in containers with: docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi

Step 2: Install Ollama and configure it to accept container connections

Install Ollama with curl -fsSL https://ollama.com/install.sh | sh. Then add OLLAMA_HOST=0.0.0.0:11434 to the systemd service override at /etc/systemd/system/ollama.service.d/override.conf. Reload and restart the service. Confirm it is listening on all interfaces with ss -tlnp | grep 11434, then pull at least one model with ollama pull llama3.1:8b.
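
The override file itself is two lines. Assuming the standard install locations:

```ini
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
```

Create it with sudo systemctl edit ollama (which opens the drop-in path above), then run sudo systemctl daemon-reload && sudo systemctl restart ollama to apply it.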

Step 3: Deploy Open WebUI with Docker Compose

Create a docker-compose.yml defining the Open WebUI service with the named data volume, extra_hosts: host.docker.internal:host-gateway, OLLAMA_BASE_URL=http://host.docker.internal:11434, and a persistent WEBUI_SECRET_KEY (generate one with openssl rand -hex 32). Run docker compose up -d and wait for the startup complete message in the logs. Open http://localhost:3000 to create the admin account.
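
A minimal Compose file matching that description might look like the following sketch. The image tag, container name, and volume name are the conventional ones used throughout this guide; WEBUI_SECRET_KEY is read from an adjacent .env file rather than hardcoded:

```yaml
# ~/ai-stack/docker-compose.yml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"                      # host 3000 -> container 8080
    extra_hosts:
      - "host.docker.internal:host-gateway"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}   # set in .env, not here
    volumes:
      - open-webui-data:/app/backend/data
    restart: unless-stopped

volumes:
  open-webui-data:
```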

Step 4: Verify, pull models, and configure

Confirm Ollama is reachable from inside the container with docker exec open-webui curl -s http://host.docker.internal:11434/api/tags. In Open WebUI, go to Admin Panel to pull additional models and configure system prompts. Verify GPU usage during inference with ollama ps. To update Open WebUI in future, run docker compose pull && docker compose up -d from the project directory -- the named volume preserves all data.

Knowledge Check

Test your understanding of the concepts in this guide. Each question targets a specific decision point that real deployments get wrong.

Local AI Stack: Knowledge Check

Frequently Asked Questions

Why does Open WebUI fail to connect to Ollama even though both are running?

The most common cause is a networking mismatch. When Ollama runs on the host and Open WebUI runs in a Docker container, the container cannot reach localhost or 127.0.0.1 -- those addresses resolve to the container itself, not the host. The fix is to use extra_hosts: - "host.docker.internal:host-gateway" in Docker Compose and set OLLAMA_BASE_URL to http://host.docker.internal:11434. Also confirm Ollama is listening on 0.0.0.0 rather than 127.0.0.1 by checking the OLLAMA_HOST environment variable in its systemd override.

Do I need to install the full CUDA Toolkit to give Docker containers GPU access?

No. The NVIDIA Container Toolkit only requires the NVIDIA driver to be installed on the host -- it passes the host driver through to the container at runtime. The container image provides its own CUDA runtime. Verify the driver is loaded with nvidia-smi, install the NVIDIA Container Toolkit, run nvidia-ctk runtime configure --runtime=docker, and restart Docker. After that, any container run with --gpus all has full GPU access without a system-wide CUDA Toolkit.

How do I update Open WebUI without losing conversation history?

Open WebUI stores all data in a Docker named volume at /app/backend/data inside the container. Before updating, back up the volume: docker run --rm -v ai-stack_open-webui-data:/source -v "$(pwd)":/backup alpine tar czf /backup/openwebui-backup-$(date +%Y%m%d).tar.gz -C /source . Then run docker compose pull followed by docker compose up -d from the project directory -- the named volume persists automatically. For active sessions to survive the container recreation, ensure WEBUI_SECRET_KEY is set to a fixed value in your Compose file. Never use docker rm -v on the Open WebUI container, as the -v flag removes attached volumes along with the container.

Can Open WebUI connect to cloud AI APIs alongside local Ollama models?

Yes. Open WebUI supports any OpenAI-compatible API endpoint alongside Ollama. Go to Settings, then Connections, and add the endpoint URL and API key for the external provider. Once configured, all available models -- both local Ollama models and cloud API models -- appear in the same model selector dropdown. You can switch between a local Llama model and a cloud-hosted model in the same chat interface without changing any other settings.

What is the default Ollama context window size and how do I change it?

According to the official Ollama FAQ and the context length documentation, the default context window is 4096 tokens when running on a system with less than 24 GiB VRAM; on systems with 24–48 GiB VRAM the dynamic default scales to 32k, and on systems with 48 GiB or more it reaches 256k. Regardless of the hardware tier, any model is bound to the allocated window -- a model trained on 128k context will still only use the tier-default unless you override it. Critically, Ollama silently truncates input that exceeds this limit with no error message; the model simply never receives the truncated tokens, which is why long document Q&A and multi-turn conversations can produce degraded results without any obvious failure signal. You can override it globally via the OLLAMA_CONTEXT_LENGTH environment variable in the systemd service override, or per-request by passing num_ctx in the API options field. For document Q&A or longer conversations, 16384 to 32768 is a practical range. Keep in mind that the KV cache scales linearly with context length -- a 7B model at 32k context consumes roughly 6 GB of VRAM, compared to approximately 400 MB at 4k.
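
Both override paths are short. For the global route, extend the same systemd drop-in used for OLLAMA_HOST -- 16384 here is an example drawn from the range above, not a universal recommendation:

```ini
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_CONTEXT_LENGTH=16384"
```

Apply with sudo systemctl daemon-reload && sudo systemctl restart ollama, then confirm the allocated window with ollama ps while a model is loaded. For a one-off request, pass "options": {"num_ctx": 16384} in the JSON body of an /api/generate or /api/chat call instead.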

Is Open WebUI safe to expose on a local network?

Open WebUI ships with authentication enabled by default, so unauthenticated users reaching port 3000 see a login prompt. A high-severity vulnerability (CVE-2025-64496, CVSS 8.0 per NVD) was discovered by Vitaly Simonovich, senior security researcher at Cato Networks, and publicly disclosed in November 2025. The flaw was present in versions 0.6.34 and older. Critically, the affected feature -- Direct Connections -- is disabled by default; exploitation requires an admin to first enable it and a user to manually add a malicious model URL. When enabled, a hostile external server could send a crafted Server-Sent Events message that triggered arbitrary JavaScript execution in the victim's browser via new Function() -- enabling JWT theft, account takeover, and in some configurations remote code execution on the host server if the compromised account held workspace.tools permission. The vulnerability is patched in version 0.6.35 and later; the current stable release is v0.8.12. Always run the latest Open WebUI image. If you do use Direct Connections, treat every external endpoint as untrusted third-party code, limit workspace.tools to essential users only, and monitor for unexpected tool creation activity. If you want to inspect traffic between Open WebUI and Ollama on the host, Wireshark with a remote tcpdump capture over SSH lets you watch the API calls in real time without installing anything on the container. For a broader framework on hardening Linux-based services, see the guide on zero-trust security on Linux. Sources: NVD CVE-2025-64496; GHSA-cm35-v4vp-5xvx; Cato CTRL research disclosure.

What is WEBUI_SECRET_KEY and why do I need it?

Open WebUI uses WEBUI_SECRET_KEY to sign JWT session tokens. If you do not set a fixed value, the application generates a random key each time the container starts. That random key changes on every restart and every update, immediately invalidating all active sessions -- logging every user out. Set it once in your Compose file's environment section using a value generated by openssl rand -hex 32, and never change it unless you intentionally want to force all users to log in again. The key should be treated like a password: do not commit it to a public repository, and do not share it.

When and how do I rotate WEBUI_SECRET_KEY?

The key is designed to be stable for the life of the instance. Rotate it when a security event makes the current key's integrity uncertain: a user with Docker group access on the host has their access revoked, you suspect the host has been compromised, the .env file was exposed in a git commit and then removed, or any scenario where the key value may have left your control. Rotating it on a schedule without cause is counterproductive -- it just logs users out unnecessarily.

Know the blast radius before rotating. A key change invalidates every active JWT session simultaneously: all users are logged out at the moment the new container starts, all MCP Tool sessions break with decryption errors, and any OAuth tokens backed by Open WebUI's token store become invalid. In a multi-user deployment, treat this as a planned maintenance window and notify users before running it. The rotation procedure is: generate the new value with openssl rand -hex 32, update it in your .env file, then run docker compose up -d. The old key is dead immediately -- there is no grace period or session overlap.

Sources and References

All technical claims in this guide are verifiable against the following primary sources:

  • Open WebUI Official Documentation -- deployment options, environment variables, volume paths, feature reference
  • Ollama FAQ -- default context window (4096 tokens baseline; VRAM-tiered dynamic default), OLLAMA_CONTEXT_LENGTH, OLLAMA_HOST, server configuration
  • Ollama Context Length Documentation -- VRAM-tiered dynamic defaults (4k / 32k / 256k), setting context via CLI and environment variable, verifying allocated context with ollama ps
  • NVIDIA Container Toolkit Installation Guide (v1.19.0) -- Debian/Ubuntu installation steps, nvidia-ctk runtime configure, GPU passthrough verification
  • Docker Engine Installation -- Ubuntu -- official APT repository setup, post-install steps
  • Open WebUI GitHub Repository -- current stable release v0.8.12 (March 27, 2026); 290 million+ container pulls as of April 2026
  • NVD: CVE-2025-64496 -- Open WebUI Direct Connections code injection via SSE events, CVSS 8.0 (NVD), patched in v0.6.35
  • GitHub Security Advisory GHSA-cm35-v4vp-5xvx -- full technical disclosure, patch commit reference
  • Cato CTRL: CVE-2025-64496 Research Disclosure -- original discovery by Vitaly Simonovich; attack mechanics, exploitation proof-of-concept, and remediation guidance
  • NVIDIA Container Toolkit Release Notes -- v1.19.0 feature summary and changelog
  • Sigstore cosign -- used to verify Open WebUI image signatures on ghcr.io; separate from Docker Content Trust / Notary v1
  • Open WebUI Desktop App -- native desktop wrapper that runs Open WebUI without Docker; work in progress, not recommended for production use as of April 2026
  • MITRE ATT&CK for Containers Matrix -- T1610 (Deploy Container), T1611 (Escape to Host), T1609 (Container Administration Command), T1613 (Container and Resource Discovery), T1552.007 (Unsecured Credentials: Container API), T1543.005 (Create or Modify System Process: Container Service), T1195.002 (Supply Chain Compromise: Compromise Software Supply Chain), T1078 (Valid Accounts), T1059.007 (Command and Scripting Interpreter: JavaScript)
  • MITRE ATLAS v5.1.0 -- AML.T0051 (Prompt Injection), AML.T0040 (ML Model Inference API Access), AML.T0057 (RAG Credential Harvesting), AML.T0034 (Cost Harvesting); 16 tactics, 84 techniques as of November 2025
  • NIST SP 800-190: Application Container Security Guide -- runtime privilege constraints (Sec. 3.3), image trust (Sec. 3.5), secrets management (Sec. 4.4)
  • NIST SP 800-204: Security Strategies for Microservices-based Application Systems -- API authentication and access management (Sec. 3.1), secure service communication (Sec. 3.4)
  • NIST SP 800-204D: Strategies for the Integration of Software Supply Chain Security in DevSecOps CI/CD Pipelines (2024) -- image provenance, build integrity, secrets in CI/CD (Sec. 3.2, 4.1)