Artificial intelligence and machine learning workloads have become a standard part of the computing landscape. Whether you're training a neural network, running inference on a large language model, or experimenting with computer vision, the underlying infrastructure almost always runs on Linux. There's a reason for that: Linux gives you direct access to GPU hardware, granular control over system resources, and compatibility with virtually every AI framework in existence. This guide walks you through setting up a Linux machine for AI workloads from scratch, assuming you're comfortable with a terminal but new to the ML toolchain.

Why Linux for AI?

Before diving into commands, it's worth understanding why Linux dominates the AI infrastructure space. NVIDIA's CUDA toolkit -- the foundation that powers GPU-accelerated computing for deep learning -- was built with Linux as a first-class citizen. Driver support is tighter, performance overhead is lower, and the entire ecosystem of frameworks like PyTorch, TensorFlow, and JAX assumes a Linux environment as the default deployment target.

Beyond driver support, Linux gives you something that other operating systems struggle to match: reproducibility. With package managers, containerization tools like Docker, and fine-grained control over kernel parameters, you can build an AI environment once and replicate it identically across dozens of machines. That matters when you're scaling from a single workstation to a multi-node training cluster.

You also get access to the full power of the system without an intermediary. There's no hypervisor sitting between your code and the GPU, no background processes competing for VRAM, and no forced reboots interrupting a 12-hour training run. For workloads that push hardware to its limits, that direct access is essential.

Note

While this guide focuses on NVIDIA GPUs because they hold the overwhelming market share for AI workloads, AMD's ROCm platform has matured significantly and now supports PyTorch, TensorFlow, and JAX. If you're running AMD hardware, the driver setup differs but the rest of the workflow is similar.

Choosing Your Distribution

Not all Linux distributions are equally suited for AI work. Your choice affects how easy it will be to install GPU drivers, how current the kernel is, and how well-supported your environment will be when something breaks at 2 AM during a training run.

Ubuntu 22.04 or 24.04 LTS is the safe default. NVIDIA tests their drivers against Ubuntu first, and nearly every AI framework publishes installation instructions that assume Ubuntu. If you're just getting started and want the path of least resistance, this is it. The LTS (Long Term Support) releases get five years of security patches, so your environment won't age out from under you.

Rocky Linux / AlmaLinux 9 are enterprise-grade options that descend from the CentOS lineage. If you're deploying into a corporate environment where RHEL compatibility matters, these are solid choices. Driver installation requires a few extra steps, but the added stability is worth it for production deployments.

Arch Linux / Fedora give you bleeding-edge kernels and packages, which can be useful when you need support for the very latest GPU hardware. The tradeoff is that rolling updates occasionally break things, and you'll spend more time maintaining the system than running experiments on it.

Pro Tip

If you're setting up a dedicated AI workstation, install the server variant of your chosen distribution. Desktop environments consume GPU memory and CPU cycles that you'd rather dedicate to training. You can always SSH in from a more comfortable machine.

Installing GPU Drivers and CUDA

This is where many beginners hit their first wall. GPU driver installation on Linux has a reputation for being painful, and while it's gotten significantly better, there are still pitfalls to avoid. The key principle is simple: never install drivers from the NVIDIA .run file unless you have a very specific reason. Use your distribution's package manager or NVIDIA's official repository instead.

On Ubuntu, the cleanest approach is to use the ubuntu-drivers utility, which detects your GPU and recommends the appropriate driver version:

terminal
# Update package lists and install detection utility
$ sudo apt update && sudo apt install -y ubuntu-drivers-common

# See which drivers are recommended for your GPU
$ ubuntu-drivers devices

# Install the recommended driver automatically
$ sudo ubuntu-drivers autoinstall

# Reboot to load the new kernel module
$ sudo reboot

After rebooting, verify the driver loaded correctly with nvidia-smi. This command shows your GPU model, driver version, CUDA version, current temperature, and memory usage. If this command works, your driver is installed correctly.

Warning

If your system has Secure Boot enabled (common on modern UEFI systems), the driver installation will prompt you to enroll a Machine Owner Key (MOK). You must set a password during installation, then reboot and select "Enroll MOK" from the blue screen that appears. Skipping this step means the unsigned kernel module won't load, and nvidia-smi will fail. You can check your Secure Boot status with mokutil --sb-state.

terminal
$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:01:00.0   Off |                  Off |
|  0%   34C    P8              18W / 450W |       1MiB /  24564MiB |      0%      Default |
+-----------------------------------------+------------------------+----------------------+
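
Beyond eyeballing that table, nvidia-smi has a query mode that emits plain CSV, which is handy for scripted health checks when you're provisioning more than one machine. The sketch below parses that output with the standard library; the query flags shown in the comment are real nvidia-smi options, but the sample line is canned so the parsing logic runs even on a machine without a GPU.

```python
import csv
import io

def parse_gpu_query(csv_text):
    """Parse 'nvidia-smi --query-gpu=... --format=csv,noheader' output."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [
        {"name": name, "driver": driver, "memory_total": mem}
        for name, driver, mem in reader
    ]

# On a live system, capture real output with subprocess, e.g.:
#   nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
sample = "NVIDIA GeForce RTX 4090, 560.35.03, 24564 MiB\n"
gpus = parse_gpu_query(sample)
print(gpus[0]["driver"])  # 560.35.03
```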

Next comes the CUDA Toolkit. CUDA provides the compiler (nvcc), runtime libraries, and developer tools that AI frameworks use to offload computation to the GPU. Install it from NVIDIA's official apt repository rather than from your distribution's default repositories, since the distribution-provided version is often outdated:

terminal
# Download and install the CUDA keyring package
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
$ sudo dpkg -i cuda-keyring_1.1-1_all.deb
$ sudo apt update

# Install the CUDA toolkit (this does NOT reinstall the driver)
$ sudo apt install -y cuda-toolkit-12-6

# Add CUDA to your PATH
$ echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
$ echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
$ source ~/.bashrc

# Verify installation
$ nvcc --version

Warning

CUDA toolkit versions must be compatible with your driver version. The CUDA 12.6 toolkit ships with driver 560.28+, and while CUDA 12.x applications can run on drivers as old as 525+ through minor version compatibility, using the driver bundled with your toolkit version avoids subtle feature gaps. Check NVIDIA's compatibility matrix before installing. Mismatched versions are one of the most common causes of cryptic runtime errors.
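
If you provision machines with scripts, this kind of gate is easy to automate. The sketch below compares driver version tuples against the thresholds described above (560.28 as the driver bundled with the 12.6 toolkit, 525.60.13 as an assumed minimum for CUDA 12.x minor version compatibility) -- treat NVIDIA's compatibility matrix, not these constants, as authoritative.

```python
def version_tuple(v):
    """'560.35.03' -> (560, 35, 3) for numeric comparison."""
    return tuple(int(p) for p in v.split("."))

# Thresholds for CUDA 12.6 as described above; check NVIDIA's
# compatibility matrix for the authoritative numbers.
BUNDLED_MIN = "560.28"            # driver shipped alongside the 12.6 toolkit
FORWARD_COMPAT_MIN = "525.60.13"  # assumed floor for running CUDA 12.x apps

def cuda12_driver_status(driver_version):
    v = version_tuple(driver_version)
    if v >= version_tuple(BUNDLED_MIN):
        return "full"          # matched toolkit/driver pairing
    if v >= version_tuple(FORWARD_COMPAT_MIN):
        return "minor-compat"  # runs via minor version compatibility
    return "unsupported"

print(cuda12_driver_status("560.35.03"))   # full
print(cuda12_driver_status("535.183.01"))  # minor-compat
```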

Setting Up Your Python Environment

The AI ecosystem runs on Python, and managing Python environments correctly will save you from the single most frustrating class of errors in machine learning: dependency conflicts. The system Python that ships with your distribution is not the one you should use for AI work. Instead, set up isolated environments that you can create, destroy, and rebuild without affecting the rest of the system.

Miniconda is the recommended approach for beginners. It gives you a minimal Python distribution with the conda package manager, which handles not just Python packages but also compiled C/C++ libraries and CUDA dependencies. This is important because packages like PyTorch ship with pre-compiled CUDA bindings that must match your toolkit version.

terminal
# Download and install Miniconda
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
$ eval "$($HOME/miniconda3/bin/conda shell.bash hook)"
$ conda init bash

# Create an isolated environment for your AI project
$ conda create -n ml-env python=3.11 -y
$ conda activate ml-env

# Install PyTorch with CUDA support
$ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126

After installation, verify that PyTorch can see your GPU. This is the moment of truth -- if this returns True, your entire stack (driver, CUDA, Python bindings) is working correctly:

$ python -c "import torch; print(torch.cuda.is_available())"

If the output is False, work backward through the stack: check nvidia-smi first (driver), then nvcc --version (CUDA toolkit), then verify that the PyTorch version you installed matches your CUDA version. Nine times out of ten, the issue is a version mismatch somewhere in that chain.
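
That debugging order can be captured in a few lines. The helper below is purely illustrative -- the function name and messages are mine -- but it encodes the rule: always report the lowest failing layer first, because everything above it depends on it.

```python
def first_broken_layer(driver_ok, toolkit_ok, bindings_ok):
    """Return the lowest failing layer, checked in stack order."""
    if not driver_ok:
        return "driver: nvidia-smi failed; reinstall or re-enroll MOK"
    if not toolkit_ok:
        return "toolkit: check nvcc --version and your PATH"
    if not bindings_ok:
        return "bindings: PyTorch build must match your CUDA version"
    return "stack OK"

# On a real machine you might populate the flags with, e.g.,
# shutil.which("nvidia-smi"), shutil.which("nvcc"), and
# torch.cuda.is_available() respectively.
print(first_broken_layer(True, True, False))
```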

Containerized AI with Docker and NVIDIA Container Toolkit

Containers solve the reproducibility problem that plagues AI development. Instead of documenting every package version and hoping your colleague can recreate the environment, you ship the entire runtime as a single image. The NVIDIA Container Toolkit extends Docker to give containers direct access to your GPU hardware, which means you get the isolation benefits of containers without the performance penalty of virtualization.

terminal
# Install Docker (official method)
$ curl -fsSL https://get.docker.com | sh
$ sudo usermod -aG docker $USER
# Log out and back in for group membership to take effect

# Install NVIDIA Container Toolkit
$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
    | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
$ curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
    | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
    | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
$ sudo apt update && sudo apt install -y nvidia-container-toolkit
$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker

Test GPU access from inside a container:

$ docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu24.04 nvidia-smi

If you see the same nvidia-smi output from inside the container that you see on the host, GPU passthrough is working. You can now write Dockerfiles that bundle your model, dependencies, and inference code into a single portable image.
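
As a starting point, a minimal Dockerfile might look like the sketch below. The base image tag matches the one tested above; requirements.txt and serve.py are placeholder names standing in for your own dependency list and entrypoint.

Dockerfile
# Base image with the CUDA runtime libraries already installed
FROM nvidia/cuda:12.6.0-base-ubuntu24.04

# Python plus your pinned dependencies (requirements.txt is yours);
# --break-system-packages is needed because Ubuntu 24.04 marks the
# system Python as externally managed
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip3 install --no-cache-dir --break-system-packages -r requirements.txt

# Your model and inference code (serve.py is a placeholder)
COPY . /app
WORKDIR /app
CMD ["python3", "serve.py"]

Build and run it the same way you tested passthrough: docker build -t my-inference . followed by docker run --rm --gpus all my-inference.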

Pro Tip

NVIDIA provides pre-built containers on NGC (NVIDIA GPU Cloud) that include optimized versions of PyTorch, TensorFlow, and popular inference servers like Triton. These images are tested against specific driver versions and often outperform hand-built environments because they include tuned CUDA libraries. Browse the latest tags at NGC's PyTorch catalog and customize from there.

Monitoring GPU Resources

Once your workloads are running, you need visibility into how your GPU is performing. The basic nvidia-smi command gives you a snapshot, but for continuous monitoring during training runs, you'll want something more sophisticated.

For quick terminal-based monitoring, nvidia-smi supports a refresh mode that updates every second, similar to how top works for CPU processes:

$ watch -n 1 nvidia-smi

For a richer experience, nvtop provides an interactive, htop-style interface for GPU monitoring with usage graphs, per-process breakdowns, and memory allocation details:

$ sudo apt install -y nvtop && nvtop

Pay attention to three key metrics during training. GPU utilization should be consistently high (above 90%) during active computation -- if it's low, your data loading pipeline is likely the bottleneck, not the GPU itself. Memory usage tells you how close you are to running out of VRAM, which causes the dreaded CUDA out of memory error. Temperature should stay below 83C for sustained workloads as a conservative target (thermal throttling typically begins around 90C, but sustained heat above 83C reduces component longevity). If temperatures climb higher, check your cooling and consider reducing the power limit with nvidia-smi -pl.

Warning

If GPU utilization is low but memory is nearly full, your model fits in memory but your data pipeline can't feed the GPU fast enough. Increase the number of DataLoader workers in PyTorch, enable pin_memory=True, and consider storing your dataset on an NVMe SSD rather than a spinning disk.
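
A back-of-envelope model makes the worker-count advice concrete. Assume each batch needs some GPU compute time and some CPU load time, and that workers overlap loading with compute; the numbers below are illustrative, not measured.

```python
def estimated_gpu_utilization(compute_s, load_s, num_workers):
    """Idealized steady-state GPU utilization when num_workers
    DataLoader workers prepare batches in parallel with GPU compute."""
    effective_load = load_s / num_workers  # workers overlap loading
    return compute_s / max(compute_s, effective_load)

# Illustrative numbers: 50 ms of GPU compute per batch,
# 200 ms to load and augment a batch on a single CPU worker.
for workers in (1, 2, 4, 8):
    util = estimated_gpu_utilization(0.050, 0.200, workers)
    print(f"{workers} workers -> ~{util:.0%} GPU utilization")
```

With these numbers, one worker leaves the GPU idle three-quarters of the time, while four workers are enough to saturate it -- adding more beyond that point buys nothing, which is why tuning rather than maximizing num_workers is the right instinct.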

Running Your First AI Workload

With the infrastructure in place, let's run something real. The following script downloads a pre-trained image classification model, loads it onto the GPU, and runs inference on a sample image. This is a minimal but complete example that exercises every layer of the stack you've just built:

inference_test.py
import torch
from torchvision import models, transforms
from PIL import Image
import urllib.request

# Select compute device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Load a pre-trained ResNet-50 model
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model = model.to(device)
model.eval()

# Download a sample image
url = "https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/Cat_November_2010-1a.jpg/1200px-Cat_November_2010-1a.jpg"
urllib.request.urlretrieve(url, "sample.jpg")

# Preprocess the image
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])

img = Image.open("sample.jpg").convert("RGB")
input_tensor = preprocess(img).unsqueeze(0).to(device)

# Run inference
with torch.no_grad():
    output = model(input_tensor)
    probabilities = torch.nn.functional.softmax(output[0], dim=0)

# Get top 5 predictions
top5 = torch.topk(probabilities, 5)
for i in range(5):
    print(f"  {top5.values[i].item():.4f} - class {top5.indices[i].item()}")

terminal
$ python inference_test.py

If everything is configured correctly, the script will download the model weights (about 100MB on first run), load them onto the GPU, and classify the sample image in under a second. The output will show the top five predicted classes with their confidence scores. This might seem like a small result, but it proves that your entire stack -- from GPU drivers through CUDA to PyTorch -- is working end to end.

System-Level Resource Management

AI workloads are resource-intensive, and Linux gives you fine-grained control over how those resources are allocated. A few system-level tweaks can make a significant difference in training performance and system stability.

Shared memory is a common stumbling block. PyTorch's DataLoader uses shared memory (/dev/shm) to pass data between worker processes. On many systems, the default shared memory size is only 64MB, which is far too small for multi-worker data loading with large batches. Increase it:

terminal
# Check current shared memory size
$ df -h /dev/shm

# Increase to 16GB (add to /etc/fstab for persistence)
$ sudo mount -o remount,size=16G /dev/shm

# For Docker containers, pass the --shm-size flag
$ docker run --gpus all --shm-size=16g my-training-image

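The remount above lasts only until the next reboot. For persistence, the equivalent /etc/fstab entry looks like this (the 16G size is illustrative -- scale it to your RAM and worker count):

/etc/fstab
tmpfs  /dev/shm  tmpfs  defaults,size=16G  0  0
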
GPU persistence mode keeps the GPU driver loaded even when no applications are using it. Without this, the driver unloads after the last CUDA process exits, and the next process that starts has to wait for the driver to reinitialize -- adding several seconds of latency to every job:

$ sudo nvidia-smi -pm 1

Swap and OOM behavior deserve attention too. Training runs that exceed available system RAM can trigger the Linux OOM killer, which will terminate your training process without warning. You have two options: either disable swap entirely and let the OOM killer act quickly, or add generous swap space to give the system a buffer. For dedicated AI machines, adding swap on a fast NVMe drive is usually the better choice.

Caution

Never rely on swap for GPU memory. If your model exceeds available VRAM, the training will either crash with a CUDA OOM error or silently fall back to CPU computation, which can be hundreds of times slower. Reduce batch size, enable gradient checkpointing, or use mixed-precision training (FP16/BF16) to fit within your GPU's memory budget. Mixed-precision training stores model weights and gradients in half-precision floating point instead of full FP32, roughly halving VRAM usage while maintaining training accuracy. On GPUs with Ampere architecture or newer (RTX 30 series and above), BF16 is preferred because it preserves the same dynamic range as FP32.
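
The "roughly halving" claim is simple arithmetic. The sketch below counts only weights and gradients -- activations, optimizer state, and framework overhead come on top -- but it shows why precision alone can be the difference between fitting and not fitting.

```python
def weights_and_grads_gib(n_params, bytes_per_value):
    """Approximate VRAM for model weights plus gradients alone;
    activations and optimizer state are extra on top of this."""
    return 2 * n_params * bytes_per_value / 2**30  # weights + grads

n = 7_000_000_000  # a 7B-parameter model, for illustration
print(f"FP32: {weights_and_grads_gib(n, 4):.1f} GiB")  # ~52.2 GiB
print(f"BF16: {weights_and_grads_gib(n, 2):.1f} GiB")  # ~26.1 GiB
```

At FP32 the weights and gradients alone already exceed a 24GB consumer card; at BF16 they nearly fit, and techniques like gradient checkpointing close the remaining gap.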

Security Considerations

AI workloads introduce security considerations that differ from traditional server deployments. GPU drivers and CUDA libraries run with kernel-level access, which means vulnerabilities in these components can compromise the entire system. Keep your drivers updated, and subscribe to NVIDIA's security bulletin to stay ahead of disclosed CVEs.

If you're exposing inference endpoints to the network -- running a model behind a REST API, for example -- treat them like any other internet-facing service. Run the inference process as an unprivileged user, bind it to localhost and put it behind a reverse proxy, and apply rate limiting. Models can be computationally expensive to run, and an unprotected endpoint is an invitation for resource exhaustion attacks.

Container isolation adds a meaningful layer of defense. When you run AI workloads inside Docker containers with the NVIDIA Container Toolkit, the process sees only the GPU devices you've explicitly granted access to. Combine this with read-only filesystem mounts, dropped capabilities, and network namespaces to limit the blast radius if your application is compromised.

Where to Go from Here

You now have a working Linux environment for AI workloads: GPU drivers, CUDA toolkit, a managed Python environment, Docker with GPU passthrough, and the system-level tuning knowledge to keep everything running smoothly. That's a solid foundation, and it's the same foundation that production AI systems are built on.

The natural next steps depend on your goals. If you're training models, explore distributed training with PyTorch's DistributedDataParallel to scale across multiple GPUs or machines. If you're deploying models for inference, look into NVIDIA's TensorRT for optimization and Triton Inference Server for serving. If you're working with large language models, tools like vLLM and llama.cpp offer memory-efficient inference that can run surprisingly capable models on a single consumer GPU. For the simplest possible path to running a local LLM, Ollama wraps llama.cpp in a user-friendly interface that handles model downloading and serving with a single command.

The best AI infrastructure is the kind you don't have to think about. Get the foundation right -- drivers, environments, monitoring -- and you can focus on the work that actually matters: building things that learn.

Whatever direction you take, the core workflow stays the same: isolate your environments, monitor your resources, version everything, and automate what you can. The Linux tools for doing all of this are mature, well-documented, and battle-tested at scales far beyond what you'll encounter as a beginner. Trust the toolchain, and build from there.