Every system administrator has a story about the time strace brought a production server to its knees. You attached it to a busy process to diagnose a file descriptor leak, and within seconds the load average tripled. The tool that was supposed to help you find the problem became the problem itself. For decades, this was the tradeoff Linux admins accepted: deep kernel visibility came at the cost of dangerous overhead, fragile kernel modules, or recompiling the kernel from source.

eBPF changes that equation entirely. It is a technology built into the Linux kernel that allows you to run small, sandboxed programs at specific hook points -- system calls, function entries, network events, scheduler decisions -- with near-zero overhead and zero risk of crashing the system. On current distributions, bpftrace and bpfcc-tools are a single package install away in the default repositories. If you are running a modern Linux distribution on kernel 5.x or newer, you already have access to over 100 pre-built eBPF tracing tools across BCC and bpftrace that can answer questions strace and top never could.

The Problem with Traditional Tracing

To understand why eBPF matters for daily system administration, you need to understand what came before it and why those approaches fail in production.

strace and the ptrace Tax

strace works by attaching to a target process using the ptrace(2) system call -- the same mechanism debuggers use. Every time the traced process makes a system call, the kernel stops the process, notifies strace, lets strace inspect the call, then resumes the process. This stop-start cycle happens twice per system call (once on entry, once on exit), and on a busy server handling thousands of calls per second, the overhead can be catastrophic. Brendan Gregg, one of the leading performance engineers in the Linux ecosystem, documented cases where strace slowed a target process by over 100x. On a production web server processing thousands of requests per second, that is the difference between normal operation and a complete outage.

The fundamental issue is architectural: ptrace requires context switching between the kernel and the tracing process for every single event. There is no way to filter events in-kernel, no way to aggregate data before it crosses the kernel-user boundary, and no way to avoid the per-event overhead even for events you do not care about.
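The arithmetic makes the problem concrete. The figures below are illustrative assumptions, not measurements, but they show how per-event stops compound at scale:

```python
# Back-of-the-envelope cost of ptrace-based tracing.
# Hypothetical figures: a traced process making 50,000 syscalls/sec,
# with ~5 microseconds of stop/notify/resume cost per ptrace event.
SYSCALLS_PER_SEC = 50_000
STOPS_PER_SYSCALL = 2            # one on entry, one on exit
OVERHEAD_PER_STOP_US = 5.0       # assumed ptrace round-trip cost

overhead_us = SYSCALLS_PER_SEC * STOPS_PER_SYSCALL * OVERHEAD_PER_STOP_US
overhead_fraction = overhead_us / 1_000_000  # fraction of each second lost

print(f"tracing overhead: {overhead_fraction:.0%} of CPU time")
```

Under these assumed numbers, the traced process spends half of every second stopped inside the tracing machinery -- before strace has even decoded a single argument. In-kernel filtering and aggregation is what removes this per-event tax.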

Kernel Modules: Powerful but Dangerous

Kernel modules can do anything the kernel can do -- which is precisely the problem. A bug in a kernel module does not result in a segmentation fault and a graceful crash. It results in a kernel panic, a hung system, or silent memory corruption that surfaces hours later as unexplained behavior. Writing kernel modules requires deep knowledge of kernel internals, and those internals change between kernel versions with no stable API guarantee.

SystemTap attempted to bridge this gap by compiling tracing scripts into kernel modules on the fly. It was powerful, but required kernel debug symbols (which are enormous and often not installed on production systems), had a significant compilation step, and still carried the risk of kernel crashes from malformed probe code. Many organizations prohibited SystemTap in production environments outright.

perf: Capable but Coarse

perf is a solid tool for sampling-based profiling and reading hardware performance counters, but its tracing capabilities are limited compared to what eBPF enables. You can trace tracepoints and kprobes with perf, but the data processing happens entirely in user space after the fact. You cannot filter, aggregate, or make decisions in-kernel, which means you either collect everything (and drown in data) or miss the events that matter.

How eBPF Works: Architecture for Sysadmins

eBPF stands for extended Berkeley Packet Filter. The original BPF, created by Steven McCanne and Van Jacobson in 1992 and published in 1993, was a simple virtual machine for filtering network packets in tcpdump. Starting with Linux 3.15, which brought the extended BPF instruction set into the kernel, and expanding significantly through 3.18 (which exposed the bpf(2) system call and BPF maps to user space) and the 4.x and 5.x kernel series, eBPF evolved into a general-purpose in-kernel execution environment that now has very little to do with packet filtering. The name persists for historical reasons, but eBPF is today effectively a standalone technology name.

The execution model works in five steps. First, an eBPF program is written in a restricted subset of C. Second, Clang/LLVM compiles that C code into eBPF bytecode -- a custom instruction set with eleven 64-bit registers (ten general-purpose plus a read-only frame pointer), a 512-byte stack, and no dynamic heap allocation within the program itself (persistent state is stored in BPF maps, covered below). Third, the kernel's eBPF verifier statically analyzes every possible execution path in the program to prove it is safe: it must terminate, it must only access valid memory through approved helper functions, it cannot dereference null pointers, and it cannot corrupt kernel state. Since Linux 5.3, the verifier permits bounded loops -- loops where it can prove the iteration count has a fixed upper limit -- but unbounded or potentially infinite loops are still rejected. Fourth, if verification passes, a JIT compiler translates the bytecode into native machine code (x86_64, ARM64, etc.) for near-zero-overhead execution. Fifth, the compiled program is attached to a hook point in the kernel and runs every time that hook fires.

Note

The verifier is what makes eBPF fundamentally different from kernel modules. Your eBPF program literally cannot crash the kernel. If it tries to do something unsafe, the kernel rejects it at load time before it ever executes. This is not a theoretical guarantee -- it is enforced by static analysis of every instruction.

Hook Points: Where eBPF Programs Attach

eBPF programs are event-driven. They do not run continuously; they execute when a specific event occurs. The available attachment points cover nearly every aspect of kernel behavior.

Tracepoints are static instrumentation points embedded in the kernel source code at carefully chosen locations. They have a relatively stable ABI, meaning the data structures they expose are maintained across kernel versions with far more consistency than internal function signatures. Examples include sched:sched_switch (when the scheduler swaps processes), tcp:tcp_retransmit_skb (when a TCP segment is retransmitted), and syscalls:sys_enter_openat (when a process calls openat()). Tracepoints are the preferred attachment point for production tracing because of their stability, though they can still change between major kernel versions in rare cases.

Kprobes and kretprobes provide dynamic instrumentation. A kprobe can hook into virtually any kernel function at its entry point, while a kretprobe fires when the function returns, allowing you to capture the return value. This is extraordinarily flexible -- you can trace any function listed in /proc/kallsyms -- but the interface is not stable across kernel versions. A function that exists in kernel 6.1 might be renamed, reorganized, or removed in 6.8. Use kprobes for targeted investigation; prefer tracepoints for long-running monitoring.

Uprobes and uretprobes do the same thing as kprobes but for user-space applications. You can attach eBPF programs to function entry and return points in any compiled binary without modifying the application or restarting it. This is powerful for tracing library calls (like SSL_read in OpenSSL), application-specific functions, or language runtime internals.

Perf events allow eBPF programs to be triggered by hardware performance counters (cache misses, branch mispredictions, CPU cycles) or software events, enabling sampling-based profiling with in-kernel aggregation.

BPF Maps: In-Kernel Data Structures

eBPF programs need a way to store state between invocations and communicate results to user space. BPF maps serve both purposes. They are kernel-resident key-value data structures that persist across program invocations and are accessible from both eBPF programs and user-space applications.

The available map types include hash maps (arbitrary key-value pairs), arrays (integer-indexed), per-CPU variants (one instance per CPU core, eliminating lock contention), LRU maps (automatic eviction of least recently used entries), ring buffers (high-performance streaming of events to user space), and stack trace maps (storing kernel and user-space call stacks). The choice of map type has real performance implications: a per-CPU hash map avoids cross-CPU locking entirely, making it suitable for high-frequency tracing, while a ring buffer provides ordered delivery of events to user space with backpressure semantics.
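The per-CPU design is easiest to see from the user-space side: for each key, the reader receives one value slot per CPU and merges them after the fact, which is why the in-kernel update path needs no locking at all. A Python sketch with hypothetical counter data:

```python
# User-space view of a per-CPU hash map: the kernel hands back one
# value per CPU core for each key; the reader sums the slots.
# All counts here are made-up sample data.
percpu_counts = {
    "vfs_read":  [1042, 998, 1187, 976],   # one slot per CPU core
    "vfs_write": [311, 402, 288, 350],
}

totals = {key: sum(slots) for key, slots in percpu_counts.items()}
print(totals)
```

The trade-off is that readers see a value that is only eventually consistent across CPUs, which is fine for counters and histograms but not for data that must be read atomically.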

The eBPF Tooling Ecosystem

Almost nobody writes raw eBPF bytecode. In the same way that almost nobody writes x86 assembly to build applications, the eBPF ecosystem provides multiple layers of abstraction. For system administrators, three tools matter.

bpftrace: One-Liners and Quick Investigation

bpftrace is a high-level tracing language for Linux that compiles scripts into eBPF programs on the fly. Its syntax borrows from awk, C, and DTrace, making it accessible to anyone comfortable with shell scripting. It is the tool you reach for when you need an answer in the next 30 seconds.

For example, to count system calls by process name across the entire system:

bpftrace -- syscall counting
# bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
Attaching 1 probe...
^C

@[sshd]: 847
@[nginx]: 12405
@[postgres]: 34219
@[node]: 67812

That single command attaches an eBPF program to the raw system call entry tracepoint, increments a per-process-name counter in a BPF hash map, and prints the aggregated results when you press Ctrl+C. The overhead is negligible because the counting happens entirely in kernel space -- no per-event data crosses the kernel-user boundary.

To generate a latency histogram for read() system calls in the VFS layer:

bpftrace -- VFS read latency histogram
# bpftrace -e '
  kprobe:vfs_read { @start[tid] = nsecs; }
  kretprobe:vfs_read /@start[tid]/ {
    @usecs = hist((nsecs - @start[tid]) / 1000);
    delete(@start[tid]);
  }'
Attaching 2 probes...
^C

@usecs:
[0]                   12 |                                    |
[1]                  340 |@@@@                                |
[2, 4)              2841 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)              1293 |@@@@@@@@@@@@@@@@                    |
[8, 16)              421 |@@@@@                               |
[16, 32)             107 |@                                   |
[32, 64)              34 |                                    |
[64, 128)              8 |                                    |
[128, 256)             2 |                                    |

This attaches two probes: one at the entry of vfs_read (recording a nanosecond timestamp keyed by thread ID) and one at its return (computing the elapsed time and feeding it into a power-of-2 histogram). The histogram is computed entirely in kernel space. Even if vfs_read is called millions of times during the tracing period, the overhead remains minimal.
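The power-of-2 bucketing that hist() performs in kernel space is simple enough to sketch in user space. This Python analogy (the sample latencies are made up) shows how each measurement lands in the bucket for floor(log2(value)):

```python
# User-space sketch of bpftrace's hist(): each value increments the
# bucket for floor(log2(value)), so bucket k covers [2**k, 2**(k+1)).
from collections import Counter

def log2_bucket(us: int) -> int:
    # bit_length() - 1 is floor(log2(n)) for positive integers
    return us.bit_length() - 1 if us > 0 else 0

latencies_us = [1, 2, 3, 5, 5, 9, 17, 40, 130]  # hypothetical samples
buckets = Counter(log2_bucket(v) for v in latencies_us)

for k in sorted(buckets):
    lo, hi = 2 ** k, 2 ** (k + 1)
    print(f"[{lo}, {hi}): {buckets[k]}")
```

Storing one integer per bucket instead of every raw sample is what keeps the kernel-side memory footprint constant no matter how many events fire.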

BCC: Pre-Built Sysadmin Power Tools

BCC (BPF Compiler Collection) provides a library of over 80 purpose-built tracing tools, each designed to answer a specific operational question. These are production-ready scripts that combine eBPF kernel instrumentation (written in C) with user-space interfaces (written in Python). You do not need to understand eBPF internals to use them.

A handful of BCC tools belong in every system administrator's toolkit. The workflows later in this article cover biolatency, ext4slower, tcpconnect, and runqlat; the first one to learn is execsnoop.

On Ubuntu and Debian systems, BCC tools are available with the suffix -bpfcc. So you would run execsnoop-bpfcc or opensnoop-bpfcc. On Fedora and RHEL, they install directly as execsnoop, opensnoop, etc.

execsnoop -- trace new processes
# execsnoop-bpfcc
PCOMM            PID    PPID   RET ARGS
bash             14800  14799    0 /bin/bash
id               14801  14800    0 /usr/bin/id -un
hostname         14802  14800    0 /usr/bin/hostname
grep             14803  14800    0 /usr/bin/grep -c ^processor /proc/cpuinfo
cron             14804  1287     0 /usr/sbin/cron -f
sh               14805  14804    0 /bin/sh -c /usr/local/bin/backup.sh

Notice something important: execsnoop captures short-lived processes that would never appear in top or ps. This is critical for diagnosing cron jobs that spawn cascading child processes, tracking unauthorized command execution, or understanding what happens during system boot and service restarts.

bpftool: Inspecting What Is Running

bpftool is the low-level diagnostic utility for eBPF itself. It does not help you write eBPF programs; instead, it shows you what eBPF programs are currently loaded in the kernel, what maps exist, and how they are attached. Think of it as ps for eBPF.

bpftool -- inspect loaded eBPF programs
# bpftool prog list
12: tracepoint  name sys_enter  tag a1bc2d3e4f567890
      loaded_at 2026-02-26T10:14:03+0000  uid 0
      xlated 528B  jited 312B  memlock 4096B  map_ids 4,5
      pids execsnoop(14920)
37: kprobe  name tcp_connect  tag b2cd3e4f56789012
      loaded_at 2026-02-26T10:15:47+0000  uid 0
      xlated 384B  jited 224B  memlock 4096B  map_ids 8

# bpftool map list
4: hash  name events  flags 0x0
      key 4B  value 128B  max_entries 10240  memlock 1638400B
5: perf_event_array  name events  flags 0x0
      key 4B  value 4B  max_entries 4  memlock 4096B

This is especially useful when diagnosing whether other eBPF programs (from container runtimes, security agents, or networking tools like Cilium) are already loaded and potentially affecting system behavior.
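The text listing above is easy to post-process for an automated audit. As a sketch, this parses the header line of each program entry in the sample output into records (bpftool can also emit machine-readable output, but a regex keeps this illustration self-contained):

```python
import re

# Parse `bpftool prog list` header lines into records, e.g. to diff
# against a known-good baseline. The sample text is copied from the
# listing shown above.
sample = """\
12: tracepoint  name sys_enter  tag a1bc2d3e4f567890
37: kprobe  name tcp_connect  tag b2cd3e4f56789012
"""

progs = [
    {"id": int(m[1]), "type": m[2], "name": m[3]}
    for m in re.finditer(r"^(\d+): (\S+)\s+name (\S+)", sample, re.M)
]
print(progs)
```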

Practical Sysadmin Workflows

Theory is useful, but eBPF's real value is in solving the problems you face on a daily basis. Here are concrete workflows mapped to common sysadmin scenarios.

Diagnosing Slow Disk I/O

A user reports that writes are slow on a database server. iostat shows average write latency of 4ms, which looks fine. But averages lie. Run biolatency to see the actual distribution:

biolatency -- block I/O latency distribution
# biolatency-bpfcc -D 10
disk = 'nvme0n1'

     usecs               : count     distribution
         0 -> 1          : 0        |                                    |
         2 -> 3          : 0        |                                    |
         4 -> 7          : 14       |                                    |
         8 -> 15         : 892      |@@@@@@@                             |
        16 -> 31         : 4521     |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
        32 -> 63         : 2104     |@@@@@@@@@@@@@@@@@                   |
        64 -> 127        : 436      |@@@                                 |
       128 -> 255        : 41       |                                    |
       256 -> 511        : 12       |                                    |
       512 -> 1023       : 3        |                                    |
      1024 -> 2047       : 0        |                                    |
      2048 -> 4095       : 0        |                                    |
      4096 -> 8191       : 87       |                                    |
      8192 -> 16383      : 23       |                                    |

The histogram reveals a bimodal distribution: the vast majority of I/Os complete in 16-63 microseconds, but there is a secondary cluster at 4-16 milliseconds. Those outliers are what the database user is experiencing, and they were invisible in iostat's averages. To determine which processes are responsible for the slow I/Os, switch to ext4slower (or xfsslower, btrfsslower, or zfsslower, depending on your filesystem):

ext4slower -- find slow filesystem operations
# ext4slower-bpfcc 1 # show operations slower than 1ms
TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
10:24:01 postgres       3847   S 0       0         4.23   base/16384/24601
10:24:01 logrotate      9102   W 4096    0        12.41   syslog
10:24:03 postgres       3847   S 0       0         5.87   base/16384/24601
10:24:05 rsync          9130   R 131072  34816     8.91   mysql-backup.tar

Now you can see that postgres fsync operations and a concurrent rsync backup are competing for disk bandwidth. The logrotate write is the worst offender at 12ms. This level of per-event, per-process visibility with file-level granularity simply does not exist in any traditional tool.

Tracking Down Mysterious Network Connections

An intrusion detection system flags outbound connections to an unexpected IP range. Which process is responsible? tcpconnect gives you the answer in real time:

tcpconnect -- trace outbound TCP connections
# tcpconnect-bpfcc -t
TIME(s)  PID    COMM         IP SADDR            DADDR            DPORT
0.000    3847   postgres     4  10.0.1.50        10.0.1.51        5432
0.102    8834   curl         4  10.0.1.50        93.184.216.34    443
1.245    22019  python3      4  10.0.1.50        198.51.100.47    8443
3.891    22019  python3      4  10.0.1.50        198.51.100.48    8443

The python3 process (PID 22019) is making connections to the flagged range. You can then investigate that PID with /proc/22019/cmdline, /proc/22019/cwd, and /proc/22019/exe to determine exactly what script is running and from where.

Auditing Process Execution

execsnoop provides a live feed of every process executed on the system. Unlike auditd, which requires careful rule configuration and writes to log files that must be parsed after the fact, execsnoop gives you immediate, human-readable output with process trees:

execsnoop -- real-time process audit
# execsnoop-bpfcc -T
TIME     PCOMM            PID    PPID   RET ARGS
10:30:01 sh               15001  1287     0 /bin/sh -c /opt/scripts/health.sh
10:30:01 health.sh        15002  15001    0 /opt/scripts/health.sh
10:30:01 curl             15003  15002    0 /usr/bin/curl -s http://10.0.1.100/api/health
10:30:01 jq               15004  15002    0 /usr/bin/jq .status
10:31:22 sshd             15010  1102     0 /usr/sbin/sshd -D -R
10:31:22 bash             15012  15010    0 /bin/bash --login
10:31:23 sudo             15013  15012    0 /usr/bin/sudo -i
10:31:23 bash             15014  15013    0 /bin/bash

This captures the complete chain of events: cron runs a health check script, which spawns curl and jq. A few minutes later, someone SSH'd in, logged into bash, and immediately ran sudo -i. For incident response, this kind of timeline is extraordinarily valuable.

CPU Scheduler Analysis

runqlat answers a question that was essentially unanswerable before eBPF: how long are threads waiting in the CPU run queue before they get scheduled? If a server has adequate average CPU utilization but application latency is high, run queue latency may be the culprit:

runqlat -- scheduler run queue latency
# runqlat-bpfcc 10 1
     usecs               : count     distribution
         0 -> 1          : 1204     |@@@@@@                              |
         2 -> 3          : 7381     |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
         4 -> 7          : 5219     |@@@@@@@@@@@@@@@@@@@@@@@@@           |
         8 -> 15         : 2840     |@@@@@@@@@@@@@@                      |
        16 -> 31         : 1102     |@@@@@                               |
        32 -> 63         : 489      |@@                                  |
        64 -> 127        : 215      |@                                   |
       128 -> 255        : 98       |                                    |
       256 -> 511        : 47       |                                    |
       512 -> 1023       : 12       |                                    |
      1024 -> 2047       : 3        |                                    |

Threads waiting over 256 microseconds for CPU time indicate contention. If you see a significant tail extending into the millisecond range, you likely have too many runnable threads competing for CPU cores, and the solution may be tuning thread pool sizes, adjusting cgroup CPU limits, or moving workloads to different nodes.
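The "how big is the tail" question can be answered directly from the histogram buckets. Using the counts from the runqlat output above, keyed by each bucket's lower bound in microseconds:

```python
# Quantify the scheduler-latency tail: what fraction of wakeups waited
# 256 microseconds or more? Counts copied from the runqlat output.
buckets = {
    0: 1204, 2: 7381, 4: 5219, 8: 2840, 16: 1102,
    32: 489, 64: 215, 128: 98, 256: 47, 512: 12, 1024: 3,
}

total = sum(buckets.values())
tail = sum(n for lo, n in buckets.items() if lo >= 256)
print(f"{tail}/{total} wakeups ({tail / total:.2%}) waited >= 256 us")
```

A tail fraction well under one percent, as here, is usually tolerable; a tail of several percent reaching into milliseconds is where the tuning advice above applies.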

Installation and Prerequisites

eBPF tracing requires a kernel that supports it. The minimum practical version is 4.9, but many tools require features from 4.14 or later, and the full tooling ecosystem (including CO-RE portability and bounded loops) works best on 5.x and above. If you are running any currently-supported distribution (Ubuntu 22.04+, RHEL 8+, Fedora 36+, Debian 12+), you already have sufficient kernel support.

installation
# Debian / Ubuntu
$ sudo apt install bpftrace bpfcc-tools linux-headers-$(uname -r)

# RHEL / Fedora
$ sudo dnf install bpftrace bcc-tools

# Arch Linux
$ sudo pacman -S bpftrace bcc bcc-tools

# Verify your kernel supports eBPF
$ uname -r
6.8.0-45-generic

# Check if BTF (BPF Type Format) is available
$ ls /sys/kernel/btf/vmlinux
/sys/kernel/btf/vmlinux

# List available tools
$ dpkg -L bpftrace bpfcc-tools | grep -E '/s?bin/' | head -20
Warning

eBPF tracing programs need CAP_BPF plus CAP_PERFMON (on kernels 5.8 and later) or CAP_SYS_ADMIN (on earlier kernels). In practice, this means running as root or with carefully configured capabilities. Since eBPF programs can inspect sensitive kernel data including passwords in memory, process arguments, and network traffic, this privilege requirement is appropriate and should not be bypassed.

CO-RE and BTF: Portable eBPF

One of the historical pain points with eBPF tools was that they often depended on kernel headers matching the running kernel. If headers were missing or mismatched, tools would fail to compile. CO-RE (Compile Once, Run Everywhere) and BTF (BPF Type Format) solve this. BTF is a compact metadata format embedded in the kernel that describes the layout of kernel data structures. CO-RE-enabled eBPF programs, built using the libbpf library, read this metadata at load time and automatically adjust their memory access patterns to match the running kernel's structure layout, without needing kernel headers installed. Tools like bpftrace and many newer BCC tools leverage CO-RE and libbpf under the hood, which is why they work reliably across different kernel versions without recompilation.
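The relocation idea can be illustrated with a toy example. Everything below -- the struct layouts, the field names, the metadata dictionary standing in for BTF -- is invented for illustration; the point is that a single reader works correctly against two different kernel layouts by resolving field offsets from metadata at "load" time rather than baking them in at compile time:

```python
import struct

# Toy CO-RE analogy: read the "pid" field from a record whose layout
# differs between two hypothetical kernel builds, using per-kernel
# metadata (our stand-in for BTF) to find the byte offset.
BTF = {  # field name -> byte offset, per hypothetical kernel version
    "kernel-A": {"pid": 0, "start_time": 4},
    "kernel-B": {"pid": 8, "start_time": 0},  # fields reordered
}

def read_pid(raw: bytes, kernel: str) -> int:
    offset = BTF[kernel]["pid"]   # the "relocation", resolved at load time
    return struct.unpack_from("<I", raw, offset)[0]

# The same logical record, laid out two different ways; pid = 4242.
rec_a = struct.pack("<III", 4242, 777, 0)   # pid first on kernel-A
rec_b = struct.pack("<III", 777, 0, 4242)   # pid last on kernel-B

assert read_pid(rec_a, "kernel-A") == read_pid(rec_b, "kernel-B") == 4242
```

Real CO-RE does this with relocation records emitted by Clang and resolved by libbpf against the kernel's BTF, but the offset-indirection principle is the same.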

You can check whether your kernel includes BTF data by looking for /sys/kernel/btf/vmlinux. BTF support was introduced in kernel 4.18, but it became practically useful and widely shipped by distributions around kernel 5.4 with CONFIG_DEBUG_INFO_BTF=y. All current major distribution kernels ship with BTF enabled.

eBPF vs. Traditional Tools: An Honest Comparison

strace still wins for quick-and-dirty debugging on a development machine where overhead does not matter. perf remains excellent for CPU profiling and hardware counter analysis. eBPF does not replace these tools in every context -- it replaces them in production and high-load scenarios where their limitations become unacceptable.

The key differences break down along several axes. Overhead: strace imposes extreme overhead via ptrace context switching; eBPF uses JIT-compiled in-kernel programs with negligible cost. Safety: kernel modules can panic the system; eBPF programs are verified before execution and cannot crash the kernel. Filtering: strace collects everything and filters in user space; eBPF filters and aggregates in-kernel, transferring only summarized results. Scope: strace attaches to a single process; eBPF traces system-wide across all processes simultaneously. Dependencies: SystemTap requires kernel debug symbols; eBPF with CO-RE/BTF and libbpf requires nothing beyond a modern kernel.

Pro Tip

Prefer tracepoints over kprobes whenever possible. Tracepoint formats are maintained with far more stability across kernel versions than internal functions, which can be renamed or removed without notice. Use bpftrace -l 'tracepoint:*' to see all available tracepoints on your system.

Security Considerations

eBPF is a double-edged tool. The same capability that lets system administrators trace kernel internals can, in the hands of an attacker with root access, be used to build extremely stealthy rootkits. eBPF programs run within the kernel's trust boundary, which means they can intercept and modify data in ways that are difficult to detect from user space.

Security researchers have demonstrated proof-of-concept eBPF rootkits that hook system calls to hide processes, filter network traffic to conceal connections, and tamper with the data that security monitoring tools (themselves often eBPF-based) rely on. Projects like Boopkit, TripleCross, and ebpfkit showed that eBPF programs could run hidden logic while staying within verifier-approved rules.

For sysadmins, the defensive implications are straightforward. Restrict who can load eBPF programs using the CAP_BPF capability (available since kernel 5.8) and avoid running containers with CAP_SYS_ADMIN. Use bpftool prog list regularly to audit what eBPF programs are loaded -- unexpected programs are a red flag. On systems where eBPF is not needed by unprivileged users, disable unprivileged BPF access via sysctl kernel.unprivileged_bpf_disabled=1 (note that setting this to 1 is a one-way toggle until reboot; once set, it cannot be re-enabled without rebooting). Enable Secure Boot and kernel module signing to prevent the loading of kernel modules that could tamper with eBPF subsystem internals.
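The auditing advice lends itself to a simple allowlist check. The program names below are hypothetical (one borrowed from the sample bpftool listing earlier, the others invented to represent a legitimate networking agent and a suspicious loader):

```python
# Sketch of the "audit loaded eBPF programs" advice: compare the names
# of currently loaded programs (here, a hypothetical parsed snapshot of
# `bpftool prog list` output) against a baseline of expected loaders.
expected = {"sys_enter", "tcp_connect", "cil_from_container"}  # baseline
loaded = {"sys_enter", "tcp_connect", "hide_pids"}             # hypothetical

unexpected = loaded - expected
if unexpected:
    print(f"ALERT: unexpected eBPF programs loaded: {sorted(unexpected)}")
```

In practice you would build the snapshot from bpftool output on a schedule and alert through your existing monitoring pipeline rather than printing.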

Beyond Tracing: Where eBPF Is Heading

While this article focuses on tracing and observability, eBPF's reach extends considerably further. In networking, Cilium uses eBPF to replace iptables and kube-proxy in Kubernetes environments, processing network policies at the XDP (eXpress Data Path) layer for dramatic performance improvements. XDP allows eBPF programs to run at the earliest point in the network stack -- inside or just after the network driver -- enabling packet decisions before the kernel allocates a socket buffer. Cloudflare uses XDP-based eBPF programs to mitigate DDoS attacks by dropping malicious packets before they consume kernel resources.

In security, tools like Falco, Tracee, and Tetragon use eBPF to implement runtime threat detection that monitors system calls, file access patterns, and network behavior in real time. These tools provide the kernel-level telemetry that user-space security agents cannot match.

The eBPF Foundation, hosted under the Linux Foundation, now steers the technology's development with contributions from Cisco (which acquired Isovalent, the creators of Cilium), Meta, Google, Microsoft, and others. Academic publications mentioning eBPF have increased year over year since 2016, and conferences like KubeCon and Linux Plumbers now dedicate entire tracks to eBPF sessions.

Getting Started: A Five-Minute Checklist

If you have read this far and want to start using eBPF on your servers today, here is the minimal path.

Install the tooling: apt install bpftrace bpfcc-tools on Debian/Ubuntu, or dnf install bpftrace bcc-tools on RHEL/Fedora. Verify BTF is available: ls /sys/kernel/btf/vmlinux. Run execsnoop and watch processes spawn for a few minutes. Run opensnoop and filter to a specific process with -p PID. Run biolatency for 10 seconds and read the histogram. Try a bpftrace one-liner:

# bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'

That sequence takes five minutes and will give you more kernel visibility than you have ever had before. From there, explore the full BCC and bpftrace tool collections (check the example files in /usr/share/doc/bpfcc-tools/examples/ and /usr/share/doc/bpftrace/examples/), read the bpftrace reference guide, and keep Brendan Gregg's BPF Performance Tools book on your shelf for the deep dives.

eBPF does not just make tracing better. It makes an entire class of previously impractical investigations routine. The information has always been there, inside the kernel. Now you can actually reach it without setting the building on fire.