Every system administrator has a story about the time strace brought a production server to its knees. You attached it to a busy process to diagnose a file descriptor leak, and within seconds the load average tripled. The tool that was supposed to help you find the problem became the problem itself. For decades, this was the tradeoff Linux admins accepted: deep kernel visibility came at the cost of dangerous overhead, fragile kernel modules, or recompiling the kernel from source.
eBPF changes that equation entirely. It is a technology built into the Linux kernel that allows you to run small, sandboxed programs at specific hook points -- system calls, function entries, network events, scheduler decisions -- with near-zero overhead and zero risk of crashing the system. Since Ubuntu 24.04, both bpftrace and bpfcc-tools ship in every default server installation. If you are running a modern Linux distribution on kernel 5.x or newer, you already have access to over 100 pre-built eBPF tracing tools across BCC and bpftrace that can answer questions strace and top never could.
The Problem with Traditional Tracing
To understand why eBPF matters for daily system administration, you need to understand what came before it and why those approaches fail in production.
strace and the ptrace Tax
strace works by attaching to a target process using the ptrace(2) system call -- the same mechanism debuggers use. Every time the traced process makes a system call, the kernel stops the process, notifies strace, lets strace inspect the call, then resumes the process. This stop-start cycle happens twice per system call (once on entry, once on exit), and on a busy server handling thousands of calls per second, the overhead can be catastrophic. Brendan Gregg, one of the leading performance engineers in the Linux ecosystem, documented cases where strace slowed a target process by over 100x. On a production web server processing thousands of requests per second, that is the difference between normal operation and a complete outage.
The fundamental issue is architectural: ptrace requires context switching between the kernel and the tracing process for every single event. There is no way to filter events in-kernel, no way to aggregate data before it crosses the kernel-user boundary, and no way to avoid the per-event overhead even for events you do not care about.
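The difference between the two data paths is easy to see in a toy model. The sketch below is plain Python with no real ptrace or eBPF involved; it just counts how many kernel-user "boundary crossings" each design pays for the same question ("how many openat calls happened?").

```python
# Toy model of the two tracing architectures -- plain Python, no real
# ptrace or eBPF. The interesting quantity is how many boundary
# crossings each design pays for the same answer.

def strace_style(events, wanted="openat"):
    """Every event is copied to user space; filtering happens after the copy."""
    crossings = 0
    matches = 0
    for ev in events:
        crossings += 1          # one kernel->user round trip per syscall
        if ev == wanted:        # the filter runs too late to save anything
            matches += 1
    return matches, crossings

def ebpf_style(events, wanted="openat"):
    """Filter and aggregate at the source; one summary crosses the boundary."""
    matches = sum(1 for ev in events if ev == wanted)
    return matches, 1           # a single transfer: the aggregated result

workload = ["read", "write", "openat", "close"] * 250_000  # 1M syscalls

print(strace_style(workload))   # (250000, 1000000)
print(ebpf_style(workload))     # (250000, 1)
```

Both designs find the same 250,000 matches, but one pays a million round trips to do it. That, in miniature, is the ptrace tax.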
Kernel Modules: Powerful but Dangerous
Kernel modules can do anything the kernel can do -- which is precisely the problem. A bug in a kernel module does not result in a segmentation fault and a graceful crash. It results in a kernel panic, a hung system, or silent memory corruption that surfaces hours later as unexplained behavior. Writing kernel modules requires deep knowledge of kernel internals, and those internals change between kernel versions with no stable API guarantee.
SystemTap attempted to bridge this gap by compiling tracing scripts into kernel modules on the fly. It was powerful, but required kernel debug symbols (which are enormous and often not installed on production systems), had a significant compilation step, and still carried the risk of kernel crashes from malformed probe code. Many organizations prohibited SystemTap in production environments outright.
perf: Capable but Coarse
perf is a solid tool for sampling-based profiling and reading hardware performance counters, but its tracing capabilities are limited compared to what eBPF enables. You can trace tracepoints and kprobes with perf, but the data processing happens entirely in user space after the fact. You cannot filter, aggregate, or make decisions in-kernel, which means you either collect everything (and drown in data) or miss the events that matter.
How eBPF Works: Architecture for Sysadmins
eBPF stands for extended Berkeley Packet Filter. The original BPF, created by Steven McCanne and Van Jacobson in 1992 and published in 1993, was a simple virtual machine for filtering network packets in tcpdump. The extended instruction set landed in-kernel with Linux 3.15; Linux 3.18 exposed it to user space via the bpf(2) syscall and added BPF maps; and the 4.x and 5.x kernel series expanded it into a general-purpose in-kernel execution environment that now has very little to do with packet filtering. The name persists for historical reasons, but eBPF is today effectively a standalone technology name.
The execution model works in five steps. First, an eBPF program is written in a restricted subset of C. Second, Clang/LLVM compiles that C code into eBPF bytecode -- a custom instruction set with eleven 64-bit registers (ten general-purpose plus a read-only frame pointer), a 512-byte stack, and no dynamic heap allocation within the program itself (persistent state is stored in BPF maps, covered below). Third, the kernel's eBPF verifier statically analyzes every possible execution path in the program to prove it is safe: it must terminate, it must only access valid memory through approved helper functions, it cannot dereference null pointers, and it cannot corrupt kernel state. Since Linux 5.3, the verifier permits bounded loops -- loops where it can prove the iteration count has a fixed upper limit -- but unbounded or potentially infinite loops are still rejected. Fourth, if verification passes, a JIT compiler translates the bytecode into native machine code (x86_64, ARM64, etc.) for near-zero-overhead execution. Fifth, the compiled program is attached to a hook point in the kernel and runs every time that hook fires.
The verifier is what makes eBPF fundamentally different from kernel modules. Your eBPF program literally cannot crash the kernel. If it tries to do something unsafe, the kernel rejects it at load time before it ever executes. This is not a theoretical guarantee -- it is enforced by static analysis of every instruction.
Hook Points: Where eBPF Programs Attach
eBPF programs are event-driven. They do not run continuously; they execute when a specific event occurs. The available attachment points cover nearly every aspect of kernel behavior.
Tracepoints are static instrumentation points embedded in the kernel source code at carefully chosen locations. They have a relatively stable ABI, meaning the data structures they expose are maintained across kernel versions with far more consistency than internal function signatures. Examples include sched:sched_switch (when the scheduler swaps processes), tcp:tcp_retransmit_skb (when a TCP segment is retransmitted), and syscalls:sys_enter_openat (when a process calls openat()). Tracepoints are the preferred attachment point for production tracing because of their stability, though they can still change between major kernel versions in rare cases.
Kprobes and kretprobes provide dynamic instrumentation. A kprobe can hook into virtually any kernel function at its entry point, while a kretprobe fires when the function returns, allowing you to capture the return value. This is extraordinarily flexible -- you can trace any function listed in /proc/kallsyms -- but the interface is not stable across kernel versions. A function that exists in kernel 6.1 might be renamed, reorganized, or removed in 6.8. Use kprobes for targeted investigation; prefer tracepoints for long-running monitoring.
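Because kprobe targets are not guaranteed to exist on any given kernel, it is worth confirming the symbol is present before attaching to it. Below is a small Python sketch; the helper name kernel_symbol_exists is ours, not part of any eBPF toolkit, and it assumes a readable /proc/kallsyms (true on stock Linux).

```python
# Sketch: confirm a kernel function exists before pointing a kprobe at it.
# The helper name is ours; it assumes a readable /proc/kallsyms.

def kernel_symbol_exists(name):
    """Scan /proc/kallsyms for `name`. Compiler-generated suffixes such as
    vfs_read.cold are treated as a match. Non-root readers see zeroed
    addresses, but the symbol names are still listed."""
    try:
        with open("/proc/kallsyms") as f:
            for line in f:
                parts = line.split()
                if len(parts) >= 3 and parts[2].split(".")[0] == name:
                    return True
    except OSError:
        pass
    return False

print(kernel_symbol_exists("vfs_read"))
```

bpftrace performs an equivalent check for you: `bpftrace -l 'kprobe:vfs_*'` lists the matching probe points on the running kernel.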
Uprobes and uretprobes do the same thing as kprobes but for user-space applications. You can attach eBPF programs to function entry and return points in any compiled binary without modifying the application or restarting it. This is powerful for tracing library calls (like SSL_read in OpenSSL), application-specific functions, or language runtime internals.
Perf events allow eBPF programs to be triggered by hardware performance counters (cache misses, branch mispredictions, CPU cycles) or software events, enabling sampling-based profiling with in-kernel aggregation.
BPF Maps: In-Kernel Data Structures
eBPF programs need a way to store state between invocations and communicate results to user space. BPF maps serve both purposes. They are kernel-resident key-value data structures that persist across program invocations and are accessible from both eBPF programs and user-space applications.
The available map types include hash maps (arbitrary key-value pairs), arrays (integer-indexed), per-CPU variants (one instance per CPU core, eliminating lock contention), LRU maps (automatic eviction of least recently used entries), ring buffers (high-performance streaming of events to user space), and stack trace maps (storing kernel and user-space call stacks). The choice of map type has real performance implications: a per-CPU hash map avoids cross-CPU locking entirely, making it suitable for high-frequency tracing, while a ring buffer provides ordered delivery of events to user space with backpressure semantics.
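The per-CPU point deserves a concrete illustration. The class below is a user-space toy model of a per-CPU hash map, not kernel code: each CPU updates only its own instance, so the hot path never contends, and the reader pays the merge cost once at lookup time.

```python
from collections import defaultdict

# Toy model of a per-CPU BPF hash map -- a user-space sketch, not kernel code.

class PerCpuHashMap:
    def __init__(self, ncpus):
        self.slots = [defaultdict(int) for _ in range(ncpus)]

    def increment(self, cpu, key, amount=1):
        # In the kernel this touches only the current CPU's slot: no
        # cross-CPU cache-line bouncing, no lock contention.
        self.slots[cpu][key] += amount

    def lookup(self, key):
        # User space pays the merge cost once, at read time.
        return sum(slot[key] for slot in self.slots)

m = PerCpuHashMap(ncpus=4)
for cpu in range(4):
    for _ in range(10):
        m.increment(cpu, "vfs_read")

print(m.lookup("vfs_read"))  # 40
```

The tradeoff is visible in the model: updates are contention-free, but a lookup must visit every CPU's slot, which is why per-CPU maps suit write-heavy tracing workloads with occasional reads.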
The eBPF Tooling Ecosystem
Almost nobody writes raw eBPF bytecode. In the same way that almost nobody writes x86 assembly to build applications, the eBPF ecosystem provides multiple layers of abstraction. For system administrators, three tools matter.
bpftrace: One-Liners and Quick Investigation
bpftrace is a high-level tracing language for Linux that compiles scripts into eBPF programs on the fly. Its syntax borrows from awk, C, and DTrace, making it accessible to anyone comfortable with shell scripting. It is the tool you reach for when you need an answer in the next 30 seconds.
For example, to count system calls by process name across the entire system:
# bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
Attaching 1 probe...
^C

@[sshd]: 847
@[nginx]: 12405
@[postgres]: 34219
@[node]: 67812
That single command attaches an eBPF program to the raw system call entry tracepoint, increments a per-process-name counter in a BPF hash map, and prints the aggregated results when you press Ctrl+C. The overhead is negligible because the counting happens entirely in kernel space -- no per-event data crosses the kernel-user boundary.
To generate a latency histogram for read() system calls in the VFS layer:
# bpftrace -e '
    kprobe:vfs_read { @start[tid] = nsecs; }
    kretprobe:vfs_read {
        @usecs = hist((nsecs - @start[tid]) / 1000);
        delete(@start[tid]);
    }'
Attaching 2 probes...
^C

@usecs:
[0]                   12 |                                    |
[1]                  340 |@@@@                                |
[2, 4)              2841 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)              1293 |@@@@@@@@@@@@@@@@                    |
[8, 16)              421 |@@@@@                               |
[16, 32)             107 |@                                   |
[32, 64)              34 |                                    |
[64, 128)              8 |                                    |
[128, 256)             2 |                                    |
This attaches two probes: one at the entry of vfs_read (recording a nanosecond timestamp keyed by thread ID) and one at its return (computing the elapsed time and feeding it into a power-of-2 histogram). The histogram is computed entirely in kernel space. Even if vfs_read is called millions of times during the tracing period, the overhead remains minimal.
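The binning that hist() performs is simple enough to model in a few lines of user-space Python. This sketch reproduces the power-of-2 bucketing (bucket k, for k >= 1, covers [2**(k-1), 2**k); bucket 0 holds zero values), which happens to be exactly what Python's int.bit_length() computes for non-negative integers. The helper names are ours, for illustration.

```python
from collections import Counter

# User-space model of bpftrace's hist() power-of-2 binning.
# n.bit_length() maps 0 -> bucket 0, 1 -> bucket 1, 2-3 -> bucket 2,
# 4-7 -> bucket 3, and so on.

def hist(samples):
    return Counter(n.bit_length() for n in samples)

def label(k):
    if k <= 1:
        return f"[{k}]"
    return f"[{2 ** (k - 1)}, {2 ** k})"

latencies_us = [0, 1, 1, 2, 3, 5, 6, 7, 12, 40]
h = hist(latencies_us)
for k in sorted(h):
    print(f"{label(k):>12} {h[k]}")
```

The point of power-of-2 buckets is that they cost one counter increment per event and a fixed, tiny amount of memory no matter how many events occur, which is what keeps the in-kernel overhead flat even at millions of calls per second.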
BCC: Pre-Built Sysadmin Power Tools
BCC (BPF Compiler Collection) provides a library of over 80 purpose-built tracing tools, each designed to answer a specific operational question. These are production-ready scripts that combine eBPF kernel instrumentation (written in C) with user-space interfaces (written in Python). You do not need to understand eBPF internals to use them.
Here are the BCC tools that every system administrator should know:
execsnoop -- traces new process execution via exec() in real time. Catches short-lived processes invisible to top or ps.
opensnoop -- traces every file open system-wide with PID and path. Replaces strace -e openat without the overhead.
biolatency -- block I/O latency histogram. Shows your actual disk latency distribution, not averages that hide bimodal problems.
tcplife -- TCP session lifespans with PID, process name, and bytes transferred. Replaces manual tcpdump analysis.
tcpconnect -- traces active TCP connections with full process context. Invaluable for tracking outbound connections.
tcpretrans -- shows TCP retransmissions with source/destination and state. Exposes network quality issues per-flow.
runqlat -- CPU scheduler run queue latency. Reveals how long threads wait for CPU time. No practical equivalent existed before eBPF.
ext4slower / xfsslower -- shows filesystem operations exceeding a latency threshold with process and file details.
cachestat -- page cache hit/miss statistics per second.
capable -- traces Linux capability checks, revealing privilege requirements of processes.
bashreadline -- captures bash commands typed across all shells system-wide.
On Ubuntu and Debian systems, BCC tools are available with the suffix -bpfcc. So you would run execsnoop-bpfcc or opensnoop-bpfcc. On Fedora and RHEL, they install directly as execsnoop, opensnoop, etc.
# execsnoop-bpfcc
PCOMM            PID    PPID   RET ARGS
bash             14800  14799    0 /bin/bash
id               14801  14800    0 /usr/bin/id -un
hostname         14802  14800    0 /usr/bin/hostname
grep             14803  14800    0 /usr/bin/grep -c ^processor /proc/cpuinfo
cron             14804  1287     0 /usr/sbin/cron -f
sh               14805  14804    0 /bin/sh -c /usr/local/bin/backup.sh
Notice something important: execsnoop captures short-lived processes that would never appear in top or ps. This is critical for diagnosing cron jobs that spawn cascading child processes, tracking unauthorized command execution, or understanding what happens during system boot and service restarts.
bpftool: Inspecting What Is Running
bpftool is the low-level diagnostic utility for eBPF itself. It does not help you write eBPF programs; instead, it shows you what eBPF programs are currently loaded in the kernel, what maps exist, and how they are attached. Think of it as ps for eBPF.
# bpftool prog list
12: tracepoint  name sys_enter  tag a1bc2d3e4f567890
        loaded_at 2026-02-26T10:14:03+0000  uid 0
        xlated 528B  jited 312B  memlock 4096B  map_ids 4,5
        pids execsnoop(14920)
37: kprobe  name tcp_connect  tag b2cd3e4f56789012
        loaded_at 2026-02-26T10:15:47+0000  uid 0
        xlated 384B  jited 224B  memlock 4096B  map_ids 8

# bpftool map list
4: hash  name events  flags 0x0
        key 4B  value 128B  max_entries 10240  memlock 1638400B
5: perf_event_array  name events  flags 0x0
        key 4B  value 4B  max_entries 4  memlock 4096B
This is especially useful when diagnosing whether other eBPF programs (from container runtimes, security agents, or networking tools like Cilium) are already loaded and potentially affecting system behavior.
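Such an audit is easy to script. The sketch below parses `bpftool prog list` text into records and flags programs not on an allowlist; the line layout is assumed to match the sample output shown above, the helper names are ours, and a production script would be better served by `bpftool -j prog list`, which emits JSON.

```python
import re

# Sketch: turn `bpftool prog list` text into (id, type, name) records and
# flag programs missing from an allowlist. Layout assumed to match the
# sample above; prefer `bpftool -j prog list` (JSON) for real scripts.

PROG_RE = re.compile(r"^(\d+):\s+(\S+)\s+name\s+(\S+)", re.MULTILINE)

def loaded_programs(text):
    """Extract (prog_id, prog_type, name) triples from bpftool text output."""
    return [(int(i), ptype, name) for i, ptype, name in PROG_RE.findall(text)]

def unexpected(programs, allowlist):
    return [p for p in programs if p[2] not in allowlist]

sample = """\
12: tracepoint  name sys_enter  tag a1bc2d3e4f567890
        loaded_at 2026-02-26T10:14:03+0000  uid 0
37: kprobe  name tcp_connect  tag b2cd3e4f56789012
        loaded_at 2026-02-26T10:15:47+0000  uid 0
"""

progs = loaded_programs(sample)
print(unexpected(progs, allowlist={"sys_enter"}))  # the kprobe stands out
```

Run periodically (for example from cron), a script like this turns "unexpected eBPF programs are a red flag" into an actual alert rather than something you notice during an incident.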
Practical Sysadmin Workflows
Theory is useful, but eBPF's real value is in solving the problems you face on a daily basis. Here are concrete workflows mapped to common sysadmin scenarios.
Diagnosing Slow Disk I/O
A user reports that writes are slow on a database server. iostat shows average write latency of 4ms, which looks fine. But averages lie. Run biolatency to see the actual distribution:
# biolatency-bpfcc -D 10

disk = 'nvme0n1'
     usecs               : count     distribution
         0 -> 1          : 0        |                                    |
         2 -> 3          : 0        |                                    |
         4 -> 7          : 14       |                                    |
         8 -> 15         : 892      |@@@@@@@                             |
        16 -> 31         : 4521     |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
        32 -> 63         : 2104     |@@@@@@@@@@@@@@@@@                   |
        64 -> 127        : 436      |@@@                                 |
       128 -> 255        : 41       |                                    |
       256 -> 511        : 12       |                                    |
       512 -> 1023       : 3        |                                    |
      1024 -> 2047       : 0        |                                    |
      2048 -> 4095       : 0        |                                    |
      4096 -> 8191       : 87       |                                    |
      8192 -> 16383      : 23       |                                    |
The histogram reveals a bimodal distribution: the vast majority of I/Os complete in 16-63 microseconds, but there is a secondary cluster at 4-16 milliseconds. Those outliers are what the database user is experiencing, and they were invisible in iostat's averages. To determine which processes are responsible for the slow I/Os, switch to ext4slower (or xfsslower, depending on your filesystem):
# ext4slower-bpfcc 1        # show operations slower than 1 ms
TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
10:24:01 postgres       3847   S 0       0           4.23 base/16384/24601
10:24:01 logrotate      9102   W 4096    0          12.41 syslog
10:24:03 postgres       3847   S 0       0           5.87 base/16384/24601
10:24:05 rsync          9130   R 131072  34816       8.91 mysql-backup.tar
Now you can see that postgres fsync operations and a concurrent rsync backup are competing for disk bandwidth. The logrotate write is the worst offender at 12ms. This level of per-event, per-process visibility with file-level granularity simply does not exist in any traditional tool.
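Spotting the secondary cluster in a biolatency histogram can even be mechanized. The sketch below (plain Python, helper name ours) finds contiguous runs of non-empty buckets; more than one run means a multi-modal distribution that a single average would hide. The bucket counts are taken from the biolatency output earlier in this section.

```python
# Sketch: detect multi-modal latency distributions in a power-of-2
# histogram by finding contiguous runs of non-empty buckets.

def modes(buckets):
    """Return the contiguous runs of non-empty buckets.
    `buckets` is an ordered list of (lower_bound_us, count) pairs."""
    runs, current = [], []
    for low, count in buckets:
        if count > 0:
            current.append((low, count))
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs

# Bucket counts from the biolatency output above (lower bound in usecs).
biolatency = [(0, 0), (2, 0), (4, 14), (8, 892), (16, 4521), (32, 2104),
              (64, 436), (128, 41), (256, 12), (512, 3), (1024, 0),
              (2048, 0), (4096, 87), (8192, 23)]

print(len(modes(biolatency)))  # 2: the fast cluster and the 4-16 ms tail
```

A check like this dropped into a monitoring pipeline catches bimodal disk behavior automatically, before a user has to report it.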
Tracking Down Mysterious Network Connections
An intrusion detection system flags outbound connections to an unexpected IP range. Which process is responsible? tcpconnect gives you the answer in real time:
# tcpconnect-bpfcc -t
TIME(s)  PID    COMM     IP SADDR      DADDR           DPORT
0.000    3847   postgres 4  10.0.1.50  10.0.1.51       5432
0.102    8834   curl     4  10.0.1.50  93.184.216.34   443
1.245    22019  python3  4  10.0.1.50  198.51.100.47   8443
3.891    22019  python3  4  10.0.1.50  198.51.100.48   8443
The python3 process (PID 22019) is making connections to the flagged range. You can then investigate that PID with /proc/22019/cmdline, /proc/22019/cwd, and /proc/22019/exe to determine exactly what script is running and from where.
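That manual /proc inspection is worth scripting for incident response, since a suspect process can exit before you finish typing. Below is a small Python sketch (the helper name inspect_pid is ours); it returns None if the process has already vanished or its entries are unreadable without privilege.

```python
import os

# Sketch: gather the same /proc facts you would check by hand for a
# suspect PID. Returns None if the process exited or /proc entries are
# unreadable without privilege.

def inspect_pid(pid):
    base = f"/proc/{pid}"
    info = {}
    try:
        with open(f"{base}/cmdline", "rb") as f:
            # cmdline is a NUL-separated argv; drop the empty tail element
            info["cmdline"] = f.read().split(b"\0")[:-1]
        info["exe"] = os.readlink(f"{base}/exe")  # the actual binary
        info["cwd"] = os.readlink(f"{base}/cwd")  # where it was launched
    except OSError:
        return None
    return info

print(inspect_pid(os.getpid()))
```

Capturing exe in particular matters: it survives even if the on-disk binary is deleted after launch (the link target is then suffixed with " (deleted)"), a classic malware tell.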
Auditing Process Execution
execsnoop provides a live feed of every process executed on the system. Unlike auditd, which requires careful rule configuration and writes to log files that must be parsed after the fact, execsnoop gives you immediate, human-readable output with process trees:
# execsnoop-bpfcc -T
TIME     PCOMM            PID    PPID   RET ARGS
10:30:01 sh               15001  1287     0 /bin/sh -c /opt/scripts/health.sh
10:30:01 health.sh        15002  15001    0 /opt/scripts/health.sh
10:30:01 curl             15003  15002    0 /usr/bin/curl -s http://10.0.1.100/api/health
10:30:01 jq               15004  15002    0 /usr/bin/jq .status
10:31:22 sshd             15010  1102     0 /usr/sbin/sshd -D -R
10:31:22 bash             15012  15010    0 /bin/bash --login
10:31:23 sudo             15013  15012    0 /usr/bin/sudo -i
10:31:23 bash             15014  15013    0 /bin/bash
This captures the complete chain of events: cron runs a health check script, which spawns curl and jq. A few minutes later, someone SSH'd in, logged into bash, and immediately ran sudo -i. For incident response, this kind of timeline is extraordinarily valuable.
CPU Scheduler Analysis
runqlat answers a question that was essentially unanswerable before eBPF: how long are threads waiting in the CPU run queue before they get scheduled? If a server has adequate average CPU utilization but application latency is high, run queue latency may be the culprit:
# runqlat-bpfcc 10 1
     usecs               : count     distribution
         0 -> 1          : 1204     |@@@@@@                              |
         2 -> 3          : 7381     |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
         4 -> 7          : 5219     |@@@@@@@@@@@@@@@@@@@@@@@@@           |
         8 -> 15         : 2840     |@@@@@@@@@@@@@@                      |
        16 -> 31         : 1102     |@@@@@                               |
        32 -> 63         : 489      |@@                                  |
        64 -> 127        : 215      |@                                   |
       128 -> 255        : 98       |                                    |
       256 -> 511        : 47       |                                    |
       512 -> 1023       : 12       |                                    |
      1024 -> 2047       : 3        |                                    |
Threads waiting over 256 microseconds for CPU time indicate contention. If you see a significant tail extending into the millisecond range, you likely have too many runnable threads competing for CPU cores, and the solution may be tuning thread pool sizes, adjusting cgroup CPU limits, or moving workloads to different nodes.
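For alerting, it helps to reduce such a histogram to a single number. The sketch below computes a conservative percentile bound from power-of-2 buckets: it returns the upper bound of the bucket containing the pct-th sample. The helper name and approach are ours (not a BCC feature), and the counts are taken from the runqlat output above.

```python
# Sketch: a conservative percentile estimate read off a power-of-2
# histogram. Returns the upper bound (usecs) of the bucket that
# contains the pct-th sample.

def percentile_upper_bound(buckets, pct):
    """`buckets` is an ordered list of (upper_bound_us, count) pairs."""
    total = sum(count for _, count in buckets)
    need = total * pct / 100.0
    seen = 0
    for upper, count in buckets:
        seen += count
        if seen >= need:
            return upper
    return buckets[-1][0]

# Counts from the runqlat output above, keyed by bucket upper bound.
runqlat = [(1, 1204), (3, 7381), (7, 5219), (15, 2840), (31, 1102),
           (63, 489), (127, 215), (255, 98), (511, 47), (1023, 12),
           (2047, 3)]

print("p50 <=", percentile_upper_bound(runqlat, 50), "us")
print("p99 <=", percentile_upper_bound(runqlat, 99), "us")
```

For this data the p50 bound is 7 microseconds and the p99 bound is 127 microseconds, so a threshold like "alert when p99 run queue latency exceeds 1 ms" becomes a one-line comparison.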
Installation and Prerequisites
eBPF tracing requires a kernel that supports it. The minimum practical version is 4.9, but many tools require features from 4.14 or later, and the full tooling ecosystem (including CO-RE portability and bounded loops) works best on 5.x and above. If you are running any currently-supported distribution (Ubuntu 22.04+, RHEL 8+, Fedora 36+, Debian 12+), you already have sufficient kernel support.
# Debian / Ubuntu
$ sudo apt install bpftrace bpfcc-tools linux-headers-$(uname -r)

# RHEL / Fedora
$ sudo dnf install bpftrace bcc-tools

# Arch Linux
$ sudo pacman -S bpftrace bcc bcc-tools

# Verify your kernel supports eBPF
$ uname -r
6.8.0-45-generic

# Check if BTF (BPF Type Format) is available
$ ls /sys/kernel/btf/vmlinux
/sys/kernel/btf/vmlinux

# List available tools
$ dpkg -L bpftrace bpfcc-tools | grep -E '/s?bin/' | head -20
eBPF tracing programs need CAP_BPF plus CAP_PERFMON (both introduced in kernel 5.8; older kernels require CAP_SYS_ADMIN). In practice, this means running as root or with carefully configured capabilities. Since eBPF programs can inspect sensitive kernel data including passwords in memory, process arguments, and network traffic, this privilege requirement is appropriate and should not be bypassed.
CO-RE and BTF: Portable eBPF
One of the historical pain points with eBPF tools was that they often depended on kernel headers matching the running kernel. If headers were missing or mismatched, tools would fail to compile. CO-RE (Compile Once, Run Everywhere) and BTF (BPF Type Format) solve this. BTF is a compact metadata format embedded in the kernel that describes the layout of kernel data structures. CO-RE-enabled eBPF programs, built using the libbpf library, read this metadata at load time and automatically adjust their memory access patterns to match the running kernel's structure layout, without needing kernel headers installed. Tools like bpftrace and many newer BCC tools leverage CO-RE and libbpf under the hood, which is why they work reliably across different kernel versions without recompilation.
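The relocation idea can be shown with a toy model. In the sketch below, the "program" references struct fields by name and the concrete byte offsets come from per-kernel "BTF" metadata at load time; the offsets are invented for illustration, not real task_struct layout, and the helper is ours, not libbpf's API.

```python
# Toy model of a CO-RE field relocation. The program names the fields it
# wants; concrete byte offsets come from the running kernel's type info.
# Offsets below are invented for the illustration.

BTF_KERNEL_A = {"task_struct": {"pid": 2368, "comm": 2728}}
BTF_KERNEL_B = {"task_struct": {"pid": 2392, "comm": 2760}}  # fields moved

def relocate(accesses, btf):
    """Resolve (struct, field) references against this kernel's type info."""
    return [btf[struct][field] for struct, field in accesses]

program = [("task_struct", "pid"), ("task_struct", "comm")]
print(relocate(program, BTF_KERNEL_A))  # [2368, 2728]
print(relocate(program, BTF_KERNEL_B))  # [2392, 2760] -- same program
```

The same compiled "program" produces correct accesses on both kernels because the offsets are resolved at load time, which is the whole of the compile-once-run-everywhere promise in miniature.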
You can check whether your kernel includes BTF data by looking for /sys/kernel/btf/vmlinux. BTF support was introduced in kernel 4.18, but it became practically useful and widely shipped by distributions around kernel 5.4 with CONFIG_DEBUG_INFO_BTF=y. All current major distribution kernels ship with BTF enabled.
eBPF vs. Traditional Tools: An Honest Comparison
strace still wins for quick-and-dirty debugging on a development machine where overhead does not matter. perf remains excellent for CPU profiling and hardware counter analysis. eBPF does not replace these tools in every context -- it replaces them in production and high-load scenarios where their limitations become unacceptable.
The key differences break down along several axes. Overhead: strace imposes extreme overhead via ptrace context switching; eBPF uses JIT-compiled in-kernel programs with negligible cost. Safety: kernel modules can panic the system; eBPF programs are verified before execution and cannot crash the kernel. Filtering: strace collects everything and filters in user space; eBPF filters and aggregates in-kernel, transferring only summarized results. Scope: strace attaches to a single process; eBPF traces system-wide across all processes simultaneously. Dependencies: SystemTap requires kernel debug symbols; eBPF with CO-RE/BTF and libbpf requires nothing beyond a modern kernel.
Prefer tracepoints over kprobes whenever possible. Tracepoints are maintained as a relatively stable interface and rarely break across versions, while kprobes hook into internal functions that can change without notice. Use bpftrace -l 'tracepoint:*' to see all available tracepoints on your system.
Security Considerations
eBPF is a double-edged tool. The same capability that lets system administrators trace kernel internals can, in the hands of an attacker with root access, be used to build extremely stealthy rootkits. eBPF programs run within the kernel's trust boundary, which means they can intercept and modify data in ways that are difficult to detect from user space.
Security researchers have demonstrated proof-of-concept eBPF rootkits that hook system calls to hide processes, filter network traffic to conceal connections, and tamper with the data that security monitoring tools (themselves often eBPF-based) rely on. Projects like Boopkit, TripleCross, and ebpfkit showed that eBPF programs could run hidden logic while staying within verifier-approved rules.
For sysadmins, the defensive implications are straightforward. Restrict who can load eBPF programs using the CAP_BPF capability (available since kernel 5.8) and avoid running containers with CAP_SYS_ADMIN. Use bpftool prog list regularly to audit what eBPF programs are loaded -- unexpected programs are a red flag. On systems where eBPF is not needed by unprivileged users, disable unprivileged BPF access with sysctl kernel.unprivileged_bpf_disabled=1 (a one-way setting: once enabled, it cannot be turned off again without a reboot). Enable Secure Boot and kernel module signing to prevent the loading of kernel modules that could tamper with eBPF subsystem internals.
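The sysctl check fits naturally into a host-hardening audit script. Below is a minimal Python sketch; the helper name is ours, and the value semantics follow the kernel's documentation (0 = unprivileged bpf() allowed, 1 = disabled one-way until reboot, 2 = disabled but changeable by an admin later).

```python
# Sketch: audit the sysctl that gates unprivileged bpf(). Helper name is
# ours. Values: 0 = allowed, 1 = disabled until reboot (one-way),
# 2 = disabled but an admin can still change it later.

def read_sysctl(name):
    """Read a sysctl through /proc/sys; None if the knob does not exist."""
    path = "/proc/sys/" + name.replace(".", "/")
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None

value = read_sysctl("kernel.unprivileged_bpf_disabled")
if value == "0":
    print("unprivileged BPF is allowed -- consider disabling it")
elif value in ("1", "2"):
    print(f"unprivileged BPF is disabled (value {value})")
else:
    print("sysctl not present on this kernel")
```

Reading through /proc/sys rather than shelling out to the sysctl binary keeps the check dependency-free and easy to run from configuration-management tooling.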
Beyond Tracing: Where eBPF Is Heading
While this article focuses on tracing and observability, eBPF's reach extends considerably further. In networking, Cilium uses eBPF to replace iptables and kube-proxy in Kubernetes environments, processing network policies at the XDP (eXpress Data Path) layer for dramatic performance improvements. XDP allows eBPF programs to run at the earliest point in the network stack -- inside or just after the network driver -- enabling packet decisions before the kernel allocates a socket buffer. Cloudflare uses XDP-based eBPF programs to mitigate DDoS attacks by dropping malicious packets before they consume kernel resources.
In security, tools like Falco, Tracee, and Tetragon use eBPF to implement runtime threat detection that monitors system calls, file access patterns, and network behavior in real time. These tools provide the kernel-level telemetry that user-space security agents cannot match.
The eBPF Foundation, hosted under the Linux Foundation, now steers the technology's development with contributions from Cisco (which acquired Isovalent, the creators of Cilium), Meta, Google, Microsoft, and others. Academic publications mentioning eBPF have increased year over year since 2016, and conferences like KubeCon and Linux Plumbers now dedicate entire tracks to eBPF sessions.
Getting Started: A Five-Minute Checklist
If you have read this far and want to start using eBPF on your servers today, here is the minimal path.
1. Install the tooling: apt install bpftrace bpfcc-tools on Debian/Ubuntu, or dnf install bpftrace bcc-tools on RHEL/Fedora.
2. Verify BTF is available: ls /sys/kernel/btf/vmlinux.
3. Run execsnoop and watch processes spawn for a few minutes.
4. Run opensnoop and filter to a specific process with -p PID.
5. Run biolatency for 10 seconds and read the histogram.
6. Try a bpftrace one-liner, such as the syscall counter from earlier: bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
That sequence takes five minutes and will give you more kernel visibility than you have ever had before. From there, explore the full BCC and bpftrace tool collections (check the example files in /usr/share/doc/bpfcc-tools/examples/ and /usr/share/doc/bpftrace/examples/), read the bpftrace reference guide, and keep Brendan Gregg's BPF Performance Tools book on your shelf for the deep dives.
eBPF does not just make tracing better. It makes an entire class of previously impractical investigations routine. The information has always been there, inside the kernel. Now you can actually reach it without setting the building on fire.