Every file you open, every command you run, every process Linux spawns exists in two worlds simultaneously: the human-readable world of text, source code, and abstractions, and the machine-level world of binary -- sequences of ones and zeros that represent the only language a processor actually understands. Understanding the relationship between binary and Linux is not merely an academic exercise. It is the foundation of understanding how Linux works at every level, from the moment the bootloader hands control to the kernel to the moment a userspace application writes bytes to disk.
What Binary Actually Means in a Computing Context
Binary is a base-2 number system using only two symbols: 0 and 1. These symbols correspond directly to the two stable electrical states a transistor can maintain -- off and on, low voltage and high voltage. Every piece of data a computer processes, every instruction it executes, every address it references in memory is encoded as a sequence of these binary digits, called bits.
Eight bits form a byte, and bytes are the fundamental unit of addressable memory on modern systems. Although 64-bit registers can theoretically address 2 to the power of 64 unique memory locations (roughly 18.4 exabytes), current x86-64 hardware does not implement the full width. With standard 4-level paging, virtual addresses are 48 bits wide, yielding a 256 TiB virtual address space.

With 5-level paging, the hardware uses a 57-bit virtual address width (LA57), for a 128 PiB virtual address space; the underlying registers remain 64 bits wide. 5-level paging is supported by Intel Ice Lake and AMD Zen 4 (EPYC 9004/8004 series and Threadripper PRO 7000 series) and later. Its x86 kernel support was merged in Linux 4.14 (2017) via CONFIG_X86_5LEVEL.

When the kernel is built with CONFIG_X86_5LEVEL and the CPU reports support via CPUID, the kernel activates 5-level paging at boot. CONFIG_X86_5LEVEL was enabled by default starting with Linux 5.5 (2019), and a kernel built with the option falls back to 4-level paging at runtime on hardware without LA57.

Even when 5-level paging is active, however, the kernel limits userspace to a 47-bit virtual address space by default, for compatibility. Some applications, JIT compilers in particular, encode metadata in the high bits of pointers, and those bits would collide with valid addresses above the 47-bit boundary. An application can explicitly request allocations above 47 bits by passing mmap() a hint address above that boundary. The full 56-bit userspace range (64 PiB) is therefore available on request, but not by default. [kernel.org: x86/x86_64/5level-paging.rst]

The point is that the entire abstraction stack of a modern Linux system, from the kernel's memory management subsystem to the shell prompt you type commands into, ultimately rests on the manipulation of binary values at the hardware level.
Hexadecimal (base-16) is the notation most commonly used when examining binary data directly because it compresses four bits into a single character, making long binary strings far more readable. The hex value 0x4C is the binary value 01001100. When you run xxd or hexdump on a file, you are looking at binary data rendered in hexadecimal.
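The nibble-to-digit correspondence is easy to check mechanically. The short Python sketch below prints a value as zero-padded binary and pairs each hex digit with the four bits it stands for. The helper names are illustrative, not from any library.

```python
from typing import List, Tuple


def to_bits(value: int, width: int = 8) -> str:
    """Render an integer as a zero-padded binary string of `width` bits."""
    return format(value, f"0{width}b")


def hex_digit_nibbles(value: int, width: int = 8) -> List[Tuple[str, str]]:
    """Pair each hex digit of `value` with the 4-bit group it encodes."""
    digits = format(value, f"0{width // 4}X")  # one hex digit per 4 bits
    return [(d, format(int(d, 16), "04b")) for d in digits]


if __name__ == "__main__":
    print(to_bits(0x4C))               # 01001100
    print(hex_digit_nibbles(0x4C))     # [('4', '0100'), ('C', '1100')]
    print(int("01001100", 2) == 0x4C)  # True
```

Every two hex digits make one byte. That is why xxd shows the 16 bytes of each line as 32 hex digits.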
The ELF Format: How Linux Understands Binary Programs
When a developer compiles a C program on Linux, the compiler and linker produce an Executable and Linkable Format (ELF) binary. ELF is the standard binary format for executables, shared libraries, object files, and core dumps on Linux. The format was originally developed by UNIX System Laboratories as part of the System V ABI, and its specification was published by the TIS Committee in May 1995 -- after which the committee wound down. ELF was chosen as the standard binary format for Unix-like systems on x86 by the 86open project, which began discussions in 1997 and formally dissolved in July 1999 having declared the Linux ELF implementation the de facto industry standard. [TIS ELF Specification v1.2, 1995; 86open Final Update, July 1999] Understanding ELF is central to understanding how Linux bridges human-written code and machine execution.
An ELF file begins with a magic number: the four bytes 0x7F followed by the ASCII characters E, L, and F. You can verify this yourself with any compiled binary by running xxd on it and examining the first four bytes. This magic number is how the Linux kernel's exec subsystem identifies a file as an ELF binary.
```shell
$ xxd /bin/ls | head -3
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0300 3e00 0100 0000 b05e 0000 0000 0000  ..>......^......
00000020: 4000 0000 0000 0000 b022 0200 0000 0000  @........"......

$ readelf -h /bin/ls
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Type:                              DYN (Position-Independent Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Entry point address:               0x5eb0
```
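Following the magic number, the ELF header encodes critical metadata:

- the target architecture (x86-64, ARM64, RISC-V, and so on)
- the endianness of the binary
- the ELF type (executable, shared library, or relocatable object)
- the entry point address where execution should begin
- the location of the program header table and section header table within the file

The entry point value shown above will differ across distributions and builds. Because this /bin/ls is a position-independent executable, 0x5eb0 is an offset from wherever the executable is loaded. The kernel maps the file at a base address, which is randomized when ASLR is enabled, and the program's real entry point is that base plus 0x5eb0. For a dynamically linked program, the kernel first starts the dynamic linker, which jumps to that address once it has finished loading libraries.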
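The 64-bit ELF header is a fixed 64-byte structure, so decoding it takes a single struct format string. The sketch below assumes a 64-bit little-endian file, the common case on x86-64 and ARM64 Linux. It names only a handful of e_type and e_machine values, and parse_elf64_header is an illustrative helper, not a library API.

```python
import struct

# Elf64_Ehdr layout: e_ident[16], e_type, e_machine, e_version, e_entry,
# e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx  (64 bytes total, no padding)
EHDR64 = struct.Struct("<16sHHIQQQIHHHHHH")

ET_NAMES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
EM_NAMES = {0x03: "x86", 0x28: "ARM", 0x3E: "x86-64", 0xB7: "AArch64", 0xF3: "RISC-V"}


def parse_elf64_header(data: bytes) -> dict:
    """Decode the main fields readelf -h prints, for 64-bit little-endian ELF only."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic number")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS == ELFCLASS64, EI_DATA == ELFDATA2LSB
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    (_ident, e_type, e_machine, _version, e_entry, e_phoff, e_shoff, _flags,
     _ehsize, e_phentsize, e_phnum, _shentsize, e_shnum, _shstrndx) = EHDR64.unpack_from(data)
    return {
        "type": ET_NAMES.get(e_type, hex(e_type)),
        "machine": EM_NAMES.get(e_machine, hex(e_machine)),
        "entry": e_entry,
        "phoff": e_phoff,
        "shoff": e_shoff,
        "phentsize": e_phentsize,
        "phnum": e_phnum,
        "shnum": e_shnum,
    }
```

You can test it on the 48 bytes from the xxd dump above, padded with zeros to 64. The result is type DYN, machine x86-64, and entry 0x5eb0, matching readelf -h. On a real file, pass the first 64 bytes read in binary mode.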
The program header table describes segments: contiguous regions of the file that the kernel's ELF loader maps into the process's virtual address space. A typical executable has:

- LOAD segments for its code and read-only data (modern linkers usually split these so that only the code is executable)
- a LOAD segment for read-write data
- a PT_INTERP segment that names the dynamic linker (typically /lib64/ld-linux-x86-64.so.2 on x86-64), which resolves shared library dependencies at runtime
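To see how the loader finds those segments, the sketch below walks the program header table and pulls the interpreter path out of PT_INTERP. It locates the table using the e_phoff, e_phentsize, and e_phnum fields of the ELF header. It assumes a 64-bit little-endian file held entirely in memory. list_segments is an illustrative name, and only common p_type values are named.

```python
import struct

# Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align
PHDR64 = struct.Struct("<IIQQQQQQ")

PT_NAMES = {
    0: "NULL", 1: "LOAD", 2: "DYNAMIC", 3: "INTERP", 4: "NOTE", 6: "PHDR", 7: "TLS",
    0x6474E550: "GNU_EH_FRAME", 0x6474E551: "GNU_STACK", 0x6474E552: "GNU_RELRO",
}
PT_INTERP = 3
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def list_segments(data: bytes):
    """Return (segments, interpreter) for a 64-bit little-endian ELF image."""
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)            # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)  # entry size and entry count
    segments, interpreter = [], None
    for i in range(e_phnum):
        (p_type, p_flags, p_offset, p_vaddr, _paddr,
         p_filesz, p_memsz, _align) = PHDR64.unpack_from(data, e_phoff + i * e_phentsize)
        perms = "".join(c if p_flags & bit else "-"
                        for c, bit in (("R", PF_R), ("W", PF_W), ("X", PF_X)))
        segments.append((PT_NAMES.get(p_type, hex(p_type)), perms,
                         p_offset, p_vaddr, p_filesz, p_memsz))
        if p_type == PT_INTERP:
            # PT_INTERP points at a NUL-terminated path stored in the file itself
            interpreter = data[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return segments, interpreter
```

On a real system, reading a binary with `open(path, "rb").read()` and passing the bytes to list_segments should list the same entries as readelf -l for that file.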
DWARF: Binary Debug Information
ELF files compiled with debugging enabled carry an additional layer of binary metadata: DWARF debug information, stored in sections like .debug_info, .debug_line, .debug_abbrev, and .debug_loc. (DWARF 5, the default since GCC 11 and Clang 14, renamed several of these sections. For example, .debug_loc became .debug_loclists, so on current toolchains you may see the newer names instead.)

DWARF is itself a precisely specified binary format. It encodes the mapping between machine code addresses and source file lines, the types and locations of variables at any given instruction, and the call frame layout needed to unwind the stack. When GDB shows you a source line, a stack frame, or a variable value, it is parsing DWARF binary data to do so. Tools like perf, Valgrind, and crash dump analyzers depend on the same DWARF structures.

If you strip a binary's debug sections with strip --strip-debug, these tools lose source lines, variable types, and variable locations, and keep only the function names from the symbol table. If you also strip the symbol table (plain strip), they fall back largely to raw addresses. [DWARF Standards Committee; dwarfstd.org]
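One small example of DWARF's compact binary encoding is LEB128. DWARF stores many of its integers in this form, including abbreviation codes, the attribute and form codes in .debug_abbrev, and operands in the .debug_line program. LEB128 is a variable-length format that packs seven bits per byte and uses the high bit as a "more bytes follow" flag. The decoders below are a minimal Python sketch of that encoding, not code taken from any DWARF library.

```python
from typing import Tuple


def decode_uleb128(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode an unsigned LEB128 value; return (value, bytes consumed)."""
    result, shift, pos = 0, 0, offset
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry payload
        shift += 7
        if not byte & 0x80:               # high bit clear: last byte
            return result, pos - offset


def decode_sleb128(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode a signed LEB128 value; return (value, bytes consumed)."""
    result, shift, pos = 0, 0, offset
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:               # sign bit of the final 7-bit group
                result -= 1 << shift      # sign-extend
            return result, pos - offset
```

For example, the three bytes E5 8E 26 decode to 624485. Small values such as attribute codes usually fit in a single byte, which is what keeps DWARF sections compact.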
```shell
$ readelf -S /bin/ls | grep -E '\.(text|data|bss|rodata)'
  [16] .text     PROGBITS  ...  AX   # Compiled machine code (exec)
  [17] .rodata   PROGBITS  ...  A    # Read-only string constants
  [25] .data     PROGBITS  ...  WA   # Initialized global variables
  [26] .bss      NOBITS    ...  WA   # Uninitialized globals (zero-filled)

$ objdump -d /bin/ls | head -20
# Disassembly of section .text:
0000000000005eb0 <_start>:
    5eb0:   31 ed             xor    %ebp,%ebp
    5eb2:   49 89 d1          mov    %rdx,%r9
    5eb5:   5e                pop    %rsi
```
Try It: Build a Minimal ELF Binary by Hand
The best way to internalize how ELF works is to build one yourself. Skip the C compiler and start from a few lines of assembly, so that every instruction in the result is one you wrote. The following NASM assembly produces a complete, statically linked ELF binary that calls write(1, "hello\n", 6) and exit(0) using raw system calls.

When assembled with nasm -f elf64 and linked with ld -s, the result in the session below is roughly 4,700–4,800 bytes. The linker adds its own metadata and alignment padding even for a trivial program, and the exact size depends on your binutils version and its default segment alignment. That is still small enough to inspect meaningfully with readelf and objdump, and every byte in the ELF header region maps directly back to the specification. (Writing the ELF header yourself with nasm -f bin and bypassing ld entirely can produce binaries under 200 bytes, but the nasm + ld workflow below prioritizes readability and tool compatibility.)
```nasm
; Assemble and link:  nasm -f elf64 hello.asm && ld -o hello hello.o
; Or, stripped:       nasm -f elf64 hello.asm && ld -s -o hello hello.o
; Both produce a statically linked binary; -s drops the symbol table,
; leaving a file of a few KiB (exact size depends on the binutils version)

section .data
msg:    db "hello", 0x0a        ; "hello\n" -- 6 bytes of binary data

section .text
global _start
_start:
    ; syscall: write(1, msg, 6)
    mov rax, 1                  ; syscall number 1 = write
    mov rdi, 1                  ; file descriptor 1 = stdout
    lea rsi, [rel msg]          ; pointer to our string
    mov rdx, 6                  ; length = 6 bytes
    syscall                     ; invoke the kernel at the binary boundary

    ; syscall: exit(0)
    mov rax, 60                 ; syscall number 60 = exit
    xor rdi, rdi                ; exit code 0
    syscall
```
```shell
$ nasm -f elf64 hello.asm && ld -s -o hello hello.o
$ ./hello
hello

# Now inspect the binary you just built
$ ls -la hello
-rwxr-xr-x 1 user user 4776 ... hello

$ xxd hello | head -3
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0200 3e00 0100 0000 0010 4000 0000 0000  ..>.......@.....
00000020: 4000 0000 0000 0000 8010 0000 0000 0000  @...............
# ^ You can map every byte here to the ELF header fields from readelf -h

$ readelf -h hello
# Compare the entry point, machine type, and class against the raw hex above

$ objdump -d hello
# You'll see your exact assembly instructions encoded as machine code bytes.
# How mov rax,1 is encoded depends on the assembler: the literal REX.W form is
# 48 c7 c0 01 00 00 00 (REX prefix + opcode + ModRM + imm32), but NASM's
# optimizer usually emits the shorter b8 01 00 00 00 (mov eax,1), which
# zero-extends into RAX with the same effect.
```
When you build a binary by hand and then xxd the result, the abstraction layers collapse. The ELF header is not a concept anymore: it is bytes 00 through 3F of a file you just built. The entry point is not a theory either. It is the 8-byte little-endian value at offset 0x18 in the header (0x401000 in the dump above), pointing to the _start label that contains your syscall instructions. This exercise makes every other section of this article concrete.
Machine Code, Assembly, and the Binary Instruction Set
Machine code is binary directly executable by the processor. Each instruction in machine code is a sequence of bytes encoding an operation code (opcode) and, optionally, operands specifying the data or memory addresses the operation works on. The set of valid opcodes and their encodings is defined by the processor's instruction set architecture (ISA).
Linux runs on a wide range of ISAs, including x86-64 (the dominant desktop and server architecture), ARM64 (dominant in mobile and increasingly in servers), RISC-V (an open ISA gaining traction in embedded and research contexts), MIPS, and PowerPC. Each ISA has its own binary encoding. An x86-64 binary is meaningless bytes on an ARM64 processor because the opcode encodings are entirely different.
On x86-64, instructions are variable-length, ranging from one to fifteen bytes. The PUSH RAX instruction encodes as the single byte 0x50 -- no REX prefix is needed because PUSH defaults to 64-bit operand size in 64-bit mode, with the destination register encoded in the low three bits of the opcode. A MOV with a 64-bit immediate value (the movabs form) uses the B8+rd encoding: a REX prefix byte, an opcode byte with the destination register encoded in its low three bits, and eight bytes for the immediate -- ten bytes total. No ModRM byte is needed in this form because the register is embedded directly in the opcode; other MOV variants in the Intel SDM do use ModRM for register/memory addressing. [Intel SDM Vol. 2B]
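The two encodings described above are regular enough to generate by hand:

- PUSH r64 is 0x50 plus the register number.
- The 10-byte MOV r64, imm64 is REX.W, then 0xB8 plus the register number, then the little-endian immediate.

For r8 to r15, the sketch below sets REX.B, the prefix bit that extends the 3-bit register field. The helper names are illustrative, and only these two instruction forms are covered.

```python
# x86-64 register numbers as encoded in the low 3 bits of the opcode (plus REX.B)
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

REX_W = 0x08  # 64-bit operand size
REX_B = 0x01  # extends the opcode's register field to reach r8-r15


def encode_push(reg: str) -> bytes:
    """PUSH r64: opcode 0x50+rd; 64-bit operand size is the default, so no REX.W."""
    n = REGS.index(reg)
    prefix = bytes([0x40 | REX_B]) if n >= 8 else b""
    return prefix + bytes([0x50 + (n & 7)])


def encode_mov_imm64(reg: str, imm: int) -> bytes:
    """MOV r64, imm64 (movabs): REX.W [+B], 0xB8+rd, 8-byte little-endian immediate."""
    n = REGS.index(reg)
    rex = 0x40 | REX_W | (REX_B if n >= 8 else 0)
    return bytes([rex, 0xB8 + (n & 7)]) + (imm & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "little")


if __name__ == "__main__":
    print(encode_push("rax").hex())                           # 50
    print(encode_mov_imm64("rax", 0x1122334455667788).hex())  # 48b88877665544332211
```

To cross-check, assemble mov rax, 0x1122334455667788 with NASM or GAS and disassemble it with objdump -d. Because the immediate does not fit in 32 bits, you should see the same ten bytes.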
Assembly language is a thin human-readable wrapper over machine code. Each assembly mnemonic corresponds to a specific machine code encoding. The GNU Assembler (GAS), part of binutils, translates assembly source files into object files containing machine code. Linux makes use of assembly in specific, critical places in the kernel where C would be insufficient -- system call entry and exit paths, context switching, early boot code, and highly optimized memory operations.
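```asm
/* System call entry point for x86-64 Linux */
/* Abridged from arch/x86/entry/entry_64.S; details vary by kernel version */
/* The SYSCALL instruction transfers control here from userspace */
SYM_CODE_START(entry_SYSCALL_64)
        swapgs                                  /* Switch to kernel GS base */
        movq    %rsp, PER_CPU_VAR(cpu_tss_rw + TSS_sp2)
        SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
        movq    PER_CPU_VAR(cpu_current_top_of_stack), %rsp

        /* RAX holds the system call number */
        /* RDI, RSI, RDX, R10, R8, R9 hold arguments */
        /* (elided: user registers are saved as a struct pt_regs on the kernel
           stack, and a pointer to it plus the syscall number are passed on) */
        call    do_syscall_64                   /* dispatch to C handler */
```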
Try It: Patch a Compiled Binary with a Hex Editor
If binary is just bytes, you should be able to change those bytes and change the program's behavior. You can. This exercise takes a compiled C program, finds a string inside the binary using xxd, replaces it with different text of the same length, and runs the modified binary -- no recompilation needed.
```shell
# Step 1: Create a trivial C program and compile it
$ echo '#include <stdio.h>
int main() { puts("Hello, Linux!"); return 0; }' > patch_me.c
$ gcc -o patch_me patch_me.c
$ ./patch_me
Hello, Linux!

# Step 2: Find the string in the binary's .rodata section
$ xxd patch_me | grep "Linux"
00002000: 0100 0200 4865 6c6c 6f2c 204c 696e 7578  ....Hello, Linux
# (offsets vary by compiler and toolchain)

# Step 3: Dump the file as one continuous plain-hex string, replace, convert back
# "Linux" (4c696e7578) -> "World" (576f726c64) -- same byte length
$ xxd -p patch_me | tr -d '\n' > patch_me.hex
$ sed -i 's/4c696e7578/576f726c64/' patch_me.hex
$ xxd -r -p patch_me.hex > patch_me_modified
$ chmod +x patch_me_modified

# Step 4: Run the patched binary
$ ./patch_me_modified
Hello, World!
# The ELF structure, machine code, and all headers are unchanged.
# Only the string data in .rodata was modified -- the binary runs identically
# except for the five bytes you changed.
```

The replacement string must be exactly the same byte length as the original. Binary files have fixed layouts: if you insert or remove bytes, you shift every offset in the file, breaking the ELF headers, section addresses, and relocation entries. Changing a string's length safely means rewriting that metadata and every reference to the moved data. That is the kind of work general binary rewriters do; patchelf applies the same idea to dynamic-section fields such as the interpreter path and RPATH.

The default xxd output separates bytes into space-delimited groups, so a contiguous hex pattern would not match it. That is why the steps above use xxd's plain mode (-p) and remove its line breaks. Two caveats remain:

- sed matches hex characters, not bytes, so it could in principle match a pattern that starts halfway through a byte.
- sed replaces only the first occurrence.

For short, known strings in .rodata this is almost always safe. For real patching, use a purpose-built binary editor or a script that works on the raw byte stream rather than on xxd output.
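Here is a minimal Python sketch of that byte-stream approach. Because it searches bytes rather than hex text, a match can never start in the middle of a byte. It refuses length changes, and it refuses to guess when the pattern occurs more than once. patch_bytes and patch_file are illustrative helpers, not an existing tool.

```python
from pathlib import Path


def patch_bytes(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace exactly one occurrence of `old` with `new` in a binary image."""
    if len(old) != len(new):
        raise ValueError("replacement must be the same length as the original")
    first = data.find(old)
    if first == -1:
        raise ValueError("pattern not found")
    if data.find(old, first + 1) != -1:
        raise ValueError("pattern occurs more than once; refusing to guess")
    return data[:first] + new + data[first + len(old):]


def patch_file(src: str, dst: str, old: bytes, new: bytes) -> None:
    """Write a patched copy of src to dst, keeping src's permission bits."""
    src_path, dst_path = Path(src), Path(dst)
    dst_path.write_bytes(patch_bytes(src_path.read_bytes(), old, new))
    dst_path.chmod(src_path.stat().st_mode & 0o7777)
```

For the exercise above, `patch_file("patch_me", "patch_me_modified", b"Hello, Linux!", b"Hello, World!")` does the same job without the hex round trip. Matching the whole string rather than just "Linux" makes an accidental match elsewhere in the file less likely.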
The Binary System Call Interface
The system call interface is the binary boundary between userspace and the kernel. When a userspace program needs to request a service from the kernel -- reading a file, allocating memory, creating a process -- it executes a special processor instruction that transfers control to the kernel. On x86-64 Linux, this instruction is SYSCALL.
Before executing SYSCALL, the program places the system call number in the RAX register and the arguments in RDI, RSI, RDX, R10, R8, and R9, following the Linux system call calling convention. The system call number is a binary integer: 0 is read, 1 is write, 60 is exit. The historical open syscall is number 2 on x86-64, but modern glibc implements open() and fopen() with openat (syscall 257) instead. openat accepts a directory file descriptor as an additional argument and is more composable; the strace output below reflects this. [linux/arch/x86/entry/syscalls/syscall_64.tbl]
The kernel developers have committed to never breaking the system call ABI for native binaries. Greg Kroah-Hartman makes the point in the official kernel documentation, in a file titled stable-api-nonsense.rst. That document actually argues against a stable internal kernel API for drivers, but it draws a sharp contrast with the userspace guarantee: "The kernel to userspace interface is the one that application programs use, the syscall interface. That interface is very stable over time, and will not break."

Consider a statically linked x86-64 binary built many years ago that uses only the syscall interface and makes no assumptions about /proc, /sys, or specific signal semantics. It should still execute correctly on a Linux 6.x kernel today, with no recompilation, because the byte-level contract it depends on has been preserved. Binaries that depend on specific /proc layouts, ioctl arguments, or kernel-internal behavior exposed through pseudo-filesystems may still break across kernel versions. The stability guarantee covers the core syscall numbers and calling conventions, not every file under /proc or /sys. [kernel.org: stable-api-nonsense.rst]
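```shell
# On modern systems, ls calls openat (syscall 257), not open (syscall 2)
# The filter below uses openat explicitly to match actual kernel behavior
$ strace -e trace=read,write,openat,close ls /tmp 2>&1
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
close(3)                                = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0"..., 832) = 832
close(3)                                = 0
...
openat(AT_FDCWD, "/tmp", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
# The dynamic linker mmaps ld.so.cache rather than read()ing it; the first read()
# is the ELF header of a shared library. Library paths vary by distribution.
# strace intercepts these at the binary register boundary via ptrace(2)
```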
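The register convention can be exercised directly from Python through the C library's syscall(3) wrapper. The wrapper loads the number into RAX and the arguments into the registers listed above before executing SYSCALL. This is a minimal sketch, assuming x86-64 Linux with a C library that exports syscall(); glibc and musl both do. raw_write is an illustrative name, and the hard-coded number 1 is only valid on x86-64.

```python
import ctypes
import os
import platform

# write is syscall number 1 in arch/x86/entry/syscalls/syscall_64.tbl;
# other architectures (arm64, RISC-V, ...) use different numbers.
SYS_WRITE_X86_64 = 1

libc = ctypes.CDLL(None, use_errno=True)  # symbols already loaded into this process
libc.syscall.restype = ctypes.c_long


def raw_write(fd: int, data: bytes) -> int:
    """Call write(2) by its number instead of through os.write()."""
    if platform.machine() != "x86_64":
        raise RuntimeError("syscall number 1 means write only on x86-64")
    buf = ctypes.create_string_buffer(data, len(data))
    # syscall(3) is variadic and reads its arguments as longs, so pass C longs
    ret = libc.syscall(ctypes.c_long(SYS_WRITE_X86_64), ctypes.c_long(fd),
                       buf, ctypes.c_long(len(data)))
    if ret == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret
```

`raw_write(1, b"hello\n")` prints the same line as the NASM program earlier. Running the script under strace should show a write(1, "hello\n", 6) call like the ones above.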
io_uring: A Binary Interface Beyond Syscalls
The SYSCALL instruction is not the only binary interface between userspace and the kernel. io_uring, introduced in Linux 5.1 (2019), provides a fundamentally different model. Instead of trapping into the kernel for each I/O operation, userspace and the kernel share two ring buffers in memory: a submission queue (SQ) and a completion queue (CQ).

Userspace fills in binary submission queue entries (SQEs) and publishes them through the submission ring. Each SQE is a 64-byte structure encoding the operation type, file descriptor, buffer address, offset, and flags. The kernel consumes these entries, performs the operations, and writes 16-byte binary completion queue entries (CQEs) into the completion ring.

With SQPOLL mode enabled, a dedicated kernel thread polls the submission ring. A busy high-throughput workload can then submit and reap thousands of operations with zero SYSCALL transitions in steady state. The thread goes idle after a configurable period without work and must then be woken with a syscall.

In the default mode, without SQPOLL, a single io_uring_enter() call is still required to notify the kernel of new submissions. Even so, batching many operations per call dramatically reduces per-operation syscall overhead compared to traditional I/O interfaces.
This matters for the binary perspective because io_uring replaces a register-level binary contract (the syscall ABI) with a memory-level binary contract (the SQE/CQE structure layouts). The binary format of these structures is part of the kernel's stable userspace ABI, just like syscall numbers -- changing the layout of an SQE would break every io_uring application. Tools like strace, which intercept the SYSCALL instruction via ptrace, cannot observe io_uring operations that bypass the syscall path entirely; tracing io_uring requires eBPF or the kernel's tracepoint infrastructure.
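To make the memory-level binary contract concrete, the sketch below packs an SQE and unpacks a CQE with Python's struct module. It follows the field order of struct io_uring_sqe and struct io_uring_cqe in the kernel's uapi header include/uapi/linux/io_uring.h, with unions flattened to their first member. It only builds bytes. Actually submitting them requires io_uring_setup(2) and mapping the rings, which liburing wraps. pack_sqe and unpack_cqe are illustrative names, and the opcode value should be checked against your installed header.

```python
import struct

# struct io_uring_sqe (64 bytes): opcode, flags, ioprio, fd, off, addr, len,
# rw_flags, user_data, buf_index, personality, splice_fd_in, addr3, __pad2[0]
SQE_FORMAT = "<BBHiQQIIQHHiQQ"
# struct io_uring_cqe (16 bytes): user_data, res, flags
CQE_FORMAT = "<QiI"

IORING_OP_READ = 22  # opcode value from include/uapi/linux/io_uring.h


def pack_sqe(opcode: int, fd: int, addr: int, length: int, offset: int, user_data: int) -> bytes:
    """Build the 64 raw bytes of one submission queue entry."""
    return struct.pack(
        SQE_FORMAT,
        opcode,     # u8  operation to perform
        0,          # u8  IOSQE_* flags
        0,          # u16 I/O priority
        fd,         # s32 target file descriptor
        offset,     # u64 file offset
        addr,       # u64 userspace buffer address
        length,     # u32 buffer length
        0,          # u32 per-operation flags (union, e.g. rw_flags)
        user_data,  # u64 opaque tag, echoed back in the matching CQE
        0,          # u16 buf_index (union)
        0,          # u16 personality
        0,          # s32 splice_fd_in (union)
        0,          # u64 addr3
        0,          # u64 padding
    )


def unpack_cqe(raw: bytes) -> dict:
    """Decode one completion queue entry; res is a byte count or a negative errno."""
    user_data, res, flags = struct.unpack(CQE_FORMAT, raw)
    return {"user_data": user_data, "res": res, "flags": flags}
```

The user_data field is how an application matches completions to submissions. Completions can arrive out of order, so the kernel copies that 64-bit tag from the SQE into the corresponding CQE unchanged.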
Binary File Formats: Beyond Executables
Linux deals with binary data far beyond executable programs. The file command performs binary format identification by reading the first bytes of a file -- known as magic bytes -- and comparing them against a database of known signatures. The database is typically stored at /usr/share/misc/magic or compiled into a binary cache at /usr/share/misc/magic.mgc; the exact path varies by distribution and file version. [file(1) man page]
Every major binary file format has its own structure. PNG images begin with an eight-byte signature (0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A) followed by chunks with four-byte length fields, four-byte type identifiers, variable-length data, and four-byte CRC checksums. ZIP archives begin with a local file header signature (0x50 0x4B 0x03 0x04) followed by version requirements, compression method, and file metadata.
```shell
$ xxd image.png | head -2
00000000: 8950 4e47 0d0a 1a0a 0000 000d 4948 4452  .PNG........IHDR
$ xxd archive.zip | head -2
00000000: 504b 0304 1400 0000 0800 ...             PK..............
$ xxd database.sqlite | head -2
00000000: 5351 4c69 7465 2066 6f72 6d61 7420 3300  SQLite format 3.
$ file /bin/ls image.png archive.zip database.sqlite
/bin/ls:         ELF 64-bit LSB pie executable, x86-64
image.png:       PNG image data, 1920 x 1080, 8-bit/color RGBA
archive.zip:     Zip archive data, at least v2.0 to extract
database.sqlite: SQLite 3.x database
```
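The core of what file does can be sketched in a few lines. The Python sketch below matches leading bytes against the four signatures shown above. It then reads the PNG image size from the IHDR chunk, which the PNG format requires to come first, right after the 8-byte signature. The signature list and function names are illustrative; file's real magic database holds thousands of far more detailed rules.

```python
import struct

# Signatures from the examples above
SIGNATURES = [
    (b"\x7fELF", "ELF binary"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "ZIP archive (local file header)"),
    (b"SQLite format 3\x00", "SQLite 3 database"),
]


def identify(header: bytes) -> str:
    """Match the leading bytes of a file against known magic numbers."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def png_dimensions(header: bytes):
    """Read width and height from the IHDR chunk that follows the PNG signature."""
    # Chunk layout: 4-byte big-endian length, 4-byte type, data, then a 4-byte CRC
    length, chunk_type, width, height = struct.unpack_from(">I4sII", header, 8)
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a valid IHDR")
    return width, height
```

Reading the first 24 bytes of a PNG in binary mode is enough for both calls. Note that PNG stores its integers big-endian, unlike the little-endian ELF fields earlier.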
Binary Data in the Linux Filesystem
Linux's "everything is a file" design philosophy means that binary data is accessible through the filesystem interface in ways that expose the underlying hardware and kernel state directly. As Linus Torvalds explained on the Linux Kernel Mailing List in 2002: "The whole point with 'everything is a file' is not that you have some random filename... but the fact that you can use common tools to operate on different things." [LKML, June 2002] The block devices in /dev expose raw binary storage. Reading from /dev/sda with dd or a custom program gives raw binary access to every byte on a storage device, including partition table structures, filesystem metadata, and raw file data.
The GPT (GUID Partition Table) header lives at LBA 1. That is 512 bytes into the disk on devices with 512-byte logical sectors, or 4096 bytes on 4Kn drives. The header is a binary structure containing a signature ("EFI PART" in ASCII), a revision number, the header size, a CRC32 checksum, and the binary GUID of the disk. Each partition entry is a 128-byte binary structure containing two GUIDs, start and end LBA values, attribute flags, and a UTF-16LE encoded partition name.
```shell
# dd if=/dev/sda bs=512 skip=1 count=1 2>/dev/null | xxd | head -4
00000000: 4546 4920 5041 5254 0000 0100 5c00 0000  EFI PART....\...
00000010: 3fc9 5d8e 0000 0000 0100 0000 0000 0000  ?.].............
00000020: ffff af1d 0000 0000 2200 0000 0000 0000  ........".......
00000030: deff af1d 0000 0000 5dbd 7a56 c36b 014b  ........].zV.k.K
# "EFI PART" signature visible at offset 0 -- this is a valid GPT header
```
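Decoding the dumped header takes a single struct format string. The sketch below parses the fields visible in the dump above. It also checks the header CRC32 the way the UEFI specification defines it: computed over header_size bytes, with the CRC field itself zeroed. It assumes you already have the raw sector in memory, for example from the dd command above. The function names are illustrative.

```python
import struct
import uuid
import zlib

# First 72 bytes of the GPT header: signature, revision, header size, header CRC32,
# reserved, current LBA, backup LBA, first usable LBA, last usable LBA, disk GUID
GPT_HEADER = struct.Struct("<8sIIIIQQQQ16s")


def parse_gpt_header(sector: bytes) -> dict:
    (signature, revision, header_size, header_crc32, _reserved, current_lba,
     backup_lba, first_usable, last_usable, disk_guid) = GPT_HEADER.unpack_from(sector)
    if signature != b"EFI PART":
        raise ValueError("no GPT signature at this LBA")
    return {
        "revision": revision,
        "header_size": header_size,
        "header_crc32": header_crc32,
        "current_lba": current_lba,
        "backup_lba": backup_lba,
        "first_usable_lba": first_usable,
        "last_usable_lba": last_usable,
        # GUIDs store their first three fields little-endian, which bytes_le expects
        "disk_guid": uuid.UUID(bytes_le=disk_guid),
    }


def header_crc_ok(sector: bytes) -> bool:
    """Recompute the header CRC32 over header_size bytes with the CRC field zeroed."""
    header_size, stored = struct.unpack_from("<II", sector, 12)
    header = bytearray(sector[:header_size])
    header[16:20] = b"\x00\x00\x00\x00"
    return zlib.crc32(bytes(header)) == stored
```

In the dump, the backup header sits at LBA 0x1dafffff, the last sector of the disk, and the last usable LBA is 33 sectors earlier. Those 33 sectors are the backup partition entry array (32 sectors) plus the backup header itself.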
Filesystem formats are entirely binary structures. The ext4 filesystem organizes its binary data into a superblock at byte offset 1024 containing the total block count, free block count, filesystem UUID, feature flags, and dozens of other fields in a precisely defined binary layout. When fsck.ext4 repairs a corrupted filesystem, it is directly manipulating these binary structures on disk.
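The superblock fields mentioned above sit at fixed offsets, so reading them is mostly a matter of struct calls. The sketch below reads a few of them from a raw image, such as a file formatted with mkfs.ext4 or, as root, a partition device. It uses the field offsets of the ext4 on-disk superblock layout and ignores the 64-bit _hi halves and checksums. parse_ext4_superblock is an illustrative name. The magic value 0xEF53 is shared by ext2, ext3, and ext4.

```python
import struct
import uuid

SUPERBLOCK_OFFSET = 1024   # the primary superblock starts 1024 bytes in
EXT4_SUPER_MAGIC = 0xEF53


def parse_ext4_superblock(image: bytes) -> dict:
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 1024:
        raise ValueError("image too short to hold a superblock")
    (magic,) = struct.unpack_from("<H", sb, 0x38)  # s_magic
    if magic != EXT4_SUPER_MAGIC:
        raise ValueError("bad superblock magic")
    # s_inodes_count, s_blocks_count_lo, s_r_blocks_count_lo, s_free_blocks_count_lo,
    # s_free_inodes_count, s_first_data_block, s_log_block_size
    (inodes, blocks_lo, _reserved_lo, free_blocks_lo, free_inodes,
     _first_data_block, log_block_size) = struct.unpack_from("<7I", sb, 0x00)
    compat, incompat, ro_compat = struct.unpack_from("<3I", sb, 0x5C)  # feature flags
    return {
        "inodes_count": inodes,
        "blocks_count_lo": blocks_lo,      # 64-bit filesystems keep a _hi half elsewhere
        "free_blocks_count_lo": free_blocks_lo,
        "free_inodes_count": free_inodes,
        "block_size": 1024 << log_block_size,
        "feature_compat": compat,
        "feature_incompat": incompat,
        "feature_ro_compat": ro_compat,
        "uuid": uuid.UUID(bytes=bytes(sb[0x68:0x78])),
        "volume_name": bytes(sb[0x78:0x88]).split(b"\x00", 1)[0].decode("utf-8", "replace"),
    }
```

Reading the first 2048 bytes of an image is enough. The UUID this returns should match what blkid reports for the same filesystem.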
Binary in the Kernel: Data Structures and Memory Layout
The Linux kernel is fundamentally a binary data manipulation engine. Its data structures compile to precisely defined binary layouts in memory. Examples include the task_struct representing a process, the inode representing a filesystem object, and the sk_buff representing a network packet. The kernel relies on these layouts being consistent and predictable within a given build; hardening options such as CONFIG_RANDSTRUCT can deliberately shuffle some of them between builds.
Endianness is a binary-level concern the kernel handles explicitly. Multi-byte values can be stored most-significant-byte-first (big-endian, as in network byte order) or least-significant-byte-first (little-endian, as in x86). The kernel uses explicit type annotations like __be32 and __le32 for big-endian and little-endian 32-bit integers, and provides conversion macros to ensure correct byte order handling.
Network byte order is big-endian, while x86 is natively little-endian. This means every multi-byte field in a TCP/IP header must be byte-swapped on x86 systems. Missing a htons() or ntohl() call is a classic networking bug that only manifests on little-endian architectures.
```shell
# HTTP port 80 = 0x0050 in hex. Watch what happens to the bytes:
$ python3 -c "
import struct

port = 80  # 0x0050

# Little-endian (x86 native byte order): low byte first
le = struct.pack('<H', port)
print(f'Port {port} little-endian: {le.hex()} -> bytes [{le[0]:#04x}, {le[1]:#04x}]')

# Big-endian (network byte order): high byte first
be = struct.pack('>H', port)
print(f'Port {port} big-endian:    {be.hex()} -> bytes [{be[0]:#04x}, {be[1]:#04x}]')

# What happens if you forget htons() -- port 80 becomes port 20480
wrong = int.from_bytes(le, 'big')
print(f'Forget byte-swap: the remote host reads port {wrong} instead of {port}')
"
Port 80 little-endian: 5000 -> bytes [0x50, 0x00]
Port 80 big-endian:    0050 -> bytes [0x00, 0x50]
Forget byte-swap: the remote host reads port 20480 instead of 80
# This is exactly the bug htons() prevents. The bytes are identical data,
# just in opposite order -- and the network protocol expects big-endian.
```
Bit manipulation is pervasive in kernel code. A single 32-bit integer often encodes multiple independent boolean flags or small integer fields packed together. The kernel's file mode field (stored in the inode) encodes file type, setuid/setgid/sticky bits, and owner/group/other read/write/execute permissions in a sixteen-bit binary field. Working with this binary packed data requires bitwise AND for masking, OR for setting, XOR for toggling, and shift operations for extracting fields.
```c
/* File type encoding in the mode field (top 4 bits) */
#define S_IFMT   0170000  /* mask for file type bits */
#define S_IFREG  0100000  /* regular file */
#define S_IFDIR  0040000  /* directory */
#define S_IFLNK  0120000  /* symbolic link */

/* Permission bits */
#define S_IRUSR  00400    /* owner read    (bit 8) */
#define S_IWUSR  00200    /* owner write   (bit 7) */
#define S_IXUSR  00100    /* owner execute (bit 6) */

/* Testing a mode field in C */
if ((inode->i_mode & S_IFMT) == S_IFREG) {
        /* it is a regular file */
}
if (inode->i_mode & S_IRUSR) {
        /* the owner has read permission */
}
```
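The same masks are enough to rebuild the permission string that ls -l prints. The Python sketch below decodes a full mode value by hand: the type bits, setuid/setgid/sticky, and the three rwx triplets. mode_to_string is an illustrative helper, and the standard library's stat.filemode() is used only to cross-check it.

```python
import stat

# File-type values after masking with S_IFMT (0o170000), and their ls -l letters
FILE_TYPES = {
    0o140000: "s",  # socket
    0o120000: "l",  # symbolic link
    0o100000: "-",  # regular file
    0o060000: "b",  # block device
    0o040000: "d",  # directory
    0o020000: "c",  # character device
    0o010000: "p",  # FIFO
}


def mode_to_string(mode: int) -> str:
    chars = [FILE_TYPES.get(mode & 0o170000, "?")]
    # (shift to this triplet, special bit sharing its execute column, letters)
    for shift, special, with_x, without_x in (
        (6, 0o4000, "s", "S"),  # owner + setuid
        (3, 0o2000, "s", "S"),  # group + setgid
        (0, 0o1000, "t", "T"),  # other + sticky
    ):
        bits = (mode >> shift) & 0o7
        chars.append("r" if bits & 0o4 else "-")
        chars.append("w" if bits & 0o2 else "-")
        if mode & special:
            chars.append(with_x if bits & 0o1 else without_x)
        else:
            chars.append("x" if bits & 0o1 else "-")
    return "".join(chars)


if __name__ == "__main__":
    for m in (0o100755, 0o104755, 0o041777, 0o120777):
        print(oct(m), mode_to_string(m), mode_to_string(m) == stat.filemode(m))
```

The setuid, setgid, and sticky bits have no column of their own in ls -l. They share the execute positions, which is why an uppercase S or T means "special bit set, execute bit clear".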
Try It: Decode File Permissions in Binary by Hand
The octal permissions you see from ls -l or stat are a human convenience. What the kernel actually stores and checks is a binary bit field. Walking through the conversion by hand reveals exactly how the kernel decides whether to allow a read, write, or execute.
```shell
# Get the raw octal mode of /bin/ls
$ stat -c '%a %n' /bin/ls
755 /bin/ls

# 755 octal = 111 101 101 binary. Here's each digit expanded:
#   7 = 111  (owner: read + write + execute)
#   5 = 101  (group: read + execute, no write)
#   5 = 101  (other: read + execute, no write)
#
# Full binary: 111 101 101
#              rwx r-x r-x   <- exactly what ls -l shows

# Now simulate what the kernel does with bitwise AND:
$ python3 -c "
mode = 0o755  # the stored permission bits

# The kernel tests each permission with a bitwise AND
S_IRUSR = 0o400  # bit 8: owner read
S_IWUSR = 0o200  # bit 7: owner write
S_IXUSR = 0o100  # bit 6: owner execute
S_IRGRP = 0o040  # bit 5: group read
S_IWGRP = 0o020  # bit 4: group write
S_IXGRP = 0o010  # bit 3: group execute
S_IROTH = 0o004  # bit 2: other read
S_IWOTH = 0o002  # bit 1: other write
S_IXOTH = 0o001  # bit 0: other execute

def b9(x):
    return format(x, '#011b')  # zero-pad to 9 bits so the columns line up

print(f'Mode {oct(mode)} = {b9(mode)} in binary')
print(f'Owner read? {bool(mode & S_IRUSR)} ({b9(mode)} & {b9(S_IRUSR)} = {b9(mode & S_IRUSR)})')
print(f'Group write? {bool(mode & S_IWGRP)} ({b9(mode)} & {b9(S_IWGRP)} = {b9(mode & S_IWGRP)})')
print(f'Other execute? {bool(mode & S_IXOTH)} ({b9(mode)} & {b9(S_IXOTH)} = {b9(mode & S_IXOTH)})')
"
Mode 0o755 = 0b111101101 in binary
Owner read? True (0b111101101 & 0b100000000 = 0b100000000)
Group write? False (0b111101101 & 0b000010000 = 0b000000000)
Other execute? True (0b111101101 & 0b000000001 = 0b000000001)
# When the AND result is nonzero, the bit is set -> permission granted.
# When it's zero, the bit is clear -> permission denied.
# The kernel's generic_permission() (reached via inode_permission()) applies the
# same kind of masking after selecting the owner, group, or other bits.
```
Binary Compatibility, Multilib, and Cross-Architecture Linux
Linux systems frequently need to run binaries compiled for different but related architectures. A 64-bit x86 Linux system can run 32-bit x86 binaries through the kernel's ia32 compatibility layer, which implements the 32-bit int 0x80 system call interface alongside the native 64-bit SYSCALL interface. The binary loader recognizes a 32-bit ELF binary by the EI_CLASS field in the ELF header (1 for 32-bit, 2 for 64-bit) and invokes the appropriate execution path.
Cross-compilation, building binaries for one architecture on a different one, is a staple of embedded Linux development. A developer building firmware for an ARM-based router from an x86-64 workstation uses a cross-compiler toolchain that produces ARM ELF binaries with ARM machine code. QEMU's user-mode emulation lets these cross-compiled binaries run on the build host. It translates their machine code into host instructions and passes their system calls through to the host kernel, converting arguments where the two architectures' conventions differ. The binfmt_misc mechanism in the Linux kernel supports this directly: registering QEMU's user-mode emulator as the handler for ARM ELF binaries allows the system to execute them transparently, as if they were native.
```shell
# Register ARM64 ELF binaries to run transparently via QEMU (as root)
# echo ':qemu-aarch64:M::\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xb7\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/usr/bin/qemu-aarch64-static:F' \
    > /proc/sys/fs/binfmt_misc/register

# Now ARM64 binaries execute transparently on x86-64
$ file ./arm64-binary
./arm64-binary: ELF 64-bit LSB executable, ARM aarch64
$ ./arm64-binary    # runs via QEMU transparently
```
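A binfmt_misc M (magic) rule is a masked byte comparison: for each byte, the file's value XORed with the registered magic, then ANDed with the mask, must be zero. The sketch below reproduces that rule in Python with the aarch64 magic and mask that QEMU's qemu-binfmt-conf.sh registers. binfmt_matches is an illustrative function, not kernel code.

```python
from typing import Optional

# aarch64 ELF magic and mask as registered by QEMU's qemu-binfmt-conf.sh
AARCH64_MAGIC = bytes.fromhex("7f454c46020101000000000000000000" "0200b700")
AARCH64_MASK = bytes.fromhex("ffffffffffffff00ffffffffffffffff" "feffffff")


def binfmt_matches(header: bytes, magic: bytes, mask: Optional[bytes] = None,
                   offset: int = 0) -> bool:
    """Masked comparison in the style of binfmt_misc's 'M' entries."""
    if mask is None:
        mask = b"\xff" * len(magic)  # no mask: every bit must match
    window = header[offset:offset + len(magic)]
    if len(window) < len(magic):
        return False
    return all(((b ^ m) & k) == 0 for b, m, k in zip(window, magic, mask))
```

Two mask bytes do the interesting work. The 0x00 at index 7 lets any EI_OSABI value through. The 0xfe on the low byte of e_type ignores its lowest bit, so one rule covers both ET_EXEC (2) and ET_DYN (3) binaries.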
Binary Security: Hardening, Mitigations, and Exploitation
Binary security is one of the most active areas of Linux systems research, because vulnerabilities at the binary level -- buffer overflows, use-after-free bugs, integer overflows, format string bugs -- have historically enabled some of the most severe exploits. The Linux ecosystem has developed an extensive set of binary-level mitigations implemented across the compiler, linker, loader, and kernel.
Position Independent Executables (PIE) cause the executable to be loaded at a random base address by the kernel, complementing Address Space Layout Randomization (ASLR) which randomizes the addresses of the stack, heap, and shared libraries. The combined effect is that an attacker cannot predict the binary address of useful code sequences (ROP gadgets) needed to exploit memory corruption vulnerabilities. However, ASLR's protection is probabilistic, not absolute: on x86-64 Linux, mmap/library randomization entropy defaults to 28 bits (tunable via /proc/sys/vm/mmap_rnd_bits, with a kernel-built maximum typically of 32 bits), and stack entropy is lower still, meaning brute-force attacks remain feasible in some scenarios -- particularly against forking servers that do not re-randomize after each fork. ASLR raises the cost of exploitation significantly without eliminating it.
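The entropy figures translate directly into guessing effort. The sketch below works through the arithmetic for the simplest model: an attacker must guess a whole randomized base address, and each wrong guess crashes the target. It ignores partial-overwrite and incremental or information-leak techniques, which can do much better in practice. Treat it as an illustration of the numbers, not a security estimate. The function names are illustrative.

```python
from pathlib import Path
from typing import Optional


def layout_candidates(entropy_bits: int) -> int:
    """Number of equally likely randomized base addresses."""
    return 2 ** entropy_bits


def expected_guesses(entropy_bits: int, rerandomized: bool) -> float:
    n = layout_candidates(entropy_bits)
    if rerandomized:
        # Fresh layout after every crash: each try succeeds with probability 1/n,
        # so the number of tries is geometric with mean n.
        return float(n)
    # Layout survives the crash (e.g. a forking server's children inherit it):
    # the attacker never repeats a wrong guess, so on average half are needed.
    return (n + 1) / 2


def read_mmap_rnd_bits(path: str = "/proc/sys/vm/mmap_rnd_bits") -> Optional[int]:
    """Current mmap randomization bits, or None if unreadable (may require root)."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None
```

For the x86-64 default of 28 bits, this gives 268,435,456 candidate bases. The attacker needs about 134 million guesses on average when the layout never changes, and about 268 million when it is re-randomized after each crash.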
GCC and Clang insert stack canaries as binary instrumentation around function frames, for example with -fstack-protector-strong. A canary is a random value placed between the local variables and the saved return address. Before a function returns, the canary is checked. If it has been modified, as it would be by a stack buffer overflow, the process is terminated. This check is pure binary manipulation. The compiler emits instructions to:

1. load the canary from a thread-local storage slot (%fs:0x28 on x86-64 Linux with glibc)
2. store it in the stack frame
3. compare the stored copy against the original before the RET instruction, calling __stack_chk_fail if they differ
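The check can be modeled in a few lines. The Python sketch below is a toy. It lays out a hypothetical frame as a bytearray, in the order described above: buffer, then canary, then saved return address. It then performs an unchecked copy and verifies the canary the way the compiler-emitted epilogue would. Real frames, canary storage, and failure handling (__stack_chk_fail) live in compiled code, not Python, and all names here are illustrative.

```python
import os
import struct

CANARY_SIZE = 8  # one machine word on x86-64


class StackSmashingDetected(Exception):
    """Raised where real code would call __stack_chk_fail and abort."""


def make_frame(buf_size: int, canary: bytes, return_addr: int) -> bytearray:
    """Toy frame, low to high addresses: [local buffer][canary][saved return address]."""
    return bytearray(buf_size) + bytearray(canary) + bytearray(struct.pack("<Q", return_addr))


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Like memcpy without a bounds check: overruns the buffer if data is too long."""
    if len(data) > len(frame):
        raise ValueError("toy model: keep the copy inside the frame")
    frame[:len(data)] = data


def epilogue_check(frame: bytearray, buf_size: int, canary: bytes) -> None:
    """Compare the stored canary with the original before 'returning'."""
    if bytes(frame[buf_size:buf_size + CANARY_SIZE]) != canary:
        raise StackSmashingDetected("canary overwritten")


def saved_return_address(frame: bytearray, buf_size: int) -> int:
    return struct.unpack_from("<Q", frame, buf_size + CANARY_SIZE)[0]


if __name__ == "__main__":
    canary = os.urandom(CANARY_SIZE)
    frame = make_frame(16, canary, 0x401000)
    unchecked_copy(frame, b"A" * 24 + struct.pack("<Q", 0xDEADBEEF))  # 16 bytes too many
    print(hex(saved_return_address(frame, 16)))  # return address now attacker-chosen
    epilogue_check(frame, 16, canary)            # raises before that address is used
```

A linear overflow cannot reach the return address without first passing through the canary. That is why the check happens before RET rather than after.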
```shell
# checksec inspects ELF binary security properties
$ checksec --file=/bin/ls
RELRO           STACK CANARY      NX            PIE             RPATH      RUNPATH
Full RELRO      Canary found      NX enabled    PIE enabled     No RPATH   No RUNPATH

# Verify ASLR is enabled at kernel level
$ cat /proc/sys/kernel/randomize_va_space
2
# 2 = full ASLR (stack, heap/brk, mmap, vdso, PIE)

# Inspect RELRO -- the GOT is made read-only after dynamic linking
$ readelf -lW /bin/ls | grep GNU_RELRO
  GNU_RELRO      0x... 0x... 0x... 0x... 0x... R   0x1
# The GNU_RELRO range lies inside a writable LOAD segment. The dynamic linker
# applies relocations there first, then calls mprotect() to make the range
# read-only. With Full RELRO the range also covers .got.plt.
```
RELRO (Relocation Read-Only) marks the GOT (Global Offset Table) -- the binary table of resolved library function addresses -- as read-only after the dynamic linker finishes processing, preventing an attacker who controls an arbitrary write primitive from overwriting function pointers to redirect execution.
The GOT works in tandem with the Procedure Linkage Table (PLT), a small stub section in the binary that acts as the first-call trampoline for each imported library function. On the first call to a shared library function, the PLT stub invokes the dynamic linker to resolve and write the actual function address into the GOT entry; subsequent calls go directly through the GOT. Partial RELRO protects only the .got section (non-function-pointer relocations), leaving .got.plt writable throughout the process lifetime. Full RELRO forces eager binding -- resolving all symbols at load time -- so the entire GOT including .got.plt can be made read-only immediately. This eliminates GOT overwrite attacks entirely at the cost of slightly longer startup time.
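The difference between lazy and eager binding can be modeled as a small state machine. The model also shows why only eager binding lets the whole GOT be sealed. The Python class below is a hypothetical toy, not how ld.so is implemented:

- GOT slots start out unresolved.
- The PLT-style call path resolves a slot on first use.
- bind_now=True resolves everything up front and then marks the table read-only, standing in for Full RELRO's mprotect().

```python
from typing import Callable, Dict, Iterable, Optional


class ToyGOT:
    def __init__(self, library: Dict[str, Callable], imports: Iterable[str], bind_now: bool):
        self.library = library
        self.slots: Dict[str, Optional[Callable]] = {name: None for name in imports}
        self.resolver_calls = 0
        self.read_only = False
        if bind_now:
            for name in self.slots:
                self._resolve(name)
            self.read_only = True  # nothing left to patch, so the table can be sealed

    def _resolve(self, name: str) -> None:
        """Stand-in for the dynamic linker's resolver writing a GOT slot."""
        self._write(name, self.library[name])
        self.resolver_calls += 1

    def _write(self, name: str, target: Callable) -> None:
        if self.read_only:
            raise PermissionError(f"GOT slot {name!r} is read-only")
        self.slots[name] = target

    def call(self, name: str, *args):
        """Stand-in for a PLT stub: jump through the slot, resolving it on first use."""
        if self.slots[name] is None:
            self._resolve(name)
        return self.slots[name](*args)

    def overwrite(self, name: str, target: Callable) -> None:
        """What an attacker with an arbitrary-write primitive would attempt."""
        self._write(name, target)
```

With lazy binding the table must stay writable for the life of the process, because any slot might still need resolving. That window is exactly what a GOT overwrite exploits. With eager binding, the last legitimate write happens before main() runs.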
Classic PLT lazy binding -- where the dynamic linker resolves symbols on first call -- is increasingly uncommon on modern hardened systems. Several major distributions now link packages with -z now by default: Ubuntu's GCC enables it together with PIE, and Fedora's hardened build flags include it. Combined with -z relro, this yields Full RELRO with eager binding. Debian exposes the same flag as the opt-in bindnow hardening feature of dpkg-buildflags. This is a toolchain policy decision, not a glibc runtime change -- binaries built without -z now, and run without the LD_BIND_NOW environment variable, still use lazy binding. The lazy binding model remains relevant for understanding legacy binaries, custom builds, and non-hardened toolchains, but in distribution packages, Full RELRO with eager binding is increasingly the norm.
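Whether a binary has Partial or Full RELRO comes down to two facts recorded in the file: whether a PT_GNU_RELRO program header exists, and whether the dynamic section requests eager binding (a DT_BIND_NOW entry, DF_BIND_NOW in DT_FLAGS, or DF_1_NOW in DT_FLAGS_1). A minimal sketch of that check follows, assuming a 64-bit little-endian ELF such as an x86-64 binary; `relro_status` is a hypothetical helper and not how checksec itself is implemented.

```python
import struct
import sys

PT_DYNAMIC, PT_GNU_RELRO = 2, 0x6474E552
DT_NULL, DT_BIND_NOW, DT_FLAGS, DT_FLAGS_1 = 0, 24, 30, 0x6FFFFFFB
DF_BIND_NOW, DF_1_NOW = 0x8, 0x1


def relro_status(path: str) -> str:
    """Classify a 64-bit little-endian ELF file as Full, Partial, or No RELRO."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    # ELF64 header: e_phoff at 0x20, e_phentsize and e_phnum at 0x36 and 0x38
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    has_relro = bind_now = False
    for i in range(e_phnum):
        # Leading Elf64_Phdr fields: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz
        p_type, _, p_offset, _, _, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_GNU_RELRO:
            has_relro = True
        elif p_type == PT_DYNAMIC:
            # The dynamic section is an array of (d_tag, d_val) pairs ending in DT_NULL
            for off in range(p_offset, p_offset + p_filesz, 16):
                d_tag, d_val = struct.unpack_from("<qQ", data, off)
                if d_tag == DT_NULL:
                    break
                if (d_tag == DT_BIND_NOW
                        or (d_tag == DT_FLAGS and d_val & DF_BIND_NOW)
                        or (d_tag == DT_FLAGS_1 and d_val & DF_1_NOW)):
                    bind_now = True
    if not has_relro:
        return "No RELRO"
    return "Full RELRO" if bind_now else "Partial RELRO"


if __name__ == "__main__":
    # Usage: python3 relro_check.py /bin/ls /usr/sbin/sshd
    for path in sys.argv[1:]:
        print(f"{path}: {relro_status(path)}")
```

Its verdict should agree with checksec's RELRO column. Note that it only inspects the file: LD_BIND_NOW changes binding behavior at run time without altering anything the sketch can see.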
Binary-level mitigations (ASLR, RELRO, canaries) harden the execution environment, but the data those binaries transmit remains a separate concern. The "harvest now, decrypt later" attack model means adversaries may be collecting encrypted network traffic today in order to decrypt it once a sufficiently capable quantum computer exists. Post-quantum key exchange in TLS and SSH operates at the protocol layer rather than the binary hardening layer, but it is equally urgent for any data with a long-term sensitivity horizon.
Seccomp-BPF: Syscall Filtering at the Binary Interface
Seccomp-BPF (Secure Computing with Berkeley Packet Filter) is a kernel mechanism that attaches a BPF program directly to the system call path of a process, allowing fine-grained filtering of which system calls -- and with which argument values -- are permitted. The filter sees only the syscall number, architecture, and raw register arguments; it cannot dereference pointers, so it cannot inspect a path string, for example. Because the filter runs at the binary system call boundary, it provides enforcement that cannot be bypassed by userspace library tricks: a process that has installed a seccomp filter cannot call a forbidden syscall regardless of how the request is made, and the filter cannot be removed once installed; it is inherited by children across fork() and kept across execve(). Chrome, Firefox, OpenSSH, and most container runtimes (Docker, containerd, Podman) use seccomp-BPF profiles to dramatically reduce the kernel attack surface exposed to untrusted code. What happens on a violation depends on the action the filter returns: the call can fail with an error code such as EPERM (the behavior of Docker's default profile), the process can receive a SIGSYS signal that it may handle, or the thread or whole process can be killed outright. [kernel.org: Seccomp BPF userspace API]
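# Check whether a process has an active seccomp filter
# (Chrome's main browser process is usually unfiltered; its renderers are sandboxed)
$ grep '^Seccomp:' /proc/$(pgrep -f 'type=renderer' | head -1)/status
Seccomp:        2
# 0=off 1=strict 2=filter (BPF)

# Check whether Docker's default seccomp profile is active
# Note: Docker applies its built-in profile by default even when SecurityOpt is empty
$ docker info 2>/dev/null | grep -i -A1 seccomp
  seccomp
   Profile: builtin
# (older Docker releases print "Profile: default")

# Verify from inside a container: the init process will show filter mode
$ docker run --rm alpine grep '^Seccomp:' /proc/1/status
Seccomp:        2

# Seccomp filters are classic BPF attached to the task, not eBPF programs loaded
# through bpf(2), so they do not appear in `bpftool prog list`. Dumping them
# requires PTRACE_SECCOMP_GET_FILTER (CAP_SYS_ADMIN); third-party tools such as
# seccomp-tools can do this.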
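The same check is easy to script. A small sketch, assuming a Linux /proc filesystem: it parses the Seccomp field of a process's status file and, where the kernel provides it, the Seccomp_filters count of attached filters (only newer kernels report that field). `parse_seccomp` and `seccomp_status` are hypothetical helper names.

```python
import re
import sys

# Meaning of the "Seccomp:" field in /proc/<pid>/status (see proc(5))
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def parse_seccomp(status_text: str):
    """Return (mode, filter_count) parsed from the text of a status file.

    filter_count is None when the kernel does not report Seccomp_filters.
    """
    mode = re.search(r"^Seccomp:\s+(\d+)", status_text, re.MULTILINE)
    count = re.search(r"^Seccomp_filters:\s+(\d+)", status_text, re.MULTILINE)
    if mode is None:
        return ("unsupported", None)  # kernel built without CONFIG_SECCOMP
    return (SECCOMP_MODES.get(int(mode.group(1)), "unknown"),
            int(count.group(1)) if count else None)


def seccomp_status(pid="self"):
    """Read and parse /proc/<pid>/status for one process."""
    with open(f"/proc/{pid}/status") as f:
        return parse_seccomp(f.read())


if __name__ == "__main__":
    # Usage: python3 seccomp_status.py self $(pgrep -f type=renderer)
    for pid in sys.argv[1:]:
        print(pid, *seccomp_status(pid))
```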
Try It: Build a Seccomp Filter That Blocks a Syscall
Seeing that a process has a seccomp filter is one thing. Building one yourself and watching it enforce in real time makes the binary enforcement model tangible. The following C program installs a filter that blocks the mkdir system call (number 83 on x86-64). Any attempt to create a directory after the filter is installed results in immediate process termination.
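/* Compile: gcc -o seccomp_demo seccomp_demo.c   (x86-64 only) */
#include <stddef.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

int main(void)
{
    /* BPF filter: if syscall == mkdir (83 on x86-64), kill the process */
    struct sock_filter filter[] = {
        /* Load the ABI token: syscall numbers only have meaning per architecture */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        /* Not x86-64 (e.g. a 32-bit int 0x80 call)? Fall through to KILL */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* Load the syscall number from seccomp_data.nr */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* If syscall == __NR_mkdir (83), fall through to KILL; otherwise skip it */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mkdir, 0, 1),
        /* KILL: terminate the whole process as if by SIGSYS */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* ALLOW: permit every other syscall */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Required for a process without CAP_SYS_ADMIN to install a filter */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
        perror("prctl(PR_SET_NO_NEW_PRIVS)");
        return 1;
    }
    /* Install the BPF filter on the syscall path */
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0) {
        perror("prctl(PR_SET_SECCOMP)");
        return 1;
    }

    printf("Filter installed. mkdir is now blocked.\n");
    printf("Attempting mkdir... ");
    fflush(stdout);

    /* This syscall hits the filter and kills the process. Note that a
     * deny-list like this would miss mkdirat (258), and a production filter
     * would also reject x32 syscall numbers (nr & __X32_SYSCALL_BIT). */
    mkdir("/tmp/seccomp_test", 0755);

    /* This line never executes */
    printf("This will never print.\n");
    return 0;
}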
$ gcc -o seccomp_demo seccomp_demo.c
$ ./seccomp_demo
Filter installed. mkdir is now blocked.
Attempting mkdir... Bad system call (core dumped)
# ("(core dumped)" appears only when a core dump is actually written)
# The kernel killed the process at the binary syscall boundary.
# No userspace library trick, LD_PRELOAD shim, or clever wrapper
# can bypass this -- the filter runs inside the kernel before the
# syscall handler is ever reached.

# Verify the signal was SIGSYS (signal 31):
$ echo $?
159
# 128 + 31 = SIGSYS
Binary Analysis Tools on Linux
Linux provides an exceptionally rich set of tools for binary analysis, many originating in the GNU project and supplemented by modern open-source alternatives.
The GNU binutils package is the core binary analysis toolkit: objdump provides disassembly, section headers, relocation entries, and symbol tables; nm lists symbols in an object file; strip removes symbol tables and debug information; ar creates and manages static library archives. The readelf tool provides detailed inspection of all ELF structures in a more structured format than objdump. [GNU binutils documentation]
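Much of what readelf -h prints comes from fixed offsets at the start of the file. A minimal sketch of decoding them with Python's struct module; the machine table is a small, deliberately incomplete subset of the ELF ABI's e_machine values, and `elf_header` is a hypothetical helper name.

```python
import struct
import sys

ELF_CLASS = {1: "ELF32", 2: "ELF64"}
ELF_DATA = {1: "little-endian", 2: "big-endian"}
ELF_TYPE = {1: "REL (relocatable)", 2: "EXEC (executable)",
            3: "DYN (shared object or PIE)", 4: "CORE (core dump)"}
# A handful of e_machine values from the ELF ABI; the real table is much longer
ELF_MACHINE = {3: "Intel 80386", 40: "ARM", 62: "AMD x86-64",
               183: "AArch64", 243: "RISC-V"}


def elf_header(data: bytes) -> dict:
    """Decode the fixed-position fields at the start of an ELF file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing \\x7fELF magic")
    ei_class, ei_data = data[4], data[5]
    order = "<" if ei_data == 1 else ">"  # EI_DATA selects the byte order
    # e_type and e_machine sit right after the 16-byte e_ident array
    e_type, e_machine = struct.unpack_from(order + "HH", data, 16)
    # e_entry follows e_version at offset 24 and is 4 or 8 bytes wide
    (e_entry,) = struct.unpack_from(order + ("Q" if ei_class == 2 else "I"), data, 24)
    return {
        "class": ELF_CLASS.get(ei_class, "unknown"),
        "data": ELF_DATA.get(ei_data, "unknown"),
        "type": ELF_TYPE.get(e_type, hex(e_type)),
        "machine": ELF_MACHINE.get(e_machine, str(e_machine)),
        "entry": e_entry,
    }


if __name__ == "__main__":
    # Usage: python3 elfhdr.py /bin/ls
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            hdr = elf_header(f.read(64))
        print(path, {**hdr, "entry": hex(hdr["entry"])})
```

Reading only the first 64 bytes is enough because every field decoded here lives inside the ELF64 header.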
Beyond the classical GNU toolchain, the extended Berkeley Packet Filter (eBPF) subsystem has become a first-class binary tracing and analysis framework. Tools like bpftrace (for writing tracing programs) and bpftool (for inspecting and managing loaded BPF programs and maps) make it possible to attach programs directly to kernel and userspace events, inspect binary data at function entry and exit points, and profile system behavior -- all without modifying or recompiling the target binary. eBPF programs are themselves compiled to a bytecode format that the in-kernel verifier checks for safety before loading; with the JIT enabled (the usual configuration on x86-64), the kernel then compiles that bytecode to native machine code, making eBPF a binary format with its own execution model inside the kernel. [kernel.org: eBPF documentation]
# Trace every openat syscall and print which process is opening which file
# (older bpftrace releases spell the argument access args->filename)
# bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%-16s %s\n", comm, str(args.filename)); }'
Attaching 1 probe...
bash             /etc/passwd
ls               /etc/ld.so.cache
ls               /lib/x86_64-linux-gnu/libselinux.so.1
cat              /home/user/notes.txt

# Count syscalls by name for a specific command, then print the histogram
# bpftrace -e 'tracepoint:syscalls:sys_enter_* /comm == "ls"/ { @[probe] = count(); }' -c "ls /"

# The above is itself an eBPF program -- bpftrace compiles it to eBPF bytecode,
# the kernel verifier checks it for safety, and the JIT compiles it to native x86-64
# Disassemble a function of interest
$ objdump -d --disassemble=main ./mybinary

# List exported functions (type T = defined in the text section)
$ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep " T " | head

# Show all shared library dependencies
# (ldd may execute the program; never run it on an untrusted binary)
$ ldd /bin/ls
        linux-vdso.so.1 (0x00007ffcb45f1000)
        libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6

# Trace dynamic linker resolution at runtime
$ LD_DEBUG=bindings ls 2>&1 | head -5

# Find ROP gadgets for security research
$ ROPgadget --binary /bin/ls | grep "pop rdi"
The vDSO: A Kernel-Supplied ELF Binary
The linux-vdso.so.1 entry in ldd output is notable: it has no path on disk because it does not exist as a file. The vDSO (virtual dynamic shared object) is a small ELF shared library that the kernel maps directly into the virtual address space of every process at startup. Its purpose is to accelerate certain system calls that are extremely frequent but need only read access to kernel-maintained data -- primarily gettimeofday, clock_gettime, time, and getcpu. A full SYSCALL instruction involves a privilege level transition from ring 3 to ring 0 and back, costing tens to hundreds of nanoseconds depending on the CPU and the speculative-execution mitigations in effect. The vDSO implementation of clock_gettime, by contrast, reads from a kernel-maintained memory page (the vvar region) mapped read-only into userspace -- no privilege transition required. One important nuance: the vDSO falls back to a real system call whenever a request cannot be served from the vvar data. On x86-64 kernels using the generic vDSO (Linux 5.3 and later), CLOCK_REALTIME, CLOCK_MONOTONIC, CLOCK_BOOTTIME, CLOCK_TAI, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME_COARSE, and CLOCK_MONOTONIC_COARSE are served in userspace, whereas CPU-time clocks such as CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID always enter the kernel; the fast path also falls back when the active clocksource cannot be read from userspace, for example after the kernel has switched away from an unstable TSC. The optimization is therefore conditional on the requested clock and the hardware, not universal. The vDSO is itself a proper ELF binary with its own .text, .rodata, and dynamic symbol table, which you can examine by dumping it from a running process: [vdso(7) man page; LWN "Implementing virtual system calls"]
# Find the vDSO address range in a running process
$ grep vdso /proc/self/maps
7fff8b3f2000-7fff8b3f4000 r-xp 00000000 00:00 0          [vdso]

# Dump the vDSO ELF to disk using Python (more reliable than dd for large offsets)
# The exact mapping size is taken from /proc/self/maps, so nothing is guessed
$ python3 -c "
import re
maps = open('/proc/self/maps').read()
m = re.search(r'([0-9a-f]+)-([0-9a-f]+) .* \[vdso\]', maps)
start = int(m.group(1), 16)
size = int(m.group(2), 16) - start
with open('/proc/self/mem', 'rb') as mem, open('/tmp/vdso.so', 'wb') as out:
    mem.seek(start)
    out.write(mem.read(size))
"

# The runtime vDSO image is stripped of .symtab, so list its dynamic symbols
$ nm -D /tmp/vdso.so
... T __vdso_clock_getres
... T __vdso_clock_gettime
... T __vdso_getcpu
... T __vdso_gettimeofday
... T __vdso_time
... W clock_getres
... W clock_gettime
... W getcpu
... W gettimeofday
... W time
# (abridged; addresses and the exact symbol set vary by kernel version.
#  The plain names are weak aliases of the __vdso_ implementations.)
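The kernel also tells every process where it mapped the vDSO: the address is passed in the auxiliary vector under AT_SYSINFO_EHDR (tag 33 in <elf.h>), which glibc exposes through getauxval(3). A sketch, assuming Linux and a libc that provides getauxval, that cross-checks that value against the [vdso] line in /proc/self/maps and confirms the mapping begins with an ELF header; the helper names are hypothetical.

```python
import ctypes

AT_SYSINFO_EHDR = 33  # auxv tag for the vDSO's ELF header address (<elf.h>)

# CDLL(None) exposes symbols already loaded into this process, including libc's
libc = ctypes.CDLL(None)
libc.getauxval.argtypes = [ctypes.c_ulong]
libc.getauxval.restype = ctypes.c_ulong


def vdso_base_from_auxv() -> int:
    """vDSO address handed to this process by the kernel (0 if none)."""
    return libc.getauxval(AT_SYSINFO_EHDR)


def vdso_base_from_maps() -> int:
    """Start of the [vdso] mapping as listed in /proc/self/maps (0 if absent)."""
    with open("/proc/self/maps") as maps:
        for line in maps:
            if line.rstrip().endswith("[vdso]"):
                return int(line.split("-", 1)[0], 16)
    return 0


if __name__ == "__main__":
    base = vdso_base_from_auxv()
    print(f"auxv: {base:#x}  maps: {vdso_base_from_maps():#x}")
    if base:  # the mapping can be disabled, e.g. with vdso=0 on the kernel command line
        print(ctypes.string_at(base, 4))  # b'\x7fELF'
```

Both sources should report the same address, which is why a libc can locate clock_gettime in the vDSO without ever opening a file.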
GDB, the GNU Debugger, operates entirely at the binary level: it uses the ptrace system call to control a target process, reading and writing its binary register state and memory, setting breakpoints by replacing instruction bytes with the x86 INT3 instruction (0xCC), and single-stepping through machine code instructions. For data watchpoints and execution breakpoints that must not modify the instruction stream, GDB also uses the processor's hardware debug registers (DR0--DR3 on x86-64 hold the addresses, DR7 the conditions), which raise a debug exception when a specified address is executed, written, or read-or-written (x86 offers no read-only condition) -- without altering a single byte of the binary. Valgrind instruments binary code at runtime by translating it to an internal representation (VEX IR), inserting instrumentation, and re-emitting native binary -- enabling memory error detection without requiring recompilation.
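The INT3 mechanism amounts to careful byte bookkeeping. The sketch below is a toy model over a Python bytearray rather than a real ptrace session: it saves the original first byte of an instruction, writes 0xCC in its place, and maps the trap address back to the breakpoint. After an INT3 executes, RIP points one byte past it, so a debugger rewinds RIP, restores the original byte, single-steps (PTRACE_SINGLESTEP), and re-inserts the 0xCC to keep the breakpoint armed. The class name and the sample instruction bytes are illustrative.

```python
INT3 = 0xCC  # x86 one-byte breakpoint instruction


class SoftwareBreakpoints:
    """Toy model of a ptrace debugger's INT3 bookkeeping over a .text buffer."""

    def __init__(self, text: bytearray):
        self.text = text
        self.saved = {}  # offset -> original first byte of the instruction

    def insert(self, offset: int) -> None:
        # A real debugger does this read-modify-write with PTRACE_PEEKTEXT/POKETEXT
        if offset not in self.saved:
            self.saved[offset] = self.text[offset]
            self.text[offset] = INT3

    def remove(self, offset: int) -> None:
        self.text[offset] = self.saved.pop(offset)

    def handle_trap(self, rip: int) -> int:
        """Map the RIP reported after SIGTRAP back to the breakpoint address.

        INT3 has already executed, so RIP points one byte past the 0xCC.
        """
        bp = rip - 1
        if bp not in self.saved:
            raise ValueError(f"SIGTRAP at {rip:#x} is not one of our breakpoints")
        return bp


if __name__ == "__main__":
    # push rbp; mov rbp,rsp; mov eax,42; pop rbp; ret
    text = bytearray.fromhex("55 4889e5 b82a000000 5d c3")
    bps = SoftwareBreakpoints(text)
    bps.insert(4)                # break on "mov eax,42"
    print(text.hex(" "))         # 55 48 89 e5 cc 2a 00 00 00 5d c3
    bp = bps.handle_trap(rip=5)  # the CPU trapped after executing 0xCC
    bps.remove(bp)               # restore 0xb8; the debugger then sets RIP = 4
    print(text.hex(" "))         # 55 48 89 e5 b8 2a 00 00 00 5d c3
```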
perf is the Linux kernel's primary binary-level performance profiling tool. It uses hardware performance counters and kernel tracepoints to sample the instruction pointer of running processes at precise intervals, producing a statistical profile of where CPU time is spent at the machine code level. perf record captures binary sample data; perf report maps those samples back through symbol tables and DWARF debug info to annotated source or assembly. ftrace is the kernel's built-in function tracing infrastructure, accessible through the tracefs filesystem at /sys/kernel/tracing. It can trace every kernel function call, measure latency between any two points, and record the binary call graph through the kernel -- all without a loaded module or external tool.
# Profile a binary at the instruction level, recording call graphs
$ perf record -g ./mybinary && perf report

# Annotate a hot function's instructions with the share of samples each received
$ perf annotate --stdio -s my_hot_function

# Trace all calls to a kernel function via ftrace (as root)
# echo 'vfs_read' > /sys/kernel/tracing/set_ftrace_filter
# echo 'function' > /sys/kernel/tracing/current_tracer
# cat /sys/kernel/tracing/trace | head -20
# echo 'nop' > /sys/kernel/tracing/current_tracer    # stop tracing when done
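The statistical idea behind perf record can be demonstrated at the interpreter level. This is an analogy, not how perf works internally: perf samples machine-code instruction pointers using hardware counters or kernel timers, while the sketch below uses a CPU-time interval timer (ITIMER_PROF) and records which Python function was executing each time SIGPROF fires. It assumes a Unix system and must run in the main thread; the workload functions are made up for illustration.

```python
import collections
import signal

samples = collections.Counter()


def _on_sigprof(signum, frame):
    # `frame` is the Python frame that was running when the timer fired --
    # the interpreter-level analogue of a sampled instruction pointer.
    samples[frame.f_code.co_name] += 1


def profile(func, interval=0.001):
    """Run func() and return (function name, sample count) pairs, hottest first."""
    samples.clear()
    old = signal.signal(signal.SIGPROF, _on_sigprof)
    # ITIMER_PROF counts CPU time (user + system) and delivers SIGPROF on expiry
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        func()
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)
        signal.signal(signal.SIGPROF, old)
    return samples.most_common()


def hot_loop():
    total = 0
    for i in range(3_000_000):
        total += i * i
    return total


def cold_setup():
    return sum(range(1_000))


def workload():
    cold_setup()
    hot_loop()


if __name__ == "__main__":
    for name, count in profile(workload):
        print(f"{count:6d}  {name}")
```

The function that burns the most CPU collects the most samples, which is the same inference perf report draws from instruction-pointer samples. Exact counts vary from run to run.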
Binary Formats in the Boot Process
The boot process on a Linux system is a cascade of binary format interpretation. The UEFI firmware reads the GPT partition table, locates the EFI System Partition, and, following a Boot#### entry stored in NVRAM (or the fallback path \EFI\BOOT\BOOTX64.EFI on x86-64), loads a PE/COFF format binary (the bootloader, such as GRUB or systemd-boot) into memory, applies its relocations, and transfers execution. PE/COFF is Windows' native executable format; UEFI inherited it from the original EFI specification, which Intel developed with significant Microsoft involvement at a time when PE/COFF was already well-specified for 64-bit platforms.
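Recognizing a PE/COFF image takes only a few header reads: the DOS header's "MZ" magic, the e_lfanew pointer at offset 0x3C, the "PE\0\0" signature it points to, and the Machine field of the COFF header that follows. A minimal sketch of that check; only a handful of machine codes are mapped, and `pe_machine` is a hypothetical helper.

```python
import struct
import sys

# IMAGE_FILE_MACHINE_* values from the PE/COFF specification (a small subset)
PE_MACHINES = {0x014C: "i386", 0x8664: "x86-64", 0xAA64: "ARM64"}


def pe_machine(data: bytes) -> str:
    """Return the target CPU recorded in a PE/COFF image such as an .efi file."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ (DOS stub) signature")
    # e_lfanew, at offset 0x3C of the DOS header, gives the PE header position
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE\\0\\0 signature")
    # The COFF file header follows the signature; Machine is its first field
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return PE_MACHINES.get(machine, f"unknown ({machine:#06x})")


if __name__ == "__main__":
    # Usage (path varies by distribution): python3 pe_machine.py /boot/efi/EFI/BOOT/BOOTX64.EFI
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, pe_machine(f.read(4096)))
```

On kernels built with the EFI stub (CONFIG_EFI_STUB), the bzImage itself also begins with an MZ header so that firmware can load it directly, so the same check can be pointed at /boot/vmlinuz-* as well as at a bootloader's .efi file.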
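The Linux kernel image on x86 is a compressed binary in a format called bzImage, consisting of a real-mode setup header, a protected-mode bootstrap stub, and a compressed payload containing the actual kernel binary. The kernel build system supports multiple compressors: gzip, bzip2, lzma, xz, lzo, lz4, and zstd. The choice is not stored as a header flag; it is identifiable from the magic bytes at the start of the payload, which boot protocol 2.08 and later let tools locate through the setup header's payload_offset and payload_length fields. The bootloader never decompresses the kernel itself: GRUB reads setup-header fields such as the protocol version, setup_sects, and loadflags to place the image in memory and pass boot parameters, and the bootstrap stub then decompresses the payload and jumps into it.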
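# The boot protocol signature "HdrS" sits at offset 0x202 of a bzImage
# (reading /boot/vmlinuz-* may require root on some distributions)
$ xxd -s 0x202 -l 4 /boot/vmlinuz-$(uname -r)
00000202: 4864 7253                                HdrS
# "HdrS" = Header Signature confirming valid Linux boot protocol

# The initramfs is a CPIO archive -- another binary format
$ file /boot/initrd.img-$(uname -r)
/boot/initrd.img: gzip compressed data
$ zcat /boot/initrd.img-$(uname -r) | cpio -t 2>/dev/null | head -5
.
bin
bin/sh
etc
lib
# Some distributions prepend an uncompressed early-microcode cpio archive; then
# file reports a cpio archive and zcat fails. lsinitramfs (Debian/Ubuntu) or
# lsinitrd (dracut) handle such concatenated images.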
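The setup-header fields described above can be read directly. A sketch, assuming the layout in the kernel's x86 boot protocol documentation (boot_flag at 0x1FE, "HdrS" at 0x202, the protocol version at 0x206, and payload_offset and payload_length at 0x248 and 0x24C for protocol 2.08 and later); the compressor is guessed heuristically from common magic prefixes, and `bzimage_info` is a hypothetical helper.

```python
import struct
import sys

# Magic prefixes of the compressors the kernel build supports (heuristic match)
PAYLOAD_MAGIC = [
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
    (b"\x02\x21\x4c\x18", "lz4 (legacy frame)"),
    (b"\x89LZO", "lzo"),
    (b"\x5d\x00\x00", "lzma"),
]


def bzimage_info(data: bytes) -> dict:
    """Decode a few x86 setup-header fields and guess the payload compressor."""
    if struct.unpack_from("<H", data, 0x1FE)[0] != 0xAA55:
        raise ValueError("missing boot_flag 0xAA55")
    if data[0x202:0x206] != b"HdrS":
        raise ValueError("missing HdrS setup header signature")
    setup_sects = data[0x1F1] or 4  # a value of 0 means 4 for historical reasons
    (version,) = struct.unpack_from("<H", data, 0x206)
    info = {"protocol": (version >> 8, version & 0xFF), "compressor": None}
    if version >= 0x0208:
        payload_offset, payload_length = struct.unpack_from("<II", data, 0x248)
        # payload_offset is relative to the protected-mode code, which starts
        # right after the boot sector plus setup_sects 512-byte sectors
        start = (setup_sects + 1) * 512 + payload_offset
        head = data[start:start + 8]
        for magic, name in PAYLOAD_MAGIC:
            if head.startswith(magic):
                info["compressor"] = name
                break
        info["payload_length"] = payload_length
    return info


if __name__ == "__main__":
    # Usage: sudo python3 bzinfo.py /boot/vmlinuz-$(uname -r)
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, bzimage_info(f.read()))
```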
Conclusion: Binary as Foundation, Not Detail
The relationship between binary and Linux is not a matter of low-level implementation detail that can be safely ignored by anyone serious about understanding the system. Binary is the common substrate on which every abstraction Linux provides rests. The process model, the filesystem, the network stack, the security architecture, the build toolchain -- all of them ultimately operate by reading, writing, and manipulating binary data according to precisely defined binary formats and protocols.
For system administrators, understanding binary fundamentals demystifies behavior that otherwise appears arbitrary: why files from one architecture cannot run on another, why a corrupted few bytes in a filesystem superblock can make a partition unmountable, why ASLR makes exploitation harder without making it impossible, why a process with a seccomp-BPF filter cannot call a forbidden syscall even through a compromised library. For developers, understanding how source code becomes binary -- the compilation, linking, loading, and execution pipeline, including the roles of DWARF debug information, the PLT, and the vDSO -- enables informed decisions about performance, security, and compatibility. For security researchers, binary is the ground truth: whatever abstraction layer a vulnerability is described at, the exploit operates at the binary level. And as the kernel's binary interfaces evolve -- io_uring offering an alternative to the traditional per-call syscall path for high-throughput I/O, eBPF introducing a verified bytecode execution model inside the kernel -- the importance of understanding binary contracts only deepens.
The exercises in this article -- building an ELF by hand, patching a compiled binary, decomposing file permissions into bitwise operations, constructing a seccomp filter, watching endianness corrupt a port number -- exist because reading about binary is fundamentally different from working with it. The moment you xxd a file you assembled yourself and map every byte to the ELF specification, the abstraction layers collapse. Binary stops being a concept and becomes something you can see, manipulate, and reason about. That shift in perspective is what separates someone who uses Linux from someone who understands it.
Linux exposes its binary foundations more openly than almost any other production operating system. The tools to inspect, analyze, and manipulate binary data at every layer of the stack are available, open-source, and well-documented. This transparency is not incidental -- it reflects the same philosophy that makes Linux's source code open: the conviction that understanding the system fully, down to its lowest levels, makes you better equipped to use it, secure it, and build on it.
Sources & Further Reading
- ELF Specification -- TIS Committee. Tool Interface Standard (TIS) Executable and Linking Format (ELF) Specification, Version 1.2. May 1995. refspecs.linuxfoundation.org/elf/elf.pdf
- 86open Final Update -- Leibovitch, Evan. The 86open Project: Final Update. July 25, 1999. linuxtoday.com
- x86-64 5-Level Paging -- Linux Kernel Documentation. Documentation/arch/x86/x86_64/5level-paging.rst. docs.kernel.org/arch/x86/x86_64/5level-paging.html
- Linux Syscall Table (x86-64) -- Derived from the Linux kernel source: arch/x86/entry/syscalls/syscall_64.tbl. github.com/torvalds/linux
- Kernel ABI Stability Policy -- Kroah-Hartman, Greg. The Linux Kernel Driver Interface. Documentation/process/stable-api-nonsense.rst. docs.kernel.org
- vDSO Man Page -- vdso(7). Linux Programmer's Manual. man7.org/linux/man-pages/man7/vdso.7.html
- Implementing Virtual System Calls (vDSO) -- Edge, Jake. LWN.net. 2014. lwn.net/Articles/615809/
- Seccomp BPF Kernel Documentation -- Documentation/userspace-api/seccomp_filter.rst. kernel.org/doc/html/latest/userspace-api/seccomp_filter.html
- Intel 64 and IA-32 Architectures Software Developer's Manual, Vol. 2B -- MOV instruction encoding, B8+rd form. Intel Corporation. intel.com (SDM)
- x86-64 Instruction Encoding Reference -- OSDev Wiki. wiki.osdev.org/X86-64_Instruction_Encoding
- "Everything is a file" -- Linus Torvalds. Linux Kernel Mailing List, June 2002. yarchive.net/comp/linux/everything_is_file.html
- Linux System Call Interface -- Packagecloud Blog. The Definitive Guide to Linux System Calls. blog.packagecloud.io
- openat(2) Man Page -- Linux Programmer's Manual. man7.org/linux/man-pages/man2/openat.2.html
- Linux Kernel eBPF Documentation -- Documentation/bpf/. docs.kernel.org/bpf/index.html
- GNU Binutils Documentation -- sourceware.org/binutils/docs/
- DWARF Debugging Standard -- DWARF Standards Committee. dwarfstd.org
- file(1) Man Page -- file utility documentation. man7.org/linux/man-pages/man1/file.1.html
- io_uring Kernel Documentation -- Documentation/io_uring/. docs.kernel.org
- bpftrace Reference Guide -- github.com/bpftrace/bpftrace