The Process Model Underneath Everything

Before touching a single line of script syntax, you need to internalize one fact about Unix and Linux: almost every command you run spawns a new process. The shell executes via the fork() and exec() system calls -- fork() copies the current process into a child, and exec() replaces that child's image with a new program. This has profound implications for your scripts.

A child process inherits a copy of the parent's environment, file descriptors, and variables -- but that inheritance is one-directional. Any change a child makes to its own memory is invisible to the parent. This is not a design limitation; it is fundamental to how the Linux process model works.

Key Concept

Every process on a Unix system has its own parcel of memory for holding its variables, file descriptors, and a copy of the environment inherited from its parent. Changes to variables in one process do not affect any other process on the system. This single insight explains an enormous number of Bash script bugs.
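A two-line experiment makes the isolation concrete. The parentheses below force a subshell -- a forked child of the current shell:

```bash
#!/usr/bin/env bash
# A subshell gets a copy of the parent's variables; its changes stay inside it.
count=0
( count=99; echo "child sees: $count" )   # prints: child sees: 99
echo "parent sees: $count"                # prints: parent sees: 0
```

The child genuinely sets the variable -- in its own copy of memory. When the subshell exits, that copy is discarded.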

When you pipe output into a while read loop, for example, Bash runs the entire loop body in a subshell. The loop does its work, increments counters, sets variables -- and then the subshell exits, and every change evaporates. The parent script sees nothing. This is not a bug; it is the expected behavior of the process model.

In Bash, all elements in a pipeline run in a subshell -- whereas in Ksh and Zsh, all except the last run in a subshell. As Vidar Holen explains on his blog, this means that a construct like echo "2 + 3" | bc | read sum works in Ksh and Zsh but silently fails in Bash. POSIX leaves the behavior undefined. This matters enormously for anyone writing scripts intended to be portable, or for anyone debugging a script that "should work" but silently fails to update a global counter.

Bash 4.2 introduced the lastpipe shell option (shopt -s lastpipe) which causes the last element of a pipeline to run in the current shell rather than a subshell, matching the Ksh/Zsh behavior. However, lastpipe only takes effect when job control is disabled (which it is in scripts but not in interactive shells). Many production environments still run Bash versions older than 4.2, and relying on lastpipe makes your script dependent on a non-default shell option. Process substitution remains the more portable and explicit fix.
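A minimal sketch of the lastpipe behavior, assuming Bash 4.2 or later running non-interactively (so job control is off):

```bash
#!/usr/bin/env bash
shopt -s lastpipe             # Bash 4.2+; no effect when job control is on

printf '5\n' | read -r sum    # read now runs in the current shell, not a subshell
echo "sum=$sum"               # prints: sum=5

shopt -u lastpipe
printf '7\n' | read -r sum2   # back to default: read runs in a subshell
echo "sum2=${sum2:-unset}"    # prints: sum2=unset
```

Run interactively, the first read would land in a subshell too, which is one more reason not to rely on lastpipe for portable code.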

Understanding this model changes how you write everything. It is the reason process substitution exists. It is the reason source exists. It is the reason export exists. Start here, and the rest of Bash begins to make coherent sense.

This model also has direct security implications. The Shellshock vulnerability (CVE-2014-6271) exploited the fact that Bash processed trailing commands hidden inside exported function definitions in environment variables -- meaning a malicious environment variable passed across a privilege boundary (such as an HTTP header reaching a CGI script) could execute arbitrary code. In January 2026, a similar class of vulnerability surfaced in Ivanti EPMM (CVE-2026-1281, CVSS 9.8), where attacker-controlled strings passed to a Bash script via Apache RewriteMap were exploited through bash arithmetic expansion to achieve pre-authentication remote code execution. Both attacks trace directly to the same root cause: untrusted data crossing a process boundary into a Bash execution context.

Variables: What They Are and What They Are Not

Assignment and Scope

Bash variables are untyped strings in the shell's internal symbol table. When you write:

bash
filename="backup_2026.tar.gz"

Bash is not allocating typed memory the way C or Python does. It stores a name-value pair in a hash table maintained by the shell process. No type is enforced. The integer 42, the string "hello", and the path /etc/passwd are all stored identically. This is liberating and dangerous in equal measure.

Variable names follow a strict rule: they can contain letters, numbers, and underscores, and they cannot start with a number. They are also case-sensitive -- $User, $user, and $USER are three entirely different variables. $USER is reserved by the environment (it holds your login name), and overwriting it is a fast path to confusing behavior.

Caution

The assignment syntax is unforgiving: there must be no spaces around the = sign. Writing filename = "backup.tar.gz" does not assign a variable -- Bash interprets filename as a command, = as its first argument, and "backup.tar.gz" as its second. You will see a "command not found" error that, without this context, looks completely inexplicable.

Environment Variables vs. Shell Variables

These two concepts are often conflated and the conflation causes real problems. A shell variable exists only in the current shell's memory. An environment variable is part of the environment block that gets copied to child processes.

bash
# Shell variable -- only this shell can see it
log_level="DEBUG"

# Environment variable -- exported to child processes
export DATABASE_HOST="prod-db-01"

When you run a script, a child process is spawned. That child inherits all exported environment variables. It does not inherit un-exported shell variables. This is why scripts fail to find values set in your .bashrc -- those values were never exported, and the script runs in a child process.

You can declare and export in one step. Export is a property of the variable, not a one-time operation -- once a variable is marked for export, it stays exported through the life of that shell.
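The stickiness of export is easy to verify: mark the variable once, and later plain assignments still propagate to children. The variable name here is illustrative:

```bash
#!/usr/bin/env bash
export DB_HOST="first"                   # declare and export in one step
DB_HOST="second"                         # plain reassignment -- still exported
bash -c 'echo "child sees: $DB_HOST"'    # prints: child sees: second
```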

Parameter Expansion: The Depth Beneath Dollar Signs

Many people learn that $variable substitutes a value. The reality runs much deeper. Bash's parameter expansion syntax provides a miniature programming language for string manipulation, default values, error checking, and substring operations -- all without spawning an external process.

Default values prevent the classic "unbound variable" failure:

bash
# If $1 is unset or empty, use "production" as the default
environment="${1:-production}"

The :- operator returns the default if the variable is unset or empty. The - variant (without colon) returns the default only if unset, not if empty. This is a subtle but critical distinction when empty strings are meaningful values in your script.
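The distinction is easiest to see side by side:

```bash
#!/usr/bin/env bash
unset var
echo "${var-fallback}"    # fallback -- unset: both forms substitute
echo "${var:-fallback}"   # fallback

var=""
echo "${var-fallback}"    # empty line -- set-but-empty satisfies the - form
echo "${var:-fallback}"   # fallback -- the :- form also substitutes when empty
```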

Assign a default and set the variable simultaneously:

bash
# Sets log_dir AND uses it
log_dir="${LOG_DIR:=/tmp/logs}"

Error on unset variable -- fail loudly rather than silently:

bash
# Script aborts with the message if DEPLOY_KEY is unset
deploy_key="${DEPLOY_KEY:?DEPLOY_KEY must be set in the environment}"

This is far more expressive than checking manually and is processed inline during expansion.

String length and substring extraction:

bash
path="/var/log/application/service.log"
echo "${#path}"   # Outputs: 32

filename="server_backup_20260301.tar.gz"
# Extract from position 14, length 8 (the date portion)
date_str="${filename:14:8}"   # 20260301

Pattern removal -- essential for working with file paths without spawning external processes:

bash
filepath="/var/log/nginx/access.log"

# Remove everything up to and including the last slash (get filename)
filename="${filepath##*/}"     # access.log

# Remove shortest match from the end (remove extension)
basename="${filename%.*}"      # access

# Remove longest match from the end (strip all extensions)
stem="${filename%%.*}"         # access (same here, but differs with .tar.gz)

These operations run entirely within the Bash interpreter. Compare this to calling basename or dirname -- those are external processes, each requiring a fork() and exec() system call. In a loop that processes thousands of files, that difference is measurable.
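The builtin counterpart of dirname follows the same pattern, trimming from the end instead of the front:

```bash
#!/usr/bin/env bash
filepath="/var/log/nginx/access.log"

# Remove the shortest match of /* from the end (get the directory)
dirpath="${filepath%/*}"    # /var/log/nginx

echo "$dirpath"
```

One caveat: unlike dirname, this expansion returns the original string when the path contains no slash, and an empty string for a file directly under the root, so guard those edge cases where they can occur.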

Quoting: The First Defense Against Word Splitting

Unquoted variable expansions in Bash undergo two transformations: word splitting (splitting on the characters in $IFS, which defaults to space, tab, and newline) and globbing (expanding *, ?, and [...] patterns against the filesystem). Both behaviors are correct and consistent -- they are just almost never what you want when working with variables.

bash
# Dangerous -- if filename contains spaces, this becomes multiple arguments
rm $filename

# Safe -- the entire value is one argument
rm "$filename"

Rule of Thumb

Quote every variable expansion unless you specifically need word splitting or globbing. This is not pedantry. Files with spaces in their names are common. Configuration values with spaces are common. Any script that will be deployed on real systems needs to handle them.
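A tiny helper that counts its arguments shows exactly what word splitting does to an unquoted expansion:

```bash
#!/usr/bin/env bash
count_args() { echo "$#"; }

value="one two three"
count_args $value      # prints 3 -- the shell split the value into three words
count_args "$value"    # prints 1 -- quoting preserved it as a single argument
```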

Quoting is also your first line of defense against command injection. If a variable contains user-supplied input -- a filename from a web form, a hostname from a configuration file, an argument passed to the script -- and that variable is expanded without quotes, an attacker can craft input that breaks out of the intended argument context. A value like foo; rm -rf / passed to an unquoted $input in a command like process $input results in two commands being executed. Quoting alone does not make eval safe (never pass untrusted data to eval), but it prevents the most common class of injection that occurs through word splitting and glob expansion.

The one exception worth noting: inside [[ ... ]] (the extended test compound), an unquoted right-hand side of == and != is treated as a glob pattern (and of =~ as a regex), so you may deliberately leave it unquoted when pattern matching is the intent -- quoting it forces a literal comparison. Inside [ ... ] (the POSIX test builtin), everything should still be quoted.

Loops: Mechanics, Pitfalls, and the Subshell Problem

The Three Fundamental Loop Constructs

Bash provides three loop types, and each has a domain where it is the natural choice.

The for loop iterates over a list:

bash
# Iterate over files -- note the quoting
for logfile in /var/log/*.log; do
    echo "Processing: $logfile"
    gzip --best "$logfile"
done

When globbing is involved, Bash expands the pattern before the loop starts. If no files match, the pattern is passed literally as a string (unless nullglob is set with shopt -s nullglob, which causes the list to be empty instead). This is a frequent source of bugs: a loop that "should not run" runs once with a literal *.log string.
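The nullglob difference in miniature, assuming the directory /nonexistent does not exist:

```bash
#!/usr/bin/env bash
# Default behavior: an unmatched glob is passed through literally
for f in /nonexistent/*.log; do
    echo "default: $f"        # prints: default: /nonexistent/*.log
done

# With nullglob: an unmatched glob expands to an empty list
shopt -s nullglob
for f in /nonexistent/*.log; do
    echo "nullglob: $f"       # never runs
done
shopt -u nullglob
```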

The C-style for loop handles numeric iteration cleanly:

bash
# Retry logic with a counter
max_retries=5
for (( attempt=1; attempt<=max_retries; attempt++ )); do
    echo "Attempt $attempt of $max_retries"
    if deploy_service; then
        echo "Deployment succeeded"
        break
    fi
    sleep $((attempt * 2))   # Linear backoff
done

The while loop runs as long as a condition remains true:

bash
# Process a queue file
while IFS= read -r task; do
    echo "Processing task: $task"
    execute_task "$task"
done < /var/spool/my_queue.txt

Note the IFS= read -r pattern carefully. Setting IFS to empty prevents leading and trailing whitespace from being stripped from each line. The -r flag prevents backslash sequences from being interpreted. These two flags together give you the raw line exactly as stored in the file. Omitting either one is a silent correctness bug waiting to surface on data that contains unusual characters.
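The effect of each flag is visible with input that carries leading whitespace and a literal backslash:

```bash
#!/usr/bin/env bash
input='  a\tb  '             # leading/trailing spaces, literal backslash

printf '%s\n' "$input" | {
    IFS= read -r line
    printf '[%s]\n' "$line"  # prints: [  a\tb  ] -- the raw line, untouched
}

printf '%s\n' "$input" | {
    read line                # default IFS, no -r
    printf '[%s]\n' "$line"  # prints: [atb] -- whitespace trimmed, backslash eaten
}
```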

The Pipeline Subshell Trap

This is one of the most reliably surprising behaviors in Bash, and it has bitten every Bash programmer at least once. Consider this innocent-looking code:

bash -- broken example
count=0
cat /var/log/auth.log | while IFS= read -r line; do
    if [[ "$line" == *"Failed password"* ]]; then
        (( count++ ))
    fi
done
echo "Failed login attempts: $count"   # Always prints 0

The echo always prints 0. The pipe creates a subshell to run the while loop. Inside that subshell, count gets incremented correctly. But when the pipeline finishes, the subshell exits and takes its copy of count with it. The parent shell's count was never touched.

Fix: Process Substitution

The Bash Hackers Wiki documents a clean solution: use process substitution with < <(command) syntax to avoid the pipe operator entirely. This keeps the while loop running in the current shell context rather than a subshell.

The corrected version:

bash -- fixed with process substitution
count=0
while IFS= read -r line; do
    if [[ "$line" == *"Failed password"* ]]; then
        (( count++ ))
    fi
done < <(cat /var/log/auth.log)
echo "Failed login attempts: $count"   # Now correct

The < <(command) syntax -- note the space between the two < characters -- redirects standard input from a process substitution. The while loop runs in the current shell, not in a subshell, so count is updated in the parent context.

Process substitution works by having Bash create either a named pipe (FIFO) under $TMPDIR or a file under /dev/fd/*, depending on what the operating system supports, then running the command in the background and substituting the filename. It is a non-POSIX extension supported by Bash, Zsh, and ksh93, but not by strictly POSIX-compliant shells like dash.

Iterating Arrays Correctly

Arrays in Bash deserve separate attention because the syntax is unusual and the quoting rules are critical.

bash
# Declare an indexed array
servers=("web-01" "web-02" "db-01" "cache-01")

# Declare an associative array (requires Bash 4+)
declare -A config
config[environment]="production"
config[log_level]="INFO"
config[max_connections]="500"

# Iterate indexed array -- always use "${array[@]}"
for server in "${servers[@]}"; do
    echo "Pinging $server"
    ping -c 1 "$server" > /dev/null 2>&1 && echo "$server is up" || echo "$server is DOWN"
done

# Iterate associative array
for key in "${!config[@]}"; do
    echo "$key = ${config[$key]}"
done

The difference between "${array[@]}" and "${array[*]}" is critical: [@] expands each element as a separate quoted word, preserving elements that contain spaces. [*] joins all elements into a single string using the first character of $IFS as a delimiter. Almost always, [@] is what you want.
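Wrapping each expanded word in visible delimiters makes the difference unmistakable:

```bash
#!/usr/bin/env bash
servers=("web 01" "db-01")

printf '<%s>' "${servers[@]}"; echo    # prints: <web 01><db-01> -- two words
printf '<%s>' "${servers[*]}"; echo    # prints: <web 01 db-01> -- one joined word
```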

Predict the Output

bash
x="global"
change_x() { x="changed"; }
lose_x()   { echo "hello" | change_x; }

lose_x
echo "$x"

Output: global

The pipe runs change_x in a subshell. The assignment x="changed" happens inside that subshell's copy of memory and is discarded when the pipeline ends. The parent shell's x is never touched. This is the same fork() inheritance principle from the process model section -- now applied inside a function call you might not even realize creates a process boundary.

Notice how every topic in this article connects back to the same root principle: the process model. The subshell trap in loops is the same mechanism that causes variable loss in command substitution, which is the same mechanism that makes export necessary, which is the same isolation boundary that Shellshock and CVE-2026-1281 exploited. Once you see that single thread running through variables, loops, pipes, functions, and security, Bash stops feeling like a collection of arbitrary rules and starts behaving like a coherent system with one governing idea.

Functions: Modularity, Scope, and Return Values

Anatomy of a Bash Function

Functions in Bash are named code blocks stored in the shell's function table. The body is parsed when the function is defined -- a syntax error in the definition is reported immediately -- but the commands inside are not resolved or executed until the function is called. A function that invokes a nonexistent command will be defined without complaint and only fail when invoked.

bash
# Both definition styles are valid
greet() {
    echo "Hello, ${1:-world}"
}

function greet {
    echo "Hello, ${1:-world}"
}

The function keyword is a ksh extension that Bash also supports; the name() syntax is what POSIX specifies. Use the latter if you care about portability.

Variable Scope and the local Keyword

By default, every variable in a Bash function is global. This is a significant design difference from many languages and a frequent source of subtle bugs.

bash -- without local (buggy)
# Bug: the loop counter "i" pollutes the global namespace
process_files() {
    for i in "$@"; do
        echo "Processing $i"
    done
}

backup_files() {
    for i in "$@"; do    # $i here might be clobbered by process_files
        cp "$i" /backup/
    done
}

The local keyword restricts a variable to the function's scope:

bash -- with local (correct)
process_files() {
    local i                    # Declare local before use
    local -i file_count=0      # Local integer

    for i in "$@"; do
        echo "Processing $i"
        (( file_count++ ))
    done

    echo "Processed $file_count files"
}

Local variables are implemented as a scope stack. When a function is called, Bash pushes a new scope frame. local declarations are stored in that frame. When the function returns, the frame is popped and local variables are discarded. The global variable under the same name (if any) re-emerges unchanged.
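A short demonstration of the scope stack, including a behavior that surprises programmers coming from lexically scoped languages: Bash locals are dynamically scoped, so a function sees the locals of its callers.

```bash
#!/usr/bin/env bash
x="global"

inner() { echo "inner sees: $x"; }

outer() {
    local x="outer-local"
    inner                      # prints: inner sees: outer-local
}

outer
echo "after outer: $x"         # prints: after outer: global
```

While outer's frame is on the stack, inner resolves x to the caller's local; once outer returns, the global re-emerges.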

Hidden Pitfall: local Masks Exit Codes

Combining local and command substitution on the same line silently swallows errors. The exit status of local var=$(somecommand) is always 0 -- because the exit status comes from local itself (which succeeded), not from the command substitution inside it. Even with set -e enabled, a failing command inside this construct will not terminate the script. The fix is to always separate declaration and assignment: write local var on one line, then var=$(somecommand) on the next. The same trap applies to export, declare, and readonly when combined with command substitution.
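The masking is easy to observe with a command guaranteed to fail:

```bash
#!/usr/bin/env bash
demo() {
    local v=$(false)    # combined: $? is the status of local -- always 0
    echo "combined: $?"

    local w
    w=$(false)          # separated: $? now reflects the command substitution
    echo "separated: $?"
}
demo    # prints: combined: 0, then: separated: 1
```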

Return Values and the Exit Status Convention

Bash functions do not return values the way functions do in Python or C. They return an exit status -- an integer from 0 to 255, where 0 means success and any non-zero value means failure. This is the same exit status convention used by every Unix command.

bash
# Return values through global or nameref variables
get_file_size() {
    local filepath="$1"
    local -n result_ref="$2"    # Nameref -- points to caller's variable

    if [[ ! -f "$filepath" ]]; then
        return 1    # Failure
    fi

    result_ref=$(stat -c %s "$filepath")
    return 0
}

# Usage
file_size=0
if get_file_size "/var/log/syslog" file_size; then
    echo "File size: $file_size bytes"
else
    echo "Error: could not get file size"
fi

The local -n nameref (available in Bash 4.3+) creates a reference to another variable, enabling functions to "return" values by modifying the caller's variable by name. This is cleaner than relying on globals or command substitution.

Command substitution -- capturing a function's stdout -- works for returning string values but spawns a subshell, which means any global modifications made inside the function are lost. This is the same subshell limitation discussed with pipelines.

Building a main() Architecture

One of the most transformative habits for Bash scripting is always defining a main() function and invoking it at the end of the script. This pattern, borrowed from C and Python, provides three major benefits: it makes the script sourceable (so other scripts can import your functions without executing the main logic), it provides a clear entry point, and it allows functions defined anywhere in the file to be called by main().

bash
#!/usr/bin/env bash
set -euo pipefail

# All functions defined before main()
log_info()  { printf '[INFO]  %s\n' "$*"; }
log_error() { printf '[ERROR] %s\n' "$*" >&2; }

parse_arguments() {
    local -n _args_ref="$1"
    shift
    while [[ $# -gt 0 ]]; do
        case "$1" in
            --env|-e) _args_ref[environment]="$2"; shift 2 ;;
            --dry-run) _args_ref[dry_run]=true; shift ;;
            *) log_error "Unknown option: $1"; return 1 ;;
        esac
    done
}

validate_environment() {
    [[ -v "${1}[environment]" ]] || {
        log_error "environment is required"
        return 1
    }
}

main() {
    declare -A args
    args[dry_run]=false

    parse_arguments args "$@"
    validate_environment args
    log_info "Running in ${args[environment]} mode"
}

main "$@"

The main "$@" at the end is critical. It passes all original script arguments into main(), preserving proper quoting and word boundaries. The "$@" expansion, unlike $@ or "$*", preserves each argument as a separate quoted word -- even if arguments contain spaces.

From a security perspective, the explicit set -euo pipefail at the top -- together with readonly declarations where you use them -- serves a hardening function beyond error handling. If an attacker achieves partial code injection into a running script (through an unsanitized environment variable, a sourced configuration file, or a malicious input), readonly prevents them from redefining critical paths or constants. Production scripts should also explicitly set PATH near the top -- for example, readonly PATH="/usr/local/bin:/usr/bin:/bin" -- to prevent PATH manipulation attacks where an attacker places a malicious binary earlier in the search order. Without this, a script that calls curl without an absolute path could be tricked into executing an attacker-supplied curl binary.

Error Handling: Building Scripts That Fail Honestly

The Strict Mode Triad

The single most impactful change you can make to a Bash script is adding this line at the top:

bash
set -euo pipefail

This is three options combined. set -e (errexit) causes the script to exit immediately if any command returns a non-zero exit code, instead of blindly continuing. set -u (nounset) causes the script to exit with an error if any unset variable is expanded. This catches a huge class of bugs where a variable name is misspelled or never initialized. set -o pipefail changes the exit status of a pipeline from the status of the last command to the status of the rightmost command that failed. Without this, false | true returns exit status 0.

Together, these three options give you fail-fast behavior that makes scripts much safer by default. Be aware that set -e has nuanced behavior in certain contexts. The GNU Bash Reference Manual specifies that errexit does not trigger when the failing command is part of the test following if or elif, part of a && or || list (except the final command), any command in a pipeline except the last, or when the return value is inverted with !. The practical consequence: if you call a function inside an if condition, set -e is silently disabled for the entire call chain of that function, not just the immediate test. Greg Wooledge's BashFAQ documents this extensively, noting that the rules are "extremely convoluted" and change between Bash versions. Despite these edge cases, the overall improvement is significant enough that set -euo pipefail should be enabled in every production script -- but it is a safety net, not a substitute for explicit error checking in critical paths.

One specific set -e trap that catches even experienced scripters: the expression ((count++)) terminates the script when count is 0. This happens because the post-increment operator returns the value before incrementing -- so the expression evaluates to 0, and Bash maps an arithmetic result of 0 to exit code 1 (false), which triggers set -e. The fix is ((count++)) || true or (( ++count )) (pre-increment, which returns 1 when incrementing from 0).
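A self-contained reproduction, safe to run because the guard is in place:

```bash
#!/usr/bin/env bash
set -e
count=0

# (( count++ )) alone would evaluate to 0, return status 1, and kill the script
(( count++ )) || true
(( ++count ))            # pre-increment evaluates to 2: status 0, no guard needed

echo "count=$count"      # prints: count=2
```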

Arithmetic Contexts Execute Arbitrary Code

Any value used in a Bash arithmetic context -- including $(( )), (( )), ${var:offset:length}, ${var[index]}, and even -eq comparisons inside [[ ]] -- is recursively evaluated as an arithmetic expression. If the value contains an array subscript with a command substitution, that command executes. For example, [[ "$input" -eq 42 ]] will run arbitrary code if $input contains a[$(whoami)]. This is exactly how CVE-2026-1281 was exploited in Ivanti EPMM. The defense is to validate input with a regex before any arithmetic use: [[ "$input" =~ ^-?[0-9]+$ ]]. The single-bracket [ "$input" -eq 42 ] form is also safer: the [ builtin requires its operands to be plain integers and does not perform recursive arithmetic evaluation, so a malicious subscript produces an error instead of executing.
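A sketch of the validate-before-arithmetic pattern. The hostile string below is inert because it is rejected before it can reach an arithmetic context:

```bash
#!/usr/bin/env bash
is_integer() {
    # Accept only an optionally-signed run of digits -- nothing else
    [[ "$1" =~ ^-?[0-9]+$ ]]
}

check_answer() {
    local input="$1"
    if ! is_integer "$input"; then
        echo "rejected: not an integer"
        return 1
    fi
    (( input == 42 )) && echo "match" || echo "no match"
}

check_answer 'a[$(whoami)]'  # prints: rejected: not an integer
check_answer '42'            # prints: match
```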

The trap Builtin: Signal Handling and Cleanup

The trap builtin allows you to register handler code that executes when the shell receives a signal or when certain events occur. The POSIX specification defines trap as a built-in that sets handlers for signals such as SIGHUP, SIGINT, and SIGTERM, as well as for conditions like EXIT.

The most important use case is the EXIT pseudo-signal, which fires whenever the shell exits for any reason -- including errors from set -e, normal completion, or signal termination:

bash
#!/usr/bin/env bash
set -euo pipefail

# Cleanup function -- runs on any exit
cleanup() {
    local exit_code=$?
    echo "Cleaning up temporary files..." >&2
    rm -f "$tmp_work_dir"/* 2>/dev/null || true
    rmdir "$tmp_work_dir" 2>/dev/null || true

    if [[ $exit_code -ne 0 ]]; then
        echo "Script failed with exit code: $exit_code" >&2
    fi
}

# Create temp directory and register cleanup
tmp_work_dir=$(mktemp -d)
trap cleanup EXIT

# The rest of the script -- cleanup runs even if this fails
process_data "$tmp_work_dir"

The || true idiom after cleanup commands is intentional: within the cleanup function, you usually want cleanup to proceed even if individual cleanup commands fail, so you suppress error propagation from them.

The use of mktemp -d in this pattern is a deliberate security measure, not a convenience. Writing temporary files to a predictable path like /tmp/myscript_$$ (using the process ID) creates a race condition known as a TOCTOU (time-of-check-to-time-of-use) vulnerability: an attacker who can predict the filename can create a symlink at that path before your script does, causing your script to write sensitive data to an attacker-controlled location or to overwrite a critical file. The mktemp utility generates filenames with cryptographically random components and creates them atomically with restrictive permissions (mode 0600 for files, 0700 for directories), eliminating both the prediction vector and the race window. The trap cleanup EXIT ensures these files are removed even on error, preventing information disclosure from orphaned temporary files in shared directories.

You can trap multiple signals:

bash
trap 'echo "Interrupted" >&2; exit 130' INT TERM

Note

SIGKILL (signal 9) cannot be trapped by any process -- it is handled directly by the kernel. Any cleanup architecture that depends on catching SIGKILL will fail.

Building a Structured Error Handling System

For production scripts, consider a logging and error handling infrastructure that captures context:

bash
#!/usr/bin/env bash
set -euo pipefail

readonly SCRIPT_NAME="$(basename "$0")"
readonly LOG_FILE="/var/log/${SCRIPT_NAME%.sh}.log"

# Logging with timestamp and level
log() {
    local level="$1"
    shift
    printf '%s [%s] [%s] %s\n' \
        "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" \
        "$SCRIPT_NAME" \
        "$level" \
        "$*" | tee -a "$LOG_FILE"
}

log_info()  { log INFO  "$@"; }
log_warn()  { log WARN  "$@" >&2; }
log_error() { log ERROR "$@" >&2; }

# Error handler with context
handle_error() {
    local exit_code=$?
    local line_number="$1"
    log_error "Command failed at line $line_number with exit code $exit_code"
    log_error "Call stack:"
    local i=0
    while caller $i; do
        (( i++ )) || true   # post-increment at i=0 returns status 1 under set -e
    done | while read -r line sub file; do
        log_error "  ${file}:${line} in ${sub}()"
    done >&2
}

# Trap ERR for detailed error reporting -- errtrace is required for
# the trap to fire inside functions, command substitutions, and subshells
set -o errtrace
trap 'handle_error $LINENO' ERR

The ERR trap fires whenever a command returns a non-zero exit status, subject to the same exclusions as set -e. Note that ERR traps are not inherited by shell functions unless errtrace (set -E) is enabled, which is why the set -o errtrace line is part of the pattern rather than an optional extra. The caller builtin provides the call stack. Combined, these give you production-quality error diagnostics in a shell script.

Exit Codes as a Communication Protocol

Every Bash script is a process, and every process communicates its outcome through exit codes. This is not just convention -- it is how pipelines, && and || operators, if statements, and set -e all work. Your scripts should return meaningful exit codes.

bash
readonly E_SUCCESS=0
readonly E_INVALID_ARGS=1
readonly E_MISSING_DEPENDENCY=2
readonly E_PERMISSION_DENIED=3
readonly E_NETWORK_ERROR=4
readonly E_UNKNOWN=99

check_dependencies() {
    local missing=()
    for cmd in curl jq openssl; do
        if ! command -v "$cmd" &>/dev/null; then
            missing+=("$cmd")
        fi
    done

    if [[ ${#missing[@]} -gt 0 ]]; then
        log_error "Missing required commands: ${missing[*]}"
        return $E_MISSING_DEPENDENCY
    fi
}

main() {
    check_dependencies || exit $?
    # ...
}

The command -v pattern for checking whether a command exists is preferred over which, because command -v is a shell builtin that works consistently across environments. The which command is an external binary with inconsistent behavior across distributions.

Performance: When Shell Builtins Beat External Commands

Every time your script calls an external command -- sed, awk, grep, cut, basename, dirname -- the kernel must fork a new process, load the program, execute it, and return the result. For a handful of calls, this is negligible. In a loop over thousands of items, it is measurable. A single fork() and exec() cycle on a modern Linux system takes roughly 1 to 5 milliseconds depending on system load. In a loop processing 10,000 files, replacing an external basename call with the builtin ${filepath##*/} can save 10 to 50 seconds of wall-clock time -- entirely from process creation overhead, not from the string operation itself.

Bash provides built-in alternatives for many common operations. String operations as discussed earlier (${var//old/new}, ${var#pattern}, ${var%pattern}) replace calls to sed and awk for simple substitutions. The [[ ]] regex matching (=~) replaces calls to grep for pattern testing. Arithmetic (( )) and $(( )) replace calls to expr and bc for integer math.

Where external tools are unavoidable -- and for complex data processing, awk or python are often the right tools -- invoke them once with all the data rather than inside a loop:

bash
# Slow: spawns grep once per file
for logfile in /var/log/*.log; do
    grep "ERROR" "$logfile" >> /tmp/all_errors.txt
done

# Fast: spawns grep once for all files
grep "ERROR" /var/log/*.log > /tmp/all_errors.txt

Three lesser-known builtins that eliminate common fork overhead:

printf -v assigns formatted output directly to a variable without a subshell. The common pattern var=$(printf '%04d' "$n") forks a subshell to capture stdout; printf -v var '%04d' "$n" does the same assignment entirely within the current shell. In Bash 4.2+, printf -v also supports the %(datefmt)T format specifier, which reads from the system clock without forking date at all: printf -v now '%(%Y-%m-%dT%H:%M:%S)T' -1 assigns the current timestamp to now in a single builtin call.

$EPOCHSECONDS and $EPOCHREALTIME (Bash 5.0+) provide the Unix epoch timestamp and a microsecond-precision timestamp as shell variables, replacing $(date +%s) entirely. If your script calls date +%s inside a loop for timing or log purposes, switching to $EPOCHSECONDS eliminates a fork on every iteration.
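A minimal timing sketch using the builtin variable, assuming Bash 5.0 or later:

```bash
#!/usr/bin/env bash
# Elapsed-time measurement with no fork of date on any read of the clock
start=$EPOCHSECONDS
sleep 1
elapsed=$(( EPOCHSECONDS - start ))
echo "elapsed: ${elapsed}s"
```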

kill -0 PID checks whether a process exists without sending it a signal. The signal number 0 causes the kernel to perform the permission and existence check but skip the signal delivery. This is documented in the kill(2) system call man page but not in the Bash kill builtin documentation, so it surprises even experienced scripters. Use it in wait loops to test whether a background process is still running: while kill -0 "$pid" 2>/dev/null; do sleep 1; done.

Choose Your Approach
Scenario: You need to extract the filename from a full path inside a loop that processes 10,000 files. The variable $filepath holds /var/log/nginx/access.log. Which approach do you use?
A
filename="$(basename "$filepath")"
B
filename="$(echo "$filepath" | sed 's|.*/||')"
C
filename="${filepath##*/}"
A -- Correct but Slow
Produces the right answer, but every call forks a subshell for the $() command substitution and then forks again to exec the external /usr/bin/basename binary. That is two process creations per iteration. Over 10,000 files at ~2ms per fork/exec cycle, this adds roughly 40 seconds of pure overhead. The result is also captured via a pipe, which means kernel buffer allocation on every call. Use basename in one-off scripts where clarity matters more than speed, but avoid it inside tight loops.
Correct but Wasteful
Three process creations for a single string operation: the $() subshell, the echo command (a builtin in Bash but still runs inside the subshell), and the external sed binary, all connected by a pipe. This is the slowest option -- roughly 50-60 seconds of overhead over 10,000 iterations. The sed regex engine loads, compiles the substitution pattern, processes a single short string, and exits. It is the right tool when you need complex multi-pattern transformations on streams of text, but using it to strip a path prefix from one variable is like starting a chainsaw to cut a piece of string.
Best: Zero Overhead
No processes are created. The ##*/ parameter expansion runs entirely inside the Bash interpreter's own memory. It strips the longest match of */ (everything up to and including the last slash) from the front of the string. Over 10,000 files, this completes in milliseconds total -- not seconds. The same principle applies to all parameter expansion operators: ${var%.*} to remove an extension, ${var//old/new} for substitution, ${#var} for string length. Whenever you can express the operation as a parameter expansion, you should -- because you are keeping the work inside the process that already has the data in memory.
Choose Your Approach
Scenario: Your script logs every action with a timestamp. The logging function is called hundreds of times per run. How do you get the current time?
A
now="$(date '+%Y-%m-%dT%H:%M:%S')"
B
printf -v now '%(%Y-%m-%dT%H:%M:%S)T' -1
C
now="$(python3 -c 'import datetime; print(datetime.datetime.now().isoformat())')"
Common but Forks
This is what you will find in the vast majority of existing Bash scripts. It works correctly, but every call forks a subshell and execs the external date binary. Called hundreds of times, the cumulative overhead adds up. In a script that logs on every loop iteration processing thousands of items, this single line can account for a substantial portion of total runtime. It also introduces a subtle correctness risk: a record assembled from two separate date calls can straddle midnight, so the two calls return different dates.
Best: Zero Overhead
No process is created. The printf builtin with %(datefmt)T (Bash 4.2+) reads the system clock via the C library's strftime() and writes the result directly into the variable via -v. No subshell, no pipe, no external binary. The -1 argument means "current time." This is the fastest possible way to get a formatted timestamp in Bash. For epoch seconds specifically, Bash 5.0+ provides $EPOCHSECONDS as a shell variable -- no function call at all.
Extreme Overhead
This forks a subshell, execs the Python interpreter, waits for it to load, imports the datetime module, runs the function, prints to stdout, and pipes the result back. The Python interpreter startup alone is typically 30-50ms -- orders of magnitude slower than date (1-3ms) and incomparably slower than the builtin printf -v (microseconds). Python is a powerful tool for complex tasks, but calling it for a single timestamp inside a loop is a performance disaster. Use the right tool for the scale of the problem.

ShellCheck: The Tool Every Bash Author Needs

No discussion of efficient Bash scripting is complete without mentioning ShellCheck, a static analysis tool for shell scripts written in Haskell by Vidar Holen. It catches many of the pitfalls described in this article automatically -- unquoted variables, incorrect array iteration, subshell variable loss, unreachable code, and dozens of other common errors. ShellCheck works by parsing the script into an AST (abstract syntax tree) and applying over 400 individual analysis rules, each identified by a stable SC code (such as SC2086 for unquoted variables or SC2031 for variables modified in a subshell). These codes are stable across versions, which means you can suppress specific warnings with inline directives when you have a legitimate reason to deviate from the rule.
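For example, when word splitting is intentional, a directive comment on the line before a command suppresses a specific code for that command only (a sketch; the variable name is illustrative):

```shell
#!/usr/bin/env bash
# EXTRA_FLAGS deliberately holds several space-separated options,
# so the unquoted expansion below is intentional.
EXTRA_FLAGS="-l -a"

# The next line wants word splitting; suppress SC2086 for it only.
# shellcheck disable=SC2086
ls $EXTRA_FLAGS /tmp >/dev/null
```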

ShellCheck integrates with editors including VS Code, Vim, Emacs, and Sublime Text via plugins and can be incorporated into CI/CD pipelines using the shellcheck binary directly or through pre-commit hooks. Its error messages link to the ShellCheck wiki with detailed explanations and correct alternatives. For anyone serious about writing reliable Bash, it is not optional tooling -- it is part of the workflow.

Debugging: Tracing Execution at Runtime

ShellCheck catches problems before you run a script. When you need to understand what is happening during execution, Bash provides a built-in tracing mechanism: set -x (also known as set -o xtrace). When enabled, Bash prints every command to stderr before executing it, with variables expanded to their current values. You can enable it at the top of a script alongside strict mode, or toggle it on and off around a specific section:

bash
#!/usr/bin/env bash
set -euo pipefail

# Enable tracing for a specific section
set -x
process_files "$@"
set +x

# Or enable globally via the environment:
# TRACE=1 ./myscript.sh
[[ "${TRACE:-0}" == "1" ]] && set -x

The default trace prefix is +, but you can customize it by setting the PS4 variable. A common production pattern is to include the script name, line number, and function name in every trace line, which makes log output far more useful when debugging scripts that source other files or call deeply nested functions:

bash
export PS4='+${BASH_SOURCE[0]}:${LINENO}:${FUNCNAME[0]:+${FUNCNAME[0]}(): }'

In Bash 4.1 and later, you can redirect trace output to a separate file descriptor using BASH_XTRACEFD, keeping trace logs separate from your script's normal stderr output. This is particularly valuable in CI/CD pipelines where you want debug traces captured in a log file without contaminating the build output that users see.
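A minimal sketch of that pattern (fd 9 and the log filename are arbitrary choices):

```shell
#!/usr/bin/env bash
# Route xtrace output to its own file on fd 9, leaving stderr untouched.
exec 9>xtrace.log
BASH_XTRACEFD=9
export PS4='+${BASH_SOURCE[0]}:${LINENO}: '
set -x

echo "normal output"     # stdout as usual; the trace line goes to xtrace.log

set +x
exec 9>&-                # close the trace fd when done
```

After the run, xtrace.log contains the traced commands with file and line prefixes, while the script's stderr stays clean for real errors.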

Test Syntax: [ vs. [[ and When Each Applies

Bash provides two test syntaxes that look similar but behave differently in ways that matter for correctness and security. The single bracket [ ] is the POSIX-compatible test builtin -- it is a regular command, subject to word splitting and glob expansion on its arguments. The double bracket [[ ]] is a Bash keyword that is parsed by the shell before execution, which means it suppresses word splitting and glob expansion automatically. This distinction has practical consequences:

With [ ], an unquoted variable containing spaces splits into multiple arguments, and the test fails at runtime with an error such as "too many arguments". With [[ ]], the same unquoted variable is handled correctly because the shell knows the entire construct is a conditional expression, not a command with arguments. Inside [[ ]], you can use =~ for regex matching, && and || for compound conditions, and pattern matching on the right-hand side of == without quoting the pattern. None of these work inside [ ].
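A short demonstration of the difference (the variable value is illustrative):

```shell
#!/usr/bin/env bash
name="hello world"

# [[ ]] handles the unquoted expansion safely: no word splitting occurs
if [[ $name == "hello world" ]]; then
    echo "double bracket: match"
fi

# [ ] splits the unquoted $name into two words and errors out at runtime
if [ $name = "hello world" ] 2>/dev/null; then
    echo "never reached"
else
    echo "single bracket: broke without quotes"
fi

# [[ ]] also supports regex matching, which [ ] cannot do at all
if [[ $name =~ ^hello ]]; then
    echo "regex: matched"
fi
```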

The practical rule: use [[ ]] in Bash scripts for all conditional tests. Use [ ] only when writing scripts that must run under /bin/sh or other strictly POSIX-compliant shells like dash. Since this article targets Bash specifically, every conditional in the code examples uses [[ ]] by design.

When to Stop Using Bash

Bash is the right tool for orchestrating other programs, processing files, coordinating system operations, and automating tasks that would otherwise be done by hand at the terminal. It is the wrong tool for structured data processing, anything involving floating-point math, complex error handling across network calls, or any script that grows past a few hundred lines. The Google Shell Style Guide codifies a useful threshold: if your script exceeds about 100 lines or requires data structures beyond simple arrays, consider rewriting it in Python or another language.

The signs that a Bash script has outgrown its medium are recognizable: deeply nested conditionals that are difficult to follow, string manipulation that requires multiple chained parameter expansions to achieve what a single line of Python would do, or error handling that requires more boilerplate than the logic it protects. When you reach for jq to parse JSON or awk to restructure columnar data within a larger script, you are often better served by writing the entire pipeline in a language with native data structures. The decision is not about purity -- it is about maintainability. A 50-line Bash script that calls curl, validates the response, and restarts a service is perfectly appropriate. A 500-line Bash script that parses configuration files, manages state, and handles retry logic across multiple API endpoints is a maintenance liability that would be clearer and safer in Python.

Putting It Together: A Production-Grade Template

Here is a minimal, opinionated script template that incorporates the patterns discussed throughout this article:

deploy.sh
#!/usr/bin/env bash
# ==============================================================================
# Script: deploy.sh
# Description: Deploy service to a target environment
# Usage: ./deploy.sh --env [staging|production] [--dry-run]
# ==============================================================================

set -euo pipefail

# ---- Constants ---------------------------------------------------------------
# Declare and mark readonly separately so a failing command substitution
# is not masked by the readonly builtin's own exit status (ShellCheck SC2155)
SCRIPT_NAME="$(basename "$0")"
readonly SCRIPT_NAME
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
readonly SCRIPT_DIR
LOG_FILE="/var/log/${SCRIPT_NAME%.sh}.log"
readonly LOG_FILE

readonly E_SUCCESS=0
readonly E_INVALID_ARGS=1
readonly E_MISSING_DEP=2

# ---- Logging -----------------------------------------------------------------
log() {
    local level="$1"; shift
    printf '%s [%s] %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$level" "$*" \
        | tee -a "$LOG_FILE"
}
log_info()  { log INFO  "$@"; }
log_error() { log ERROR "$@" >&2; }

# ---- Cleanup -----------------------------------------------------------------
cleanup() {
    local code=$?
    [[ -d "${tmp_dir:-}" ]] && rm -rf "$tmp_dir"
    [[ $code -ne 0 ]] && log_error "Exited with code $code"
}
trap cleanup EXIT

# ---- Dependency Check --------------------------------------------------------
require_commands() {
    local missing=()
    for cmd in "$@"; do
        command -v "$cmd" &>/dev/null || missing+=("$cmd")
    done
    if [[ ${#missing[@]} -gt 0 ]]; then
        log_error "Missing commands: ${missing[*]}"
        return $E_MISSING_DEP
    fi
}

# ---- Argument Parsing --------------------------------------------------------
usage() {
    cat <<EOF
Usage: $SCRIPT_NAME --env ENVIRONMENT [OPTIONS]

Options:
  --env, -e ENVIRONMENT   Target environment (staging|production)
  --dry-run               Print actions without executing
  --help, -h              Show this help message
EOF
}

parse_args() {
    local -n _opts="$1"; shift

    while [[ $# -gt 0 ]]; do
        case "$1" in
            --env|-e)
                [[ $# -ge 2 ]] || { log_error "--env requires an argument"; return $E_INVALID_ARGS; }
                _opts[environment]="$2"; shift 2 ;;
            --dry-run)
                _opts[dry_run]=true; shift ;;
            --help|-h)
                usage; exit $E_SUCCESS ;;
            *)
                log_error "Unknown option: $1"; usage; return $E_INVALID_ARGS ;;
        esac
    done

    [[ -v '_opts[environment]' ]] || {
        log_error "--env is required"
        usage
        return $E_INVALID_ARGS
    }
}

# ---- Main Logic --------------------------------------------------------------
main() {
    declare -A opts
    opts[dry_run]=false

    require_commands curl jq git
    parse_args opts "$@"

    tmp_dir=$(mktemp -d)
    log_info "Deploying to ${opts[environment]} (dry_run=${opts[dry_run]})"

    # Main deployment logic here
    if [[ "${opts[dry_run]}" == true ]]; then
        log_info "DRY RUN: would deploy to ${opts[environment]}"
    else
        log_info "Deployment complete"
    fi
}

main "$@"
Bug Hunt -- Find the Five Errors
#!/usr/bin/env bash
set -euo pipefail

BACKUP_DIR="/tmp/backup_$$"
mkdir "$BACKUP_DIR"

get_timestamp() {
    local ts=$(date +%s)
    echo "$ts"
}

count_files() {
    local count=0
    find "$1" -type f | while IFS= read -r f; do
        (( count++ ))
    done
    echo "$count"
}

main() {
    local total=$(count_files /var/log)
    echo "Found $total files"
    cp /var/log/*.log $BACKUP_DIR
}

main "$@"
Answers
1. Predictable temp path: /tmp/backup_$$ is vulnerable to symlink/TOCTOU attacks. Fix: BACKUP_DIR=$(mktemp -d)
2. No cleanup trap: if the script fails, the temp directory is orphaned. Fix: trap 'rm -rf "$BACKUP_DIR"' EXIT
3. Pipeline subshell in count_files: the pipe runs the while loop in a subshell, so count is always 0 in the parent. Fix: done < <(find "$1" -type f)
4. local masks exit code: local total=$(count_files ...) swallows a failure in count_files. Fix: local total; total=$(count_files /var/log)
5. Unquoted variable: cp /var/log/*.log $BACKUP_DIR -- the $BACKUP_DIR is unquoted. If the path contained spaces, it would be split. Fix: "$BACKUP_DIR"
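For reference, here is one possible corrected version with all five fixes applied. This is a sketch, not the only valid repair; to keep it self-contained and safe to run anywhere, it fabricates its own source directory instead of reading /var/log:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Self-contained stand-in for /var/log: a temp directory with sample files
SRC_DIR=$(mktemp -d)
touch "$SRC_DIR/a.log" "$SRC_DIR/b.log" "$SRC_DIR/c.txt"

# Fix 1: unpredictable temp path from mktemp instead of /tmp/backup_$$
BACKUP_DIR=$(mktemp -d)

# Fix 2: cleanup trap so the temp directories are removed on any exit
trap 'rm -rf "$BACKUP_DIR" "$SRC_DIR"' EXIT

get_timestamp() {
    # Declare and assign separately so a failure of date is not masked
    local ts
    ts=$(date +%s)
    echo "$ts"
}

count_files() {
    # Fix 3: process substitution keeps the while loop in the current
    # shell, so the counter survives the loop
    local count=0
    while IFS= read -r f; do
        (( count += 1 ))
    done < <(find "$1" -type f)
    echo "$count"
}

main() {
    # Fix 4: separate declaration from assignment so local does not
    # swallow a non-zero exit from count_files
    local total
    total=$(count_files "$SRC_DIR")
    echo "Found $total files"
    # Fix 5: quote the destination variable
    cp "$SRC_DIR"/*.log "$BACKUP_DIR"
}

main "$@"
```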
The Unified Mental Model
1. Every command is a process. fork() copies, exec() replaces, and the child's memory is invisible to the parent. This one fact explains subshell variable loss, why export exists, and why process substitution was invented.
2. Unquoted expansions are evaluated twice. First the variable is expanded, then the result undergoes word splitting and globbing. Quoting suppresses the second pass. This is your first defense against both bugs and injection.
3. Bash trusts you to handle errors. By default, failures are silently ignored. set -euo pipefail changes the default, but its exceptions mean you still need explicit checks at critical points. Defensive scripting is layered: strict mode, trap, exit codes, ShellCheck.
4. Anything crossing a trust boundary is an attack vector. Environment variables, user input in arithmetic contexts, predictable temp file paths, unsanitized data in eval or unquoted expansions. The process model is also the security model.
5. The shell rewards builtins over externals. Every fork has a cost. Parameter expansion, [[ ]], (( )), printf -v, and $EPOCHSECONDS exist because the shell designers understood that performance-critical paths should stay inside the process.

Closing Thoughts: The Discipline Behind the Syntax

Bash scripting rewards understanding over memorization. The quirks that trap programmers -- silent variable loss in subshells, unquoted expansions, pipelines that swallow errors -- all follow logically from the Unix process model and from decisions made in the POSIX specification. Once you understand why the behavior exists, you stop fighting the shell and start using it.

The shift in thinking is this: treat your shell scripts as software, not as sequences of commands. Use set -euo pipefail from the first line. Quote every variable. Use local in functions. Use trap EXIT for cleanup. Check exit codes deliberately. Run ShellCheck before you commit. The investment in these disciplines is small. The return -- scripts that behave correctly under load, on unusual input, and at 3 a.m. when you are not watching -- is enormous. Once a script is production-grade, the next step is automating it with cron jobs so it runs reliably without manual intervention.

Bash has been on every Linux system since before many of its current users were born. It will be there long after the current generation of higher-level automation tools has cycled through. The engineers who understand it at depth will keep being the ones who can fix things when they break.

How to Write a Production-Grade Bash Script on Linux

Step 1: Enable Strict Mode and Define Constants

Start every script with set -euo pipefail to enable fail-fast behavior. Declare constants with readonly for values that should never change, such as the script name, working directory, and exit codes. This prevents silent failures from unset variables, unchecked errors, and masked pipeline failures.

Step 2: Register a Cleanup Trap on EXIT

Use trap cleanup EXIT to register a function that runs when the script exits for any reason. Inside the cleanup function, remove temporary files, release locks, and log the exit code. This guarantees cleanup regardless of whether the script succeeds, fails, or is interrupted by a signal.

Step 3: Structure with a main() Function and Proper Scoping

Define a main() function and call it at the end of the script with main "$@". Use local for all variables inside functions to prevent namespace pollution. Parse arguments with a dedicated function using a while-case loop and namerefs. This pattern makes the script sourceable, testable, and safe from global variable collisions.

Step 4: Validate with ShellCheck Before Deploying

Run ShellCheck on every script before committing or deploying. ShellCheck catches unquoted variables, subshell variable loss, incorrect array iteration, unreachable code, and dozens of other common Bash pitfalls automatically. Integrate it into your editor and CI/CD pipeline for continuous validation.

Frequently Asked Questions

Why does piping into a while read loop lose my variable changes in Bash?

When you pipe into a while loop, Bash runs the entire loop body in a subshell. A subshell is a child process that inherits a copy of the parent shell variables, but any changes it makes are invisible to the parent. When the subshell exits, all variable modifications are discarded. The fix is to use process substitution with done < <(command) syntax, which keeps the loop running in the current shell context.

What does set -euo pipefail do in a Bash script?

This is three options combined. set -e (errexit) exits the script immediately if any command returns a non-zero exit code. set -u (nounset) exits with an error if any unset variable is expanded, catching typos and uninitialized variables. set -o pipefail changes the exit status of a pipeline to the status of the rightmost command that failed, rather than always using the status of the last command. Together, they give Bash scripts fail-fast behavior.

What is the difference between a shell variable and an environment variable in Bash?

A shell variable exists only in the current shell's memory and is not visible to child processes. An environment variable has been marked with export and is copied into the environment block that child processes inherit. Scripts run in a child process, so they can only see exported environment variables, not unexported shell variables from the parent shell.

How does trap EXIT work for cleanup in Bash scripts?

The trap builtin registers a handler function that executes when the shell exits for any reason, including normal completion, errors triggered by set -e, or signal termination. By trapping the EXIT pseudo-signal, you guarantee that cleanup code (removing temporary files, releasing locks, logging) runs regardless of how the script terminates. This is the standard pattern for writing reliable cleanup logic in production Bash scripts.

How do you debug a Bash script at runtime?

Enable execution tracing with set -x (or set -o xtrace), which causes Bash to print every command to stderr before executing it, with all variables expanded to their current values. Customize the trace prefix by setting PS4 to include the filename, line number, and function name. In Bash 4.1 and later, redirect trace output to a separate file descriptor using BASH_XTRACEFD to keep debug logs separate from normal stderr. You can also enable tracing conditionally by checking an environment variable like TRACE=1 at the top of your script.

What is the difference between single bracket [ ] and double bracket [[ ]] in Bash?

Single bracket [ ] is the POSIX-compatible test builtin -- it is a regular command, so its arguments undergo word splitting and glob expansion. Double bracket [[ ]] is a Bash keyword parsed by the shell before execution, which suppresses word splitting and globbing automatically. Double brackets also support regex matching with =~, pattern matching with ==, and compound conditions with && and || inside the expression. Use [[ ]] in Bash scripts for safety and expressiveness; use [ ] only when writing POSIX-portable shell scripts.

When should you use Bash instead of Python for scripting?

Bash is the right tool for orchestrating other programs, processing files, coordinating system operations, and automating tasks that are essentially sequences of terminal commands. Python is the better choice when a script requires complex data structures, floating-point math, structured error handling across network calls, or when the script exceeds a few hundred lines. A practical threshold: if the script needs jq to parse JSON or requires deeply nested conditionals, the logic would be clearer and safer in Python. Both languages are essential for sysadmin and DevOps work -- Bash for system-level glue, Python for structured application logic.

Sources and References

Technical details in this guide are drawn from official documentation and verified sources.