There is no shortage of systemd cheat sheets. Type systemctl start here, journalctl -f there, done. But that surface-level familiarity tends to collapse the first time a service silently fails at boot, the first time a timer doesn't fire after a weekend outage, or the first time a dependency cycle locks up your server during an update. This article goes deeper -- not to cover every directive in the man pages, but to build the mental model that makes systemd predictable rather than mysterious. Accurate understanding of the unit file lifecycle, ordering semantics, cgroup integration, and journald's structured log format is what separates administrators who fix things quickly from those who reboot and hope.
A note on versions: this article targets recent upstream systemd (roughly v250 and newer, covering Debian 12, Ubuntu 24.04, RHEL 9, and recent Fedora releases). Some features -- particularly Type=exec (v240), Type=notify-reload (v253), and certain systemd-analyze security output fields -- are absent or behave differently on older versions. Where a feature was introduced in a specific version, that version is noted.
The Unit Model: What systemd Manages
systemd's design is documented in its own man pages. Per systemd(1), it is "a system and service manager for Linux operating systems" that runs as PID 1 and manages the entire user space from bootup to shutdown. The key architectural insight is that systemd replaced a collection of ad-hoc shell scripts with a dependency graph of typed units.
Every resource systemd manages is described by a unit. Units have types, and the type determines what directives are valid and what lifecycle the unit follows. The types you will encounter in production:
- .service -- a process or daemon. The most common type.
- .timer -- a time-based trigger that activates a corresponding .service.
- .target -- a synchronization point; groups units and represents a system state.
- .socket -- a socket that can activate a service on first connection (socket activation).
- .mount -- a filesystem mount point, generated from /etc/fstab or written manually.
- .path -- activates a service when a filesystem path changes.
- .slice -- a cgroup hierarchy node for resource allocation.
Unit files are plain text in an INI-like format. They live in three locations, searched in priority order from highest to lowest: /etc/systemd/system/ (administrator overrides), /run/systemd/system/ (runtime units, ephemeral), and /usr/lib/systemd/system/ (distribution-installed defaults). Files in /etc/ shadow files of the same name in /usr/lib/. This is important: when you want to override a vendor-provided unit without editing it directly, you create a file in /etc/systemd/system/ with the same name, or -- the cleaner approach -- use a drop-in directory.
Never edit files in /usr/lib/systemd/system/ directly. A package update will overwrite your changes silently. Instead, use systemctl edit unitname.service, which creates a drop-in file at /etc/systemd/system/unitname.service.d/override.conf that merges with the original. Your changes survive updates.
Writing Service Units That Work
A service unit has three sections: [Unit] for metadata and dependency declarations, [Service] for the process configuration, and [Install] for enabling behavior. Each section has distinct responsibilities, and conflating them is a common source of bugs.
The [Unit] Section: Metadata and Dependencies
The [Unit] section describes what the unit is and how it relates to other units. The dependency directives here are the most subtle part of systemd and the source of a disproportionate number of production problems.
There are four primary ordering and requirement directives. Requires= creates a hard dependency: if the listed unit fails to start, this unit will also fail. Wants= is a soft dependency: systemd will attempt to start the listed unit, but this unit will still start even if the dependency fails. After= controls ordering -- this unit starts after the listed unit, regardless of any dependency relationship. Before= is the inverse.
This is the distinction that trips up many administrators: After= and Requires= are independent. Declaring Requires=postgresql.service without also declaring After=postgresql.service means both units will start simultaneously, which defeats the purpose. You almost always need both directives together, or you use BindsTo=, which is stronger than Requires=: it implies both a hard dependency and that any state transition of the dependency (stop, deactivation, or entering a failed state) propagates immediately to this unit as well. Unlike Requires=, which only reacts to startup failure, BindsTo= will stop this unit if the bound unit is stopped, deactivated, or fails at any point during its lifetime.
[Unit]
Description=Production Web Application
Documentation=https://internal.wiki/webapp
After=network-online.target postgresql.service redis.service
Wants=network-online.target
Requires=postgresql.service
BindsTo=redis.service
# If redis stops, we stop too. If postgres stops, we fail.
# network-online.target is soft -- degraded mode may be acceptable.

[Service]
Type=notify
User=webapp
Group=webapp
WorkingDirectory=/opt/webapp
EnvironmentFile=-/etc/webapp/env
ExecStartPre=/opt/webapp/bin/pre-flight-check
ExecStart=/opt/webapp/bin/server
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5
StartLimitIntervalSec=60
StartLimitBurst=3
TimeoutStartSec=30
TimeoutStopSec=30
KillMode=mixed
KillSignal=SIGTERM

# Hardening
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
PrivateDevices=true
ReadWritePaths=/opt/webapp/data /var/log/webapp
SystemCallFilter=@system-service

[Install]
WantedBy=multi-user.target
Service Types: The Most Misunderstood Directive
The Type= directive tells systemd how the service signals that it has finished starting. Getting this wrong means systemd will report a service as active before it's fully ready, causing dependent services to fail with confusing errors.
The systemd.service(5) man page defines the following types. Understanding all of them matters because the wrong choice has real consequences.
Type=simple is the default when ExecStart= is set but Type= is not. systemd considers the unit started as soon as fork() returns -- before the service binary has even been executed via execve(). This means systemctl will report success and dependent units will activate even if the binary path is wrong or the User= doesn't exist. Use this only for processes that are genuinely ready to serve the moment they are forked. For virtually all new services, Type=exec is the better choice; the upstream man page itself recommends exec over simple for exactly this reason.
Type=exec (added in v240) is similar to simple but systemd waits until after execve() succeeds before proceeding to dependent units. This means a missing binary or an invalid User= will cause an immediate, correctly-reported failure rather than a silent later one. For new services that cannot use notify, prefer exec over simple.
Type=forking is for traditional Unix daemons that fork and daemonize into the background. systemd waits for the parent process to exit, then tracks the child. Set PIDFile= so systemd can reliably identify which child is the main process. Per the current upstream man page, this type is now explicitly discouraged: "The use of this type is discouraged, use notify, notify-reload, or dbus instead." It exists for legacy compatibility.
"If so, notify, notify-reload, or dbus (the latter only in case the service provides a D-Bus interface) are the preferred options as they allow service program code to precisely schedule when to consider the service started up successfully and when to proceed with follow-up units."
-- systemd.service(5) man page, freedesktop.org (current upstream)
Type=oneshot is for tasks that run to completion and exit. This is the implied default if neither Type= nor ExecStart= is specified. Unlike exec, systemd waits for the main process to exit before proceeding to dependent units. Pair with RemainAfterExit=yes to hold the unit in "active" state after the command completes; otherwise it transitions immediately to "dead."
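A minimal sketch of the oneshot-plus-RemainAfterExit pattern (the script paths are hypothetical):

```ini
[Unit]
Description=Apply firewall rules at boot

[Service]
Type=oneshot
# Hold the unit in "active" state after the command exits, so
# dependents can order against it and ExecStop= runs at shutdown.
RemainAfterExit=yes
ExecStart=/usr/local/bin/apply-firewall-rules
ExecStop=/usr/local/bin/flush-firewall-rules

[Install]
WantedBy=multi-user.target
```

With RemainAfterExit=yes, systemctl status shows "active (exited)" after the script completes, which is the expected state for this pattern.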
Type=notify is the current standard for well-behaved daemons: the process sends an sd_notify(3) call with READY=1 to systemd's notification socket when it has finished initializing, and systemd waits for that signal before proceeding. This requires explicit support in the application code. Per the upstream man page, "notify, notify-reload, or dbus... are the preferred options as they allow service program code to precisely schedule when to consider the service started up successfully."
Type=notify requires that systemd accept the notification from the right process. The NotifyAccess= directive controls this. It defaults to main when unset, meaning only the main (directly forked) process can send notifications. If a child process sends READY=1 -- which happens in some multi-process or supervisor-style daemons -- the notification is silently dropped, the service appears to hang until TimeoutStartSec= expires, and then fails. If your daemon forks during initialization and the child is the one calling sd_notify(), set NotifyAccess=all explicitly. This is a hard-to-diagnose failure mode because there is no error message indicating the notification was dropped.
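A sketch of the fix for a supervisor-style daemon whose child process sends the readiness notification (the binary path is hypothetical):

```ini
[Service]
Type=notify
# Accept READY=1 from any process in the service's cgroup,
# not only the main (directly forked) process.
NotifyAccess=all
ExecStart=/opt/app/bin/supervisor
```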
Type=notify-reload (added in v253) is the recommended evolution of notify for services that support config reloads. In addition to startup notification, the service signals the beginning of a reload with RELOADING=1 and its completion with READY=1 again. This allows systemctl reload to block and return only after the reload is fully complete, eliminating a class of race conditions that affect plain notify services during reloads. If your application or daemon calls sd_notify() already, adding reload notification support is the correct next step.
Type=dbus waits for the service to acquire a specific name on the D-Bus system bus (configured with BusName=). The unit is considered active while that bus name is held and transitions to stopping when it is released.
Type=idle behaves like simple but defers the actual execution until all active jobs in the initial boot transaction have been dispatched -- or until a 5-second timeout, whichever comes first. The 5-second ceiling is a deliberate design constraint, not a failure safety: it ensures that a perpetually busy system cannot defer an idle-typed service indefinitely. Its sole purpose is to prevent console output from interleaving during boot (it is used by getty@.service for exactly this reason). It must never be used as a general ordering mechanism.
Using Type=simple for a forking daemon is one of the most common misconfigurations. The parent forks, systemd sees the process as "started," marks the service active, and activates dependent units -- but the child process hasn't finished its initialization yet. If anything in that initialization fails, it will appear as a runtime failure long after systemd thinks the service came up cleanly. Use Type=exec at minimum, or Type=notify if the daemon supports it.
Restart Behavior and Backoff
The Restart= directive controls when systemd will automatically restart a service. The valid values are no, on-success, on-failure, on-abnormal, on-watchdog, on-abort, and always. For production services, on-failure is usually the right choice: restart when the process exits with a non-zero status, is killed by a signal, or times out, but not when it exits cleanly (exit code 0).
Pairing Restart=on-failure with StartLimitIntervalSec= and StartLimitBurst= prevents infinite crash loops from consuming system resources. The example above allows 3 restarts within 60 seconds before systemd gives up and marks the service as failed. After the limit is hit, the service enters the failed state and will not restart until you manually run systemctl reset-failed webapp.service and systemctl start webapp.service again.
RestartSec= is the delay between restart attempts, including the first. The default is 100ms, which is why restarts appear nearly instantaneous. If your service requires a genuine warmup period before retrying, set RestartSec= to a meaningful value. For post-crash cleanup, combine with ExecStartPre= commands that perform readiness checks before the main process launches.
If a service unit has no ExecStop= directive, systemd sends KillSignal= (default: SIGTERM) to the service immediately when stop is requested -- there is no pre-stop command to drain connections, flush buffers, or notify dependent systems. For any service that needs ordered teardown before the main process receives its kill signal, define an ExecStop= command (e.g. a drain script) or use ExecStopPost= for cleanup that runs after the process has already exited. The two have different semantics: ExecStop= runs while the process is still alive and should initiate an orderly shutdown; ExecStopPost= runs after the cgroup is empty and is used for cleanup regardless of how the service ended.
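A sketch of the two hooks together (the drain and cleanup scripts are hypothetical):

```ini
[Service]
ExecStart=/opt/app/bin/server
# Runs while the main process is still alive; initiates orderly shutdown.
ExecStop=/opt/app/bin/drain-connections --wait 20
# Runs after the cgroup is empty, whether the service exited cleanly or crashed.
ExecStopPost=/opt/app/bin/remove-stale-locks
# Give the drain time to finish before systemd escalates to SIGKILL.
TimeoutStopSec=30
```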
Conditions and Assertions
Unit files support two categories of pre-flight checks: Condition*= directives and Assert*= directives. Both are evaluated before systemd attempts to start the unit, but their failure behavior differs critically.
A failed Condition*= causes the unit to be silently skipped -- it transitions to "inactive (dead)" with no error logged. A failed Assert*= causes the unit to fail with an error. Use Condition*= when the unit is legitimately optional on some systems; use Assert*= when its absence indicates a configuration error that should be visible.
[Unit]
# Only start if the config file exists (silently skip if not)
ConditionPathExists=/etc/webapp/config.yaml
# Fail with an error if the data directory is missing
AssertPathIsDirectory=/opt/webapp/data
# Only run on x86_64 (useful in portable service files)
ConditionArchitecture=x86-64
# Only run if this is not a container environment
ConditionVirtualization=!container
Common useful conditions include ConditionPathExists=, ConditionFileNotEmpty=, ConditionHost=, ConditionVirtualization=, and ConditionCapability=. The full list is in systemd.unit(5) under UNIT FILE CONDITIONS.
Timers: Replacing Cron Without Losing Your Mind
systemd timers are the correct way to schedule recurring tasks on modern Linux systems. The systemd.timer(5) man page describes them as "unit configuration files whose names end in .timer encoding information about a timer controlled and supervised by systemd." The significant advantages over cron are: full journal logging of every run, Persistent= for catching up missed runs, RandomizedDelaySec= for fleet-wide jitter, and dependency declarations in the companion service unit that cron simply cannot express.
A timer always activates a corresponding service of the same base name. The timer backup.timer activates backup.service. You must write both unit files.
[Unit]
Description=Nightly database backup timer

[Timer]
OnCalendar=*-*-* 02:30:00
RandomizedDelaySec=600
Persistent=true
AccuracySec=1min

[Install]
WantedBy=timers.target
[Unit]
Description=Nightly database backup
After=network-online.target postgresql.service
Requires=postgresql.service

[Service]
Type=oneshot
User=backupuser
ExecStart=/opt/backup/run-backup.sh
StandardOutput=journal
StandardError=journal
SyslogIdentifier=backup
NoNewPrivileges=true
ProtectSystem=strict
ReadWritePaths=/mnt/backups
Notice that there is no [Install] section in the service unit above. That is intentional. Timer-activated services should not be enabled directly -- only the timer is enabled. If you enable the service independently, it could be started by both the timer and boot logic, potentially running your backup script twice.
Calendar Expressions
The OnCalendar= syntax is powerful but worth learning properly. The full format is DayOfWeek Year-Month-Day Hour:Minute:Second. Ranges use .., lists use commas, and * is a wildcard. To verify a calendar expression before deploying, use systemd-analyze calendar:
# Verify and show next trigger times
$ systemd-analyze calendar "Mon..Fri *-*-* 08:00:00"
  Original form: Mon..Fri *-*-* 08:00:00
Normalized form: Mon..Fri *-*-* 08:00:00
    Next elapse: Mon 2026-03-09 08:00:00 UTC
       (in UTC): Mon 2026-03-09 08:00:00 UTC
       From now: 3 days 21h left

# Common patterns
# Every 15 minutes:
OnCalendar=*:0/15
# First Sunday of every month at 3 AM:
OnCalendar=Sun *-*-1..7 03:00:00
# Every weekday at noon:
OnCalendar=Mon..Fri 12:00:00
# Every 6 hours:
OnCalendar=00/6:00:00
Monotonic Timers
In addition to calendar-based timers, systemd supports monotonic timers that fire relative to system events rather than wall clock time. These are defined with directives like OnBootSec=, OnActiveSec=, OnStartupSec=, and OnUnitActiveSec=. A common use case is running a cleanup task five minutes after the system finishes booting:
[Timer]
OnBootSec=5min
OnUnitActiveSec=1h
# First run: 5 minutes after boot
# Subsequent runs: every 1 hour after the last activation
Monotonic timers do not benefit from Persistent=true. That directive only applies to calendar-based timers. For a monotonic timer, if the system was off when the interval elapsed, the timer simply fires the specified duration after the next boot -- which is usually the correct behavior anyway.
Targets, Dependencies, and Boot Ordering
Targets are the synchronization checkpoints of the systemd boot process. According to the systemd bootup(7) documentation, the default boot target on server systems is typically multi-user.target, which represents "a multi-user system with networking." On desktop systems, it's graphical.target.
Understanding the boot graph helps diagnose ordering failures. The path from kernel handoff to a fully operational server, simplified, is:
- sysinit.target -- local filesystems mounted, swap enabled, kernel modules loaded
- basic.target -- sockets, timers, and path units active; the baseline for most services
- network.target -- networking is configured (but not necessarily online)
- network-online.target -- networking is confirmed online (NIC has IP, route is reachable)
- multi-user.target -- all standard services are active; the system is ready for logins
"The important distinction between network.target and network-online.target is that the former is a passive target indicating that network configuration has been applied, while the latter is an active target indicating that the network is operationally ready for use."
-- systemd.special(7) man page, freedesktop.org
This distinction is the source of a surprisingly common production issue. Services that use After=network.target may start before the network is really usable -- before DHCP has assigned an address, before DNS resolvers are reachable. If your service opens a network connection at startup, you need After=network-online.target paired with Wants=network-online.target.
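The pairing looks like this in the consuming unit (a sketch; combine with whatever other dependencies the service needs):

```ini
[Unit]
# After= alone only orders; Wants= actually pulls the target into the transaction.
Wants=network-online.target
After=network-online.target
```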
There is a second layer to this that tutorials almost never explain: network-online.target is only meaningful if a provider service is enabled to back it. Per the authoritative systemd NETWORK_ONLINE documentation: "Note that normally, if no service requires it and if no remote mount point is configured, this target is not pulled into the boot, thus avoiding any delays during boot should the network not be available."
"network-online.target is a target that actively waits until the network is 'up', where the definition of 'up' is defined by the network management software... It is an active target, meaning that it may be pulled in by the services requiring the network to be up, but is not pulled in by the network management service itself."
-- systemd NETWORK_ONLINE documentation, systemd.io
The provider services are NetworkManager-wait-online.service (when NetworkManager manages the network) and systemd-networkd-wait-online.service (when systemd-networkd is used). These units declare WantedBy=network-online.target in their [Install] section and Before=network-online.target in their [Unit] section, and they are typically enabled automatically alongside their parent network manager. If neither provider service is enabled -- which is common in minimal containers, CI images, and some cloud VMs -- network-online.target will be reached instantly with no actual network check performed. Services depending on it will start with no delay and with no guarantee that the network is up.
Verify your provider service is enabled: systemctl is-enabled NetworkManager-wait-online.service systemd-networkd-wait-online.service. In containers and many cloud images, both may be disabled, making After=network-online.target a no-op. Also note: the wait-online provider services have start timeouts on the order of one to two minutes, so in slow DHCP environments they can significantly delay boot even when everything eventually works.
Writing Custom Targets
Custom targets are useful when you want to group a set of services and control them as a unit, or when you need an explicit synchronization point in a complex dependency graph. A target unit is minimal:
[Unit]
Description=Full application stack (DB + cache + app + worker)
Requires=postgresql.service redis.service webapp.service worker.service
After=postgresql.service redis.service webapp.service worker.service

[Install]
WantedBy=multi-user.target
Now systemctl start app-stack.target brings up all four services in dependency order. However, systemctl stop app-stack.target stops the target unit itself -- not the services. This surprises almost everyone who encounters it for the first time. Stopping a target does not propagate to its dependencies unless those dependencies declare a reverse binding back to the target.
The correct fix is to add PartOf=app-stack.target to each service unit. PartOf= creates a one-way binding: stopping or restarting the target propagates to the service, but not vice versa. It does not create a start ordering dependency -- you still need After= and Requires= for that.
[Unit]
Description=Web Application
PartOf=app-stack.target
# PartOf= means: when app-stack.target is stopped/restarted,
# stop/restart this unit too. Does NOT imply ordering.
After=app-stack.target postgresql.service redis.service
journald: Structured Logging That You Can Use
The systemd journal collects log output from every service, the kernel, and boot messages into a unified binary format. The systemd-journald.service(8) documentation notes that each journal entry carries structured metadata fields automatically: _SYSTEMD_UNIT, _PID, _UID, PRIORITY, _BOOT_ID, and many others. This structure makes filtering far more precise than grepping plaintext logs.
Essential journalctl Queries
# Follow a service's logs in real time
$ journalctl -u webapp.service -f

# Logs since last boot, severity err (level 3) and more severe (crit, alert, emerg)
$ journalctl -b -p err

# Logs from the previous boot only
$ journalctl -b -1 -u webapp.service

# Logs in a time window
$ journalctl --since "2026-03-01 00:00" --until "2026-03-01 06:00"

# JSON output for parsing or shipping to a log aggregator
$ journalctl -u webapp.service -o json | jq '.MESSAGE'

# Kernel messages (dmesg equivalent with timestamps)
$ journalctl -k --since today

# All logs from a specific executable path
$ journalctl _EXE=/opt/webapp/bin/server

# Count error-level entries (useful for health dashboards)
$ journalctl -u webapp.service --output=cat -p err --no-pager | wc -l
Making the Journal Persistent
By default on many distributions, journals are stored in /run/log/journal/, which is a tmpfs. That means every reboot wipes your logs. The most reliable and portable way to make them persistent is to set Storage=persistent explicitly in /etc/systemd/journald.conf -- this is the correct approach and is consistent with the retention configuration you will add in the same file anyway. Simply creating the directory is also sufficient on some distributions, but explicit configuration is preferable.
[Journal]
Storage=persistent
# Maximum disk usage for persistent journal
SystemMaxUse=2G
# Keep at least this much free on the journal partition
SystemKeepFree=512M
# Maximum age of journal files
MaxRetentionSec=30day
# Compress journal files
Compress=yes
# Forward to syslog (set to yes if you run rsyslog/syslog-ng)
ForwardToSyslog=no
After editing journald.conf, apply with sudo systemctl restart systemd-journald. Verify current disk usage with journalctl --disk-usage.
Common Failure Modes and How to Diagnose Them
Knowing the failure modes is more useful than memorizing commands. These are the patterns that appear repeatedly across production environments.
Failure Mode 1: Service Fails Immediately at Boot, Works When Started Manually
This is almost always an ordering problem. The service depends on something -- a network address, a database socket, a mounted filesystem -- that isn't ready when systemd starts the service. The manual start works because by the time you type the command, everything is ready.
Diagnosis: run systemctl list-dependencies servicename.service to see the unit's declared requirement dependencies, and systemctl list-dependencies --after servicename.service to see everything it is ordered after. Compare that against what your service actually needs at startup, then add the missing After= plus Wants= or Requires= declarations.
Failure Mode 2: Service Is Active but Not Actually Running
This happens with Type=forking services when the PIDFile= directive is wrong or missing. systemd waits for the parent to exit, sees that happen, declares the service active, but if it can't find the child PID, the service has no tracked process. You get an "active (running)" status with no managed PID.
Diagnosis: systemctl status servicename.service will show the cgroup tree. If it says "active (running)" but the cgroup is empty or missing, you have a PID tracking failure. Verify your PIDFile= path is correct and the daemon is truly writing to it.
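A sketch of a correctly configured forking unit (the daemon name and paths are hypothetical):

```ini
[Service]
Type=forking
# Must match the path the daemon actually writes its PID to;
# a mismatch leaves the service "active" with no tracked main process.
PIDFile=/run/mydaemon/mydaemon.pid
ExecStart=/usr/sbin/mydaemon --daemonize
# Creates /run/mydaemon with the right ownership before start.
RuntimeDirectory=mydaemon
```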
Failure Mode 3: Service Hits StartLimitBurst and Stops Restarting
The service enters failed state with a note about the start rate limit being hit. Increasing StartLimitBurst= is the wrong fix; it masks a service that is crashing on startup. First, find the actual crash reason with journalctl -u servicename.service -n 100. Only after fixing the root cause should you consider adjusting the limit.
# Step 1: Get status and last few log lines
$ systemctl status webapp.service

# Step 2: Pull full journal since last boot for this unit
$ journalctl -b -u webapp.service --no-pager

# Step 3: Validate unit file syntax
$ systemd-analyze verify webapp.service

# Step 4: View resolved unit (shows effective values after drop-ins)
$ systemctl cat webapp.service

# Step 5: Show full dependency tree
$ systemctl list-dependencies webapp.service

# Step 6: See what depends on this service (reverse)
$ systemctl list-dependencies --reverse webapp.service

# Step 7: Check boot performance and critical path
$ systemd-analyze critical-chain webapp.service

# After fixing root cause, reset failed state before restarting
$ sudo systemctl reset-failed webapp.service
$ sudo systemctl start webapp.service
Failure Mode 4: Leftover Processes After Service Restart
This happens when a service spawns child processes and the KillMode= directive is misconfigured. The KillMode= directive controls which processes in a service's cgroup receive the stop signal.
The default is control-group: every process in the cgroup receives the kill signal on stop, which is usually correct. If you set KillMode=process, only the main process receives the signal; child processes are orphaned and reparented to PID 1 (systemd itself), which will reap them. They will not become zombies in a properly running systemd environment because systemd as PID 1 calls wait() continuously. However, those orphaned processes are now running outside any service lifecycle -- they will not be tracked, will not be restarted if they crash, and may hold open ports, file descriptors, or database connections from a previous service invocation.
The correct mode for services that need graceful shutdown of the main process plus cleanup of the rest is KillMode=mixed. Per the systemd.kill(5) man page: "If set to mixed, the SIGTERM signal is sent to the main process while the subsequent SIGKILL signal is sent to all remaining processes of the unit's control group." The word "subsequent" matters. SIGTERM goes to the main process first. The SIGKILL to remaining cgroup members fires when one of two conditions is met: either the main process exits on its own (at which point SIGKILL goes to the rest immediately), or TimeoutStopSec= elapses without the main process having exited (at which point SIGKILL is sent to the entire remaining cgroup). This is not a simultaneous fan-out -- it is a two-phase sequence driven by main process exit or timeout.
The default KillMode=control-group sends KillSignal= (SIGTERM by default) to every process in the cgroup at once, then escalates to SIGKILL after TimeoutStopSec= for any that remain. Use mixed when your main process has signal handling logic that must complete before child processes are terminated -- for example, a server that finishes draining in-flight requests before workers are killed. Set TimeoutStopSec= generously enough to allow the main process to complete its teardown. systemd also always sends SIGCONT immediately after the configured kill signal to ensure suspended processes can receive and act on it.
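A sketch combining these directives for a drain-then-kill shutdown (the timeout value is illustrative):

```ini
[Service]
KillMode=mixed
KillSignal=SIGTERM
# The main process gets SIGTERM and up to this long to drain; remaining
# cgroup members get SIGKILL when it exits or when the timeout fires.
TimeoutStopSec=60
```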
Failure Mode 5: Unit Change Has No Effect After daemon-reload
You edit a unit file (or drop-in), run systemctl daemon-reload, and the service continues behaving as before. This is one of the most common points of confusion for administrators new to systemd, and it stems from misunderstanding what daemon-reload really does.
systemctl daemon-reload tells PID 1 to re-read all unit files from disk and rebuild its internal state. It does not affect any currently running processes. The service continues to run under its old configuration until it is restarted. If you changed Environment=, LimitNOFILE=, MemoryMax=, or any other directive that governs the running process, you must restart the service to apply it. If the service supports config reloading via a signal (and you have defined ExecReload=), a systemctl reload will re-read the application's config file -- but it will not re-apply the unit file directives, which are only evaluated at process start.
Diagnosis: after editing a unit and running daemon-reload, run systemctl show webapp.service | grep -i fragment to confirm the new unit file path and modification time are reflected. Then compare systemctl cat webapp.service (the resolved unit as systemd sees it) against the running process environment with cat /proc/$(systemctl show -p MainPID --value webapp.service)/environ | tr '\0' '\n'. Discrepancies confirm that a restart is needed to apply the new configuration.
Drop-ins, Overrides, and Templated Units
Drop-In Overrides
When you need to modify a vendor-provided unit -- to add an environment variable, increase file descriptor limits, add a dependency -- use drop-in files instead of editing the original. Run systemctl edit unitname.service to open an editor that creates the drop-in automatically. The result is a file at /etc/systemd/system/unitname.service.d/override.conf.
# Raise process and open file descriptor limits
# without touching the vendor unit file
[Service]
LimitNOFILE=65536
LimitNPROC=32768
Environment=PGDATA=/var/lib/postgresql/16/main
After saving, run systemctl daemon-reload to pick up the change. Important: daemon-reload re-reads unit files from disk -- it does not restart the running service. The service must be separately restarted (or reloaded, if it supports ExecReload=) before the new configuration takes effect on the running process. To verify the effective configuration, run systemctl cat postgresql.service -- it shows all files that compose the unit in order, separated by file path headers.
In the service unit shown earlier, the directive reads EnvironmentFile=-/etc/webapp/env with a leading dash. For EnvironmentFile=, that dash means "ignore this file if it does not exist." Without it, a missing env file causes the service to fail at the environment-loading stage -- before ExecStart= is ever reached -- with an error that does not always clearly identify the env file as the cause. The dash prefix appears elsewhere in unit files, but its meaning depends on the directive. On Exec*= lines, a leading dash tells systemd to treat failure of that command -- including a non-zero exit status -- as success: ExecStartPre=-/opt/app/check means "run this pre-flight check, but don't fail the service if it is missing or exits non-zero." Not every path-accepting directive supports the prefix, so check systemd.service(5) and systemd.exec(5) before relying on it.
Templated Units
Template units allow a single unit file to be instantiated multiple times with different parameters. The template file name contains an @ character: worker@.service. The instance name -- passed at startup as worker@1.service, worker@2.service, etc. -- is available inside the unit file as the %i specifier.
```ini
[Unit]
Description=Background worker instance %i
# Wants= pulls network-online.target into the transaction;
# After= alone only orders against it if something else starts it
Wants=network-online.target
After=network-online.target redis.service
Requires=redis.service

[Service]
Type=simple
User=worker
ExecStart=/opt/app/bin/worker --id=%i --queue=default
Restart=on-failure
RestartSec=10
SyslogIdentifier=worker-%i

[Install]
WantedBy=multi-user.target
```
Enable and start three workers: systemctl enable --now worker@{1,2,3}.service. Each gets its own cgroup, its own journal identifier (worker-1, worker-2, worker-3), and its own restart policy. Scaling horizontally is now a single command.
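A common companion pattern is a target that groups the instances so the fleet can be managed as one unit. A sketch, assuming a hypothetical workers.target (for stop to propagate down, the worker@.service template would also need PartOf=workers.target in its [Unit] section):

```ini
# /etc/systemd/system/workers.target (hypothetical grouping target)
[Unit]
Description=All background worker instances
# Pull in the desired instances when this target starts
Wants=worker@1.service worker@2.service worker@3.service

[Install]
WantedBy=multi-user.target
```

With that in place, systemctl enable --now workers.target brings up the whole fleet, and adding an instance is one more entry on the Wants= line.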
Resource Controls via cgroups
Every service runs in its own cgroup, and systemd exposes Linux's cgroup resource controls directly in unit files. Per the systemd.resource-control(5) documentation, these directives apply to the unit's cgroup and govern its CPU, memory, and I/O allocation.
```ini
[Service]
# Hard memory limit; processes in the cgroup are OOM-killed beyond this
MemoryMax=512M
# Soft memory limit; the kernel prefers to reclaim from this cgroup first
MemoryHigh=400M
# CPU weight (relative to other services; default is 100)
CPUWeight=50
# Hard CPU quota: 20% of one CPU
CPUQuota=20%
# I/O weight
IOWeight=50
# Maximum open file descriptors
LimitNOFILE=16384
# Assign to a specific slice (cgroup subtree)
Slice=background.slice
```
These controls are particularly valuable in multi-tenant environments or when running several services on a single host. A runaway service that exhausts memory will be killed by the OOM handler within its cgroup limits rather than taking down the entire host. Monitor resource usage in real time with systemd-cgtop, which provides a top-like view of cgroup resource consumption sorted by CPU, memory, or I/O.
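The Slice=background.slice assignment refers to a slice unit that can carry limits of its own, capping the whole subtree regardless of how many services are placed in it. A sketch of such a slice unit (name and values illustrative):

```ini
# /etc/systemd/system/background.slice
[Unit]
Description=Low-priority background services

[Slice]
# Aggregate ceiling for every service assigned to this slice
MemoryMax=2G
CPUWeight=30
```

Per-service limits still apply inside the slice; the slice limit is the outer bound when several background services compete.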
MemoryMax= is enforced as a hard limit. When a process exceeds it, the kernel OOM killer terminates a process in the cgroup. What happens next depends on OOMPolicy=, which accepts three values:
- continue: the OOM kill is logged, but the unit keeps running.
- stop (the system default when not set in the unit): systemd cleanly terminates the unit.
- kill: systemd sends SIGKILL to all remaining processes in the cgroup immediately.
The critical trap is this: with the default OOMPolicy=stop, systemd performs a clean stop of the service. A clean stop is treated the same as systemctl stop, which means Restart=on-failure will not trigger -- clean stops are explicitly excluded from restart logic. If you want the service to restart after an OOM kill, you must set OOMPolicy=kill explicitly. With OOMPolicy=kill, the unit enters a failed state (result: oom-kill) and Restart=on-failure will trigger as expected. Always verify this in your specific systemd version, as the interaction between OOMPolicy= and Restart= has been a documented source of confusion in the systemd issue tracker.
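Putting that together, a service that should come back after an OOM kill needs both directives set explicitly. A minimal sketch:

```ini
[Service]
MemoryMax=512M
# Treat an OOM kill as a unit failure rather than a clean stop,
# so the Restart= policy below actually fires
OOMPolicy=kill
Restart=on-failure
# Back off briefly so a leaking service doesn't restart-thrash
RestartSec=10
```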
Hardening Audit with systemd-analyze security
Once you have written a service unit with sandboxing directives, you need a way to verify their coverage and find gaps you missed. systemd-analyze security (added in v240) provides exactly this. It evaluates a service's unit file against a comprehensive list of hardening directives and produces an exposure score from 0 (most secure) to 10 (least secure), along with a per-directive breakdown showing what is set, what is missing, and the weighted score impact of each gap.
```shell
# Analyze all running services (sorted by exposure)
$ systemd-analyze security

# Analyze a specific service with full directive table
$ systemd-analyze security webapp.service

# Example output (truncated):
  NAME                       DESCRIPTION                                   EXPOSURE
✗ RootDirectory=/RootImage=  Service runs within the host's root dir            0.1
✗ User=/DynamicUser=         Service runs as root user                          0.4
✗ NoNewPrivileges=           Service processes may acquire new privileges       0.2
✓ PrivateTmp=                Service has private /tmp                           ---
✓ ProtectSystem=             OS directories are read-only to service            ---
✗ SystemCallFilter=          System call filter not defined                     0.3

→ Overall exposure level for webapp.service: 4.1 OK
```
The workflow is iterative: run the analyzer, add or tighten a directive, reload the daemon, test that the service still functions, and repeat. A score in the "OK" range indicates meaningful sandboxing, but the score is a heuristic, not a proof of safety -- a service running as a non-root user with ProtectSystem=strict and a syscall filter is substantially hardened even if it doesn't reach a perfect score.
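As a concrete target for that iteration, the fragment below combines the directives that typically move the score the most. It is a sketch, not a drop-in: the service user, state directory, and address families are illustrative and must match what the service actually needs.

```ini
[Service]
# Drop root and forbid privilege escalation
User=webapp
NoNewPrivileges=yes
# Mount the OS read-only for this service...
ProtectSystem=strict
# ...but grant a writable state directory (/var/lib/webapp)
StateDirectory=webapp
PrivateTmp=yes
ProtectHome=yes
PrivateDevices=yes
ProtectKernelTunables=yes
ProtectControlGroups=yes
# Only the socket families a typical web app needs
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
SystemCallFilter=@system-service
```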
Use systemd-analyze syscall-filter to inspect what system calls belong to predefined filter groups like @system-service and @privileged. Before adding a SystemCallFilter= directive to a production service, profile what your process calls:
```shell
# Run your service under strace, capturing all syscalls including children
strace -f -e trace=all -o /tmp/calls.txt /opt/webapp/bin/server

# Extract unique syscall names from the trace. With -f, each line is
# "PID  syscall(...)", so take the last whitespace-separated field
# before the opening parenthesis and keep only syscall-shaped names.
awk -F'(' '{print $1}' /tmp/calls.txt | awk '{print $NF}' \
  | grep -E '^[a-z_][a-z0-9_]*$' | sort -u > /tmp/syscalls-used.txt

# Build the allow-list from the filter group (strip the group header
# and indentation; output format can vary slightly by version)
systemd-analyze syscall-filter @system-service \
  | grep -v '^@\|^$' | tr -d ' ' | sort -u > /tmp/allowed.txt

# Syscalls your process actually uses that the filter would block
grep -Fxv -f /tmp/allowed.txt /tmp/syscalls-used.txt
```
A SystemCallFilter= that is too aggressive will produce cryptic EPERM or ENOSYS failures that are difficult to diagnose after the fact. Always verify under load, not just at startup -- some syscalls only appear during specific operations like config reload, log rotation, or high-memory pressure.
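By default, a filtered syscall terminates the offending process with SIGSYS, which is exactly the hard-to-diagnose failure described above. During rollout it can be easier to return an errno instead, so the denial at least surfaces as an error in application logs. A sketch:

```ini
[Service]
SystemCallFilter=@system-service
# Rollout mode: blocked syscalls fail with EPERM instead of killing
# the process (systemd.exec(5), SystemCallErrorNumber=)
SystemCallErrorNumber=EPERM
```

Once the filter has been proven under load, removing SystemCallErrorNumber= restores the stricter kill-on-violation behavior.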
For the service unit shown earlier in this article, adding the SystemCallFilter=@system-service directive is the highest-impact single addition many web application services can make. It restricts the process to a reasonable set of system calls needed for typical server workloads, blocking a large category of exploitation primitives that rely on unusual or privileged syscalls.
systemd-analyze: Boot Diagnostics and Validation
The systemd-analyze tool does far more than security scoring. It is the primary diagnostic tool for understanding boot performance, unit dependency problems, and configuration correctness.
```shell
# Total boot time broken into firmware, bootloader, kernel, userspace
$ systemd-analyze time

# List services sorted by startup time (find what's slowing boot)
$ systemd-analyze blame

# Show the critical path through the dependency graph
$ systemd-analyze critical-chain

# Show the critical chain for a specific service
$ systemd-analyze critical-chain webapp.service

# Generate an SVG timeline of the entire boot (open in browser)
$ systemd-analyze plot > boot.svg

# Validate unit file syntax and dependency semantics
# (run this in your CI/CD pipeline before deploying unit changes)
$ systemd-analyze verify webapp.service

# Verify a calendar expression before deploying
$ systemd-analyze calendar "Mon..Fri *-*-* 08:00:00"

# Translate a numeric exit code to its name
$ systemd-analyze exit-status 127

# List all service units with their security exposure scores
$ systemd-analyze security --no-pager | sort -k2 -rn
```
systemd-analyze blame is the first command to run when boot feels slow. It shows initialization time for each service in descending order. systemd-analyze critical-chain goes further and shows the actual dependency chain that determined total boot time -- knowing that a service took 8 seconds is less useful than knowing it was on the critical path because nothing could start until it finished. systemd-analyze plot produces the most complete view: an SVG timeline showing parallelization, overlap, and exactly where time was spent.
systemd-analyze verify checks unit file syntax and detects common dependency problems before the unit is deployed. It should be part of every CI pipeline that manages systemd unit files. It catches issues like Requires= without After=, missing [Install] sections, and typos in directive names that would otherwise fail silently. Note what it does not catch: it does not execute the service, so runtime failures (wrong binary path, missing user, permission errors) are outside its scope. Pair it with systemd-analyze security in CI to enforce a minimum hardening threshold before deployment.
```yaml
# Validates unit files and enforces a maximum exposure score in CI.
# Requires: ubuntu-latest runner (systemd available via systemd-analyze).
name: systemd unit validation
on: [push, pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Verify unit file syntax and dependencies
        run: |
          for unit in deploy/systemd/*.service deploy/systemd/*.timer; do
            echo "--- Verifying $unit"
            systemd-analyze verify "$unit" || exit 1
          done
      - name: Enforce hardening score threshold
        run: |
          # Fail if any service unit scores above 7.0 (0=most secure, 10=least).
          # The summary line ends with a rating word, so pick out the first
          # numeric token rather than the last field.
          for unit in deploy/systemd/*.service; do
            score=$(systemd-analyze security "$unit" 2>/dev/null \
              | awk '/Overall exposure/ { for (i=1; i<=NF; i++) if ($i ~ /^[0-9.]+$/) { print $i; exit } }')
            [ -n "$score" ] || { echo "FAIL: could not parse score for $unit"; exit 1; }
            echo "$unit exposure: $score"
            if awk "BEGIN{exit !($score > 7.0)}"; then
              echo "FAIL: $unit exceeds hardening threshold (score $score > 7.0)"
              exit 1
            fi
          done
```
The threshold of 7.0 is a starting point; adjust it based on the criticality of your service. The score is a heuristic -- a service with User=root and no syscall filter will score poorly even if it is otherwise well-isolated, and vice versa. Use it as a forcing function for iterative improvement rather than a binary pass/fail gate in early adoption.
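A side note on the threshold check itself: POSIX sh cannot compare floating-point numbers, which is why the pipeline shells out to awk. The idiom is worth understanding in isolation, since it is runnable anywhere (the scores below are illustrative):

```shell
# awk exits 0 (shell "true") when the score exceeds the threshold:
# exit !(cond) maps cond=true -> exit 0, cond=false -> exit 1.
check() {
  score="$1"
  if awk "BEGIN{exit !($score > 7.0)}"; then
    echo "over threshold"
  else
    echo "within threshold"
  fi
}

check 7.5   # prints "over threshold"
check 4.1   # prints "within threshold"
```

The same pattern works for any float comparison in CI shell steps; only the expression inside BEGIN{} changes.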
Wrapping Up
systemd's learning curve is real, but it is not arbitrary. Every part of the design -- the typed unit model, the explicit dependency graph, the integration between cgroups and service lifecycle, the structured journal -- solves a concrete problem that the old shell-script approach could not handle predictably at scale.
The practical takeaways from this article:
- Write services with the correct Type= (prefer exec over simple, and notify-reload over notify when your daemon supports it; set NotifyAccess=all if a child process sends the notification).
- Always pair Requires= with After=.
- Use network-online.target rather than network.target for network-dependent services, and verify a provider service is enabled to back it.
- Make the journal persistent by setting Storage=persistent in journald.conf, and size it appropriately.
- Define ExecStop= for services that need ordered teardown.
- Prefer drop-in overrides over editing vendor units, and remember that daemon-reload does not restart running services.
- Run systemd-analyze verify as part of your deployment pipeline, and systemd-analyze security on every service you write to drive iterative hardening.

Those habits will eliminate the majority of systemd-related production incidents.
The further layers -- socket activation, portable services, systemd-nspawn containers, and advanced systemd-oomd configuration -- build directly on the unit file primitives covered here. The man pages for systemd.unit(5), systemd.service(5), systemd.timer(5), systemd.resource-control(5), and systemd.exec(5) (which covers all sandboxing directives) are the authoritative references. They are genuinely well-written and worth reading in full once the mental model is in place.