Somewhere in a Hetzner data center in Falkenstein, Germany, a bare metal server with 64 GB of RAM and dual NVMe drives runs PostgreSQL with TimescaleDB, ships logs to Grafana Loki, and serves a customer-facing REST API -- all managed by a two-person team with zero dedicated infrastructure specialists. The entire production stack deploys with a single command: clan machines update. No SSH sessions. No runbooks. No documentation drift. The configuration is the documentation.

That server runs NixOS. And it is far from alone.

NixOS has been gaining traction as a production operating system, moving well beyond its origins as a research project and hobbyist curiosity. Teams ranging from fintech startups to enterprises running Kubernetes clusters on NixOS hypervisors have discovered that the purely functional software deployment model Eelco Dolstra described in his 2006 doctoral thesis at Utrecht University -- The Purely Functional Software Deployment Model -- translates into tangible operational advantages when extended from package management to full system configuration and applied to bare metal infrastructure. But the path is not without friction. NixOS demands a fundamentally different mental model from conventional Linux administration, and its sharp edges can draw blood if you are not prepared for them.

This article is a practitioner's guide to running NixOS on production bare metal. We will cover the architectural decisions that make it compelling for this use case, walk through real configuration patterns for services and secrets, examine the rollback mechanism that makes deployments fearless, and catalog the specific pain points you will encounter -- along with strategies for mitigating each one.

Why Bare Metal, and Why NixOS

The economics of bare metal are increasingly hard to ignore. Numtide, the consultancy behind the nixos-anywhere deployment tool, noted in their 2023 blog post that they have observed a clear trend of organizations migrating from virtual machines in cloud providers like AWS and GCP to bare metal servers at providers like Hetzner and OVH. The increased resources available at a fraction of the cloud price are very attractive, particularly for resource-intensive applications such as machine learning and data processing.

The px dynamics team, a trading signal platform consisting of a data engineer and a quant trader, reported in February 2026 that their entire production infrastructure -- PostgreSQL with TimescaleDB, a customer-facing FastAPI application, a full Prometheus/Loki/Grafana observability stack, scheduled Prefect workflows, and offsite backups -- runs on three Hetzner servers for under 100 euros per month. Their database server alone, they noted, would cost several hundred dollars per month on a managed cloud service.

But bare metal comes with a cost: you own the entire operating system stack. No managed Kubernetes, no click-to-deploy RDS, no auto-scaling groups. This is precisely where NixOS changes the calculus. The NixOS Wiki's bare metal deployment FAQ explains the core proposition: where non-Nix bare metal deployments are typically maintained with converging configuration managers like Puppet or Chef, the NixOS module system replaces these tools entirely, making the installation and configuration of your deployment fully declarative.

The advantages compound specifically because the failure modes on bare metal are more expensive. When a cloud VM misbehaves, you terminate it and spin up another. When a bare metal server's configuration drifts into an unknown state at 2 AM, you are looking at a potential IPMI console session and hours of manual recovery -- unless your operating system can atomically revert to a known-good state in under a minute.

The Nix Store and Atomic Generations

Understanding NixOS in production requires understanding the Nix store, the foundational mechanism that enables everything else. Every package, configuration file, and system artifact is stored under /nix/store in a path that includes a cryptographic hash derived from all inputs used to build it. As the NixOS Wikipedia entry explains, packages in the store are identified by a cryptographic hash of all input used for their build, and the system uses this mechanism to manage configuration files, ensuring that newer configurations do not overwrite older ones.
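You can observe this on any NixOS machine: the running system itself is a single store path behind a symlink (the hash shown below is illustrative):

```shell
# The active system configuration is just a store path
$ readlink /run/current-system
/nix/store/8c9l2vkn...-nixos-system-production-db-25.11
```

Everything that follows -- generations, rollbacks, binary caching -- falls out of this one property.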

This is not a theoretical nicety. It has a direct operational consequence: every time you run nixos-rebuild switch, NixOS builds an entirely new system generation. The previous generation remains intact in the store. The bootloader is updated to offer both the current and all previous generations as boot options. Switching between them is instantaneous.

listing system generations
$ nixos-rebuild list-generations
Generation 147  current  2026-01-28 14:32:01
Generation 146           2026-01-25 09:18:44
Generation 145           2026-01-20 16:05:12
Generation 144           2026-01-15 11:41:33

The foundational NixOS paper by Dolstra, Löh, and Pierron, first presented at ICFP 2008 and later published in the Journal of Functional Programming (2010), described this design: the system is updated to a new configuration by changing the specification and rebuilding the system from that specification, allowing a system to be built deterministically and allowing the user to roll back the system to previous configurations since these are not overwritten.

In practice, rolling back is a single command:

$ sudo nixos-rebuild switch --rollback

That reverts the entire system configuration -- kernel modules, services, configuration files, installed packages -- to the exact state of the previous generation. User data in directories like /home and /var remains untouched. Sylvain Pasche, writing about deploying reproducible Kubernetes infrastructure with NixOS and OKD at ELCA, described this capability as invaluable in production environments: a problematic update that might require hours of debugging on traditional systems becomes a brief fix on NixOS.

The Flake: Pinning Your Entire Universe

If you are deploying NixOS to production in 2026, you are almost certainly using Flakes. While technically still labeled experimental, Flakes have become the de facto standard for production NixOS configurations. They solve a critical reproducibility gap that existed with the older channel-based workflow.

A Flake is defined by a flake.nix file that explicitly declares all inputs -- the exact revision of Nixpkgs, any community modules, deployment tools -- and a flake.lock file that pins the precise version of every dependency. Pasche described Flakes as providing explicit dependency pinning through a lock file mechanism, guaranteeing that every hypervisor built from his team's configuration would be identical regardless of when it was deployed.

A minimal production Flake for a bare metal server looks like this:

flake.nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.11";
    disko = {
      url = "github:nix-community/disko";
      inputs.nixpkgs.follows = "nixpkgs";
    };
    sops-nix = {
      url = "github:Mic92/sops-nix";
      inputs.nixpkgs.follows = "nixpkgs";
    };
  };

  outputs = { self, nixpkgs, disko, sops-nix, ... }: {
    nixosConfigurations.production-db = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        disko.nixosModules.disko
        sops-nix.nixosModules.sops
        ./hosts/production-db/configuration.nix
        ./hosts/production-db/hardware-configuration.nix
        ./hosts/production-db/disk-config.nix
      ];
    };
  };
}

The inputs.nixpkgs.follows pattern is critical. It ensures that disko and sops-nix use the exact same Nixpkgs revision as your system, preventing version conflicts between modules. The lock file records the precise Git commit of each input. Updating dependencies is an explicit, auditable action: nix flake update followed by a review of the lock file diff.

Pro Tip

Store your flake.lock in version control and treat updates to it like any other code change -- review the diff, test in staging, deploy deliberately. The lock file is the single artifact that guarantees reproducibility across time.
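In practice that review loop is only a few commands. (Passing an input name directly to nix flake update requires Nix 2.19 or newer; older versions use nix flake lock --update-input.)

```shell
# Refresh a single input rather than the whole universe
$ nix flake update nixpkgs

# Review exactly which commits moved
$ git diff flake.lock

# Build the new closure without activating it
$ nixos-rebuild build --flake .#production-db
```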

Disk Partitioning as Code with Disko

One of the first decisions you face on bare metal is disk layout. The disko module from the nix-community project lets you declare partitioning, formatting, and filesystem configuration in Nix, making even the lowest-level hardware configuration reproducible. Combined with nixos-anywhere, you can provision a bare metal server from scratch over SSH without ever touching a physical console.

Here is a production-oriented disk configuration using ZFS with mirrored NVMe drives:

disk-config.nix -- ZFS mirror with selective snapshots
{
  disko.devices = {
    disk.nvme0 = {
      type = "disk";
      device = "/dev/nvme0n1";
      content = {
        type = "gpt";
        partitions = {
          ESP = {
            size = "512M";
            type = "EF00";
            content = {
              type = "filesystem";
              format = "vfat";
              mountpoint = "/boot";
            };
          };
          zfs = {
            size = "100%";
            content = { type = "zfs"; pool = "tank"; };
          };
        };
      };
    };
    # nvme1 mirrors nvme0's zfs partition

    zpool.tank = {
      type = "zpool";
      mode = "mirror";
      datasets = {
        root = { type = "zfs_fs"; mountpoint = "/"; };
        nix = {
          type = "zfs_fs";
          mountpoint = "/nix";
          # No snapshots -- Nix store is fully reproducible
        };
        postgresql = {
          type = "zfs_fs";
          mountpoint = "/var/lib/postgresql";
          options."com.sun:auto-snapshot" = "true";
        };
      };
    };
  };
}

The key insight is selective snapshotting. The /nix store does not need snapshots because it is fully reproducible from the configuration -- you can rebuild it from scratch at any time. Your database data, on the other hand, is the crown jewel that absolutely must be snapshotted and backed up. The px dynamics team uses this exact pattern, noting that this distinction keeps snapshot overhead minimal while protecting what actually matters.
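NixOS can act on that ZFS property directly: the services.zfs.autoSnapshot module snapshots only datasets where com.sun:auto-snapshot is enabled, which pairs naturally with the disko layout above. A minimal sketch (the retention counts are illustrative):

```nix
{
  services.zfs.autoSnapshot = {
    enable = true;   # honors com.sun:auto-snapshot on each dataset
    frequent = 4;    # keep four 15-minute snapshots
    hourly = 24;
    daily = 7;
  };

  # Periodic scrubs surface silent corruption on the mirror early
  services.zfs.autoScrub.enable = true;
}
```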

Warning

ZFS on Linux carries an unresolved licensing tension: the CDDL license under which ZFS is released is considered incompatible with the GPL that governs the Linux kernel. NixOS handles this by building the ZFS kernel module out-of-tree, which means ZFS module updates can lag behind kernel releases. Before adopting ZFS in production, verify that the ZFS package in your pinned Nixpkgs revision supports your target kernel, and be prepared to pin your kernel version if necessary.

Bootstrapping with nixos-anywhere

The nixos-anywhere tool is what makes NixOS practical for bare metal at scale. It connects to any Linux host via SSH, uses the kexec system call to boot into a NixOS installer image in memory, partitions the disks using your disko configuration, and installs your complete NixOS system -- all without requiring physical console access or a bootable USB drive.

provisioning a bare metal server from scratch
# From your workstation, targeting a fresh Hetzner rescue system:
$ nix run github:nix-community/nixos-anywhere -- \
    --flake .#production-db \
    --target-host root@203.0.113.10

The target machine must have at least 1 GB of RAM (excluding swap) and must be reachable over the network. The nixos-anywhere documentation notes that the tool works equally well for cloud servers, bare metal servers at providers like Hetzner, and local servers accessible via a LAN. Erethon, a NixOS practitioner who documented their workflows in February 2025, captured the experience well: you know in theory that it should work, but watching it take over a running system via SSH and install a fully configured NixOS system still feels like magic.

Note

Eric Cheng, who maintains a detailed NixOS homelab writeup, reports being able to provision a new server from scratch in roughly 10 minutes from the installer ISO. For remote bare metal where you cannot boot an ISO, nixos-anywhere achieves similar results over SSH using kexec.

Service Configuration: The Module System

The NixOS module system is where the declarative model truly pays dividends. Services, networking, users, firewall rules, kernel parameters -- everything is declared in Nix and evaluated together into a coherent system configuration. The Nixpkgs repository, which contains over 120,000 packages with thousands of contributors per release cycle, provides the foundation.

Here is a production PostgreSQL configuration with performance tuning:

postgresql.nix
{ pkgs, ... }:
{
  services.postgresql = {
    enable = true;
    package = pkgs.postgresql_17;

    # pg_stat_statements ships in PostgreSQL's bundled contrib modules;
    # only out-of-tree extensions like TimescaleDB need listing here
    extensions = ps: [ ps.timescaledb ];
    settings = {
      shared_preload_libraries = "timescaledb,pg_stat_statements";

      # PGTune: mixed workload, 64GB RAM, 16 CPUs, SSD
      max_connections = 200;
      shared_buffers = "16GB";        # 25% of RAM
      effective_cache_size = "48GB";  # 75% of RAM
      work_mem = "27962kB";
      maintenance_work_mem = "2GB";
      random_page_cost = 1.1;         # SSD: nearly sequential
      effective_io_concurrency = 200;
      min_wal_size = "4GB";
      max_wal_size = "16GB";

      # Listen only on WireGuard VPN interface
      listen_addresses = "fd00::1";
    };
  };

  # Firewall: PostgreSQL only on VPN
  networking.firewall.interfaces.wg0.allowedTCPPorts = [ 5432 ];
}

Notice what this achieves in a few dozen lines: it installs PostgreSQL 17 with TimescaleDB and pg_stat_statements extensions, applies PGTune-recommended performance settings with inline documentation explaining the rationale, binds the service exclusively to the WireGuard VPN interface, and opens the firewall port only on that interface. The comments explain why, not what -- the configuration itself is the what. The px dynamics team emphasized this point: the configuration is the documentation, and there is no wondering whether the production database settings match what is in the wiki.

Security Hardening

NixOS provides a rich set of options for hardening services beyond basic firewall rules. The module system makes it straightforward to apply systemd sandboxing directives declaratively, ensuring that every service runs with the minimum privileges it needs. The NixOS option security.lockKernelModules prevents loading new kernel modules after boot, security.protectKernelImage disables kexec and writing to /dev/kmem, and the hardened kernel profile (boot.kernelPackages = pkgs.linuxPackages_hardened) enables a battery of kernel-level mitigations.
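Collected into a module, those system-wide options look like the sketch below. Note that lockKernelModules is a sharp tool -- any module not loaded at boot (WireGuard, for example) must be listed explicitly before the lock takes effect:

```nix
{ pkgs, ... }:
{
  boot.kernelPackages = pkgs.linuxPackages_hardened;

  security.lockKernelModules = true;    # no new kernel modules after boot
  security.protectKernelImage = true;   # disables kexec and /dev/kmem writes

  # Modules that must be present before the lock takes effect
  boot.kernelModules = [ "wireguard" ];
}
```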

For individual services, systemd sandboxing options are exposed through the NixOS module system:

systemd service hardening
systemd.services.api-service.serviceConfig = {
  DynamicUser = true;
  ProtectSystem = "strict";
  ProtectHome = true;
  PrivateTmp = true;
  PrivateDevices = true;
  NoNewPrivileges = true;
  ProtectKernelTunables = true;
  ProtectControlGroups = true;
  RestrictAddressFamilies = [ "AF_INET" "AF_INET6" "AF_UNIX" ];
  RestrictNamespaces = true;
  LockPersonality = true;
  MemoryDenyWriteExecute = true;
};

The advantage of declaring these constraints in Nix is the same as everything else: they are versioned, reviewable, and consistent across deployments. You can also audit your entire fleet's hardening posture by reading the configuration rather than inspecting running systems.
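These directives also lower the unit's attack surface as scored by systemd itself: systemd-analyze security prints a per-directive report and an overall exposure rating, which makes hardening progress measurable on the running system (the score shown below is illustrative):

```shell
$ systemd-analyze security api-service.service
...
→ Overall exposure level for api-service.service: 1.6 OK
```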

Secrets Management: The Nix Store Problem

The Nix store is world-readable. Every file under /nix/store can be read by any user on the system. This is a fundamental design property -- it enables sharing and caching -- but it means that plaintext secrets must never end up in the store. The NixOS Wiki on secret management states this directly: all files in the Nix store are readable by any system user, so it is not a suitable place for including cleartext secrets.

The community has converged on two primary solutions: sops-nix (by Mic92) and agenix (by ryantm). Both encrypt secrets at rest using age or GPG keys, store the encrypted files in your Git repository alongside the NixOS configuration, and decrypt them on the target machine at activation time using SSH host keys.

Michael Stapelberg, a well-known open source contributor, documented his migration to sops-nix in August 2025 and described choosing it because the setup instructions made sense to him and he wanted the option to use encryption backends beyond age alone. The sops-nix project supports SOPS with age, GPG, AWS KMS, GCP KMS, Azure Key Vault, and HashiCorp Vault.
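Day-to-day secret editing happens through the sops CLI. A .sops.yaml file at the repository root lists the age recipients allowed to decrypt each file; host recipients are typically derived from SSH host keys using the ssh-to-age helper that accompanies sops-nix:

```shell
# Derive the age public key from the host's SSH ed25519 key
$ ssh-keyscan production-db.internal | ssh-to-age

# Open the encrypted file in $EDITOR; sops re-encrypts on save
$ sops secrets/production.yaml
```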

secrets.nix -- sops-nix configuration
{ config, ... }:
{
  sops = {
    defaultSopsFile = ./secrets/production.yaml;
    age.sshKeyPaths = [ "/etc/ssh/ssh_host_ed25519_key" ];

    secrets."postgresql/replication-password" = {
      owner = "postgres";
      group = "postgres";
      mode = "0400";
    };

    secrets."api/jwt-signing-key" = {
      owner = "api-service";
      mode = "0400";
      restartUnits = [ "api-service.service" ];
    };
  };

  # Reference the decrypted secret in service config
  systemd.services.api-service.serviceConfig = {
    LoadCredential = [
      "jwt-key:${config.sops.secrets."api/jwt-signing-key".path}"
    ];
  };
}

Decrypted secrets are placed in /run/secrets/ (a ramfs mount, so they can never be swapped to disk) with the specified ownership and permissions. They exist only in memory at runtime. The restartUnits option automatically restarts the relevant service when a secret changes during activation, ensuring the running process always uses current credentials.
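On the consuming side, systemd exposes each LoadCredential entry under $CREDENTIALS_DIRECTORY, readable only by that service even under DynamicUser. One hypothetical way to point an application at the key:

```nix
{
  # %d is systemd's specifier for the unit's credentials directory,
  # so the service reads the key where systemd placed it
  systemd.services.api-service.environment.JWT_KEY_FILE = "%d/jwt-key";
}
```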

Observability as Code

The same declarative model extends naturally to monitoring and observability. The px dynamics team runs a full Prometheus, Grafana Loki, and Grafana stack declared entirely in Nix. Here is a condensed example of how to wire up Prometheus with node exporter and PostgreSQL metrics:

observability.nix
{ config, ... }:
{
  services.prometheus = {
    enable = true;
    scrapeConfigs = [
      {
        job_name = "node";
        static_configs = [{
          targets = [ "localhost:9100" ];
        }];
      }
      {
        job_name = "postgresql";
        static_configs = [{
          targets = [ "localhost:9187" ];
        }];
      }
    ];
  };

  services.prometheus.exporters.node = {
    enable = true;
    enabledCollectors = [ "systemd" "zfs" ];
  };

  # Serves the postgresql scrape target above on its default port 9187
  services.prometheus.exporters.postgres.enable = true;

  services.grafana = {
    enable = true;
    settings.server.http_addr = "fd00::1"; # WireGuard only
  };

  services.loki = {
    enable = true;
    configuration = {
      auth_enabled = false;
      # ... storage and schema config
    };
  };
}

The same principle applies: the observability stack's configuration lives in Git alongside the services it monitors. If you add a new service, you add its scrape target in the same pull request. There is no drift between what is running and what is monitored.

Deployment Strategies

There are several approaches to deploying NixOS configurations to remote machines, each with different tradeoffs.

nixos-rebuild with --target-host is the simplest option and the one that Erethon described as the approach they use. It builds the system closure locally and copies it to the target over SSH:

remote deployment
$ nixos-rebuild switch \
    --flake .#production-db \
    --target-host root@production-db.internal \
    --build-host localhost

Colmena is a stateless deployment tool written in Rust, modeled after NixOps and morph. It adds parallel deployments across multiple hosts, tag-based host filtering, and a REPL for interactively querying node configurations. deploy-rs offers similar capabilities with a focus on safe activation and automatic rollback if the new configuration fails health checks.
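As a sketch of the Colmena model (the host name and tags below are hypothetical), a hive is declared as a flake output and targeted by tag:

```nix
{
  colmena = {
    meta.nixpkgs = import nixpkgs { system = "x86_64-linux"; };

    production-db = {
      deployment = {
        targetHost = "production-db.internal";
        tags = [ "database" "production" ];
      };
      imports = [ ./hosts/production-db/configuration.nix ];
    };
  };
}
```

Running colmena apply --on @database then deploys, in parallel, to every host carrying the database tag.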

Clan is the newest entrant, used by px dynamics for their trading infrastructure. It provides opinions about fleet structure, handles inventory management, secrets distribution via SOPS with age encryption, and WireGuard mesh networking as an integrated layer on top of NixOS. Their entire fleet is defined in a single flake.nix, and adding a new server means adding it to the inventory and running clan machines update.

Warning

Gabriella Gonzalez, creator of the Dhall configuration language and author of the NixOS in Production handbook (Leanpub, 2023), documented a critical lesson in a widely cited 2018 blog post drawing on her years leading a Haskell and Nix DevOps team: the workflow recommended in the NixOS manual -- editing /etc/nixos/configuration.nix directly on the target and running nixos-rebuild switch locally -- is not suitable for production. Production deployments should build closures on a separate machine, store configuration in version control, and deploy the binary closure to the target.

Backup Strategies

On bare metal, you own your backup strategy. NixOS provides well-maintained modules for the two leading deduplicating backup tools: borgbackup and restic. Both support encrypted, incremental offsite backups and integrate cleanly with systemd timers. The px dynamics team uses offsite backups as a critical component of their production infrastructure.

backups.nix -- restic to S3-compatible storage
{ config, ... }:
{
  services.restic.backups.postgresql-offsite = {
    initialize = true;
    repository = "s3:https://s3.eu-central-1.amazonaws.com/backups-prod";
    passwordFile = config.sops.secrets."backup/restic-password".path;
    environmentFile = config.sops.secrets."backup/s3-credentials".path;
    paths = [ "/var/lib/postgresql" ];
    timerConfig = {
      OnCalendar = "daily";
      Persistent = true;
    };
    pruneOpts = [
      "--keep-daily 7"
      "--keep-weekly 4"
      "--keep-monthly 6"
    ];
  };
}

Note how the backup configuration references sops-nix secrets for credentials, demonstrating how the different NixOS subsystems compose. The key insight for bare metal is that your operating system configuration is reproducible from Git, so you do not need to back up /nix. Focus backup resources on stateful data: databases, user uploads, and application state in /var.
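A backup you have never restored is a hope, not a backup. The NixOS module generates one systemd service per job (restic-backups-<name>), so you can trigger a run outside its timer and rehearse restores before an incident forces you to:

```shell
# Run the job now and follow its output
$ systemctl start restic-backups-postgresql-offsite.service
$ journalctl -u restic-backups-postgresql-offsite.service -f
```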

The NixOS VM Test Framework

One of NixOS's underappreciated capabilities is its built-in integration testing framework. You can define a test that spins up one or more QEMU virtual machines, configures services on them using your actual NixOS modules, and runs assertions against the running system. Erethon described these tests as one of the killer features of nixpkgs and NixOS, noting that they allow you to spawn QEMU VMs, configure services, and then run scripts that test those services against various conditions.

integration test for PostgreSQL replication
import ./make-test-python.nix ({ pkgs, ... }: {
  name = "postgresql-replication";

  nodes = {
    primary = { ... }: {
      services.postgresql = {
        enable = true;
        settings.wal_level = "replica";
      };
    };
    replica = { ... }: {
      # streaming replication config
    };
  };

  testScript = ''
    primary.wait_for_unit("postgresql.service")
    primary.succeed("sudo -u postgres psql -c 'SELECT 1'")
    replica.wait_for_unit("postgresql.service")
    replica.succeed(
      "sudo -u postgres psql -c "
      "'SELECT pg_last_wal_replay_lsn() IS NOT NULL'"
    )
  '';
})

This test runs entirely in your CI pipeline, without requiring production access or external infrastructure. You can validate firewall rules, verify services bind to the correct interfaces, and test failover scenarios -- all before the configuration reaches production.
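Assuming the test is exposed through your flake's checks output (a wiring detail not shown above), CI and developers run it with standard commands; each test builds and boots its VMs inside the Nix sandbox:

```shell
# Run one test by name
$ nix build .#checks.x86_64-linux.postgresql-replication

# Or evaluate and run every check defined by the flake
$ nix flake check
```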

Remote Builders and Binary Caching

Building NixOS closures can be resource-intensive, particularly for large configurations with custom packages. Nix has built-in support for distributed builds: you can configure remote builder machines that accept build jobs over SSH, allowing your workstation or CI system to offload compilation to more powerful hardware.

remote builder configuration
{
  nix.buildMachines = [{
    hostName = "builder.internal";
    systems = [ "x86_64-linux" ];
    maxJobs = 16;
    speedFactor = 2;
    supportedFeatures = [ "nixos-test" "big-parallel" "kvm" ];
  }];
  nix.distributedBuilds = true;
}

For teams that want to avoid rebuilding the same derivations across machines and CI runs, a binary cache is essential. The official cache.nixos.org covers packages from Nixpkgs, but custom packages or configurations with overlays will not be cached there. Cachix provides a hosted binary cache service that integrates with CI systems -- push build artifacts once, and every subsequent deployment or developer pulls prebuilt binaries instead of compiling from scratch. Self-hosted alternatives like attic and nix-serve-ng (the latter created by Gonzalez) are also available for teams that prefer to keep caches on their own infrastructure.
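A typical CI flow with Cachix (the cache name here is hypothetical) builds the system closure once and pushes it, so subsequent deploys and developer machines only download:

```shell
# Build the full system closure for a host
$ nix build .#nixosConfigurations.production-db.config.system.build.toplevel

# Push it to the team cache; later builds pull instead of compiling
$ cachix push my-team-cache ./result
```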

The Sharp Edges

NixOS's strengths come at a cost. Here are the specific pain points you should prepare for, drawn from community experience reports and production deployments.

The Learning Curve Is Not a Gentle Slope

The Nix language is a pure, lazy, dynamically typed functional language. If your team's experience is primarily with imperative tools like Ansible or shell scripts, the conceptual shift is substantial. The px dynamics team was candid about this: the first weeks involved a lot of questioning about why things were not working and what specific lines of code did. A NixOS Discourse user who had been deep in Nix for over a year described struggling with the build system, refactoring their flake multiple times, and getting fed up with static library issues before reaching proficiency.

The documentation has improved significantly, with the official nix.dev site, the Nixpkgs manual, and community resources like the NixOS and Flakes Book filling gaps. But budget two to four weeks for a team to become productive, and longer for complex production configurations.

The Six-Month Release Cycle

NixOS releases a new stable version every six months (the latest stable release is 25.11 "Xantusia," which shipped in November 2025; the prior release, 25.05 "Warbler," saw 2,857 contributors). A NixOS Discourse thread on corporate adoption identified this as a major pain point: many companies cannot handle upgrading every six months. NixOS does not offer a long-term support channel comparable to Ubuntu's five-year LTS releases or RHEL's decade-long support.

The mitigation is to pin to a stable release and selectively backport security-critical packages. Flakes make this tractable -- you can override individual packages from a newer Nixpkgs revision while keeping the rest on stable:

selective package override from unstable
# pkgs-unstable is a second Nixpkgs input, passed in via the flake's specialArgs
{ pkgs, pkgs-unstable, ... }:
{
  # Pin OpenSSL to latest from unstable for CVE fixes
  nixpkgs.overlays = [
    (final: prev: {
      openssl = pkgs-unstable.openssl;
    })
  ];
}

When the time comes to perform a full version upgrade (for instance, from 25.05 to 25.11), the process follows a predictable pattern: update the nixpkgs.url in your flake.nix to the new release branch, run nix flake update, review the lock file diff, then build and test in a staging environment before deploying to production. The NixOS release notes document all breaking changes, deprecated modules, and required migration steps. Common friction points include renamed or removed NixOS options, services that changed their default configuration structure, and major version bumps in databases or runtimes that require data migration. The atomic generation model means that if an upgrade causes problems in production, you can immediately roll back while you address the issue -- but testing in a VM or staging server first remains the safest approach.
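Sketched as commands, that upgrade flow looks like this (the flake and host names match the earlier examples; the per-input update syntax requires Nix 2.19 or newer):

```shell
# 1. Edit flake.nix: nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.11";
# 2. Refresh the pin and review what moved
$ nix flake update nixpkgs
$ git diff flake.lock

# 3. Build without activating, or boot the result in a throwaway VM
$ nixos-rebuild build --flake .#production-db
$ nixos-rebuild build-vm --flake .#production-db && ./result/bin/run-*-vm
```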

Reproducibility: The Fine Print

NixOS is often described as providing "reproducible builds," but the reality is more nuanced. Morten Linderud (known as Foxboron), an Arch Linux developer and long-time contributor to the Reproducible Builds effort, wrote a detailed April 2024 blog post clarifying that neither Nix nor NixOS guarantees bit-for-bit reproducible builds in the strict sense used by the Reproducible Builds project. The Nix store hash is computed from the inputs (the derivation), not the outputs (the build artifacts). Nondeterminism in build processes -- timestamps, random seeds, parallel compilation order -- can and does cause bitwise differences between builds.

What NixOS does guarantee is deployment reproducibility: given the same flake.lock, you will get the same build plan and the same dependency graph. Combined with the official binary cache at cache.nixos.org, which provides prebuilt binaries for the vast majority of packages, you will in practice get the exact same binaries. The NixOS reproducible builds dashboard tracks ongoing work to improve bit-for-bit reproducibility across the package set.

Binary Compatibility and the FHS

NixOS does not follow the Filesystem Hierarchy Standard. There is no /usr/lib, no /usr/bin beyond a symlink for /bin/sh. Precompiled binaries that expect libraries at standard paths will not work without intervention. The NixOS project provides patchelf, a utility that can rewrite the dynamic linker and RPATH fields embedded in executables. The original NixOS paper documented this: the derivation that builds Adobe Reader uses patchelf to set the program's dynamic linker to a specific Nix store path and its RPATH to the store paths of required libraries.

For simpler cases, NixOS offers buildFHSEnv, which creates a lightweight FHS-compatible environment. But this remains one of the most frequently cited frustrations -- particularly with proprietary software, pre-built binaries from vendors, and AppImage packages.
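A sketch of the buildFHSEnv approach for a hypothetical vendor binary -- the name, path, and library list are placeholders; in practice you enumerate the libraries from the binary's ldd output:

```nix
{ pkgs }:
pkgs.buildFHSEnv {
  name = "vendor-tool";
  # Libraries the binary expects at FHS paths like /usr/lib
  targetPkgs = ps: [ ps.zlib ps.openssl ];
  # Entry point executed inside the FHS-compatible namespace
  runScript = "/opt/vendor/bin/vendor-tool";
}
```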

Disk Space and Garbage Collection

The Nix store accumulates generations. Every deployment creates a new one, and the old ones remain until explicitly garbage collected. On a server with frequent deployments, the store can grow to consume tens of gigabytes. You will want a systemd timer that runs nix-collect-garbage on a schedule:

automated garbage collection
{
  nix.gc = {
    automatic = true;
    dates = "weekly";
    options = "--delete-older-than 30d";
  };

  # Keep build-time dependencies of live outputs (useful for development;
  # generation retention is governed by --delete-older-than above)
  nix.settings.keep-outputs = true;
}

Caution

Be careful with aggressive garbage collection on production systems. Always keep at least two or three previous generations as rollback targets. If you garbage collect the generation you need to roll back to, you will have to rebuild it from scratch -- which may require downloading packages from the binary cache over the network.

Real-World Production Patterns

Several patterns have emerged from teams running NixOS in production that are worth adopting.

The monorepo approach. px dynamics keeps their entire infrastructure -- NixOS configurations, database migrations, application code, workflow definitions -- in a single Git repository. A change to a database schema, the Python code that writes to it, and the infrastructure that hosts it can all be reviewed in one pull request. They noted that context is never lost because there is only one place to look.

WireGuard mesh for internal services. Bind databases, observability stacks, and internal APIs exclusively to a WireGuard VPN interface. The public internet should see only SSH and the WireGuard endpoint. The px dynamics team reported that WireGuard is fast, simple, and they genuinely forget it is running.

No Docker overhead. When your OS already provides isolation and reproducibility through the Nix store, containers add complexity without proportional benefit for many workloads. Services run directly on the host, managed by systemd. The px dynamics team described this as: rock-solid tools like systemd handle process supervision, logging, and resource limits and come with virtually zero overhead.

NixOS VM tests in CI. Run your NixOS integration tests as part of every pull request. They catch configuration regressions -- firewall rules that block traffic, services binding to the wrong interface, missing dependencies -- before code reaches production.

Wrapping Up

NixOS on bare metal is not a choice you make because it is easy. You make it because the problems it solves -- configuration drift, deployment inconsistency, catastrophic rollback scenarios, undocumented infrastructure -- are expensive problems, and NixOS eliminates them structurally rather than procedurally. The px dynamics team captured this well: once the foundation was in place, they were able to expand and modify it quickly on their own, noting that the investment in learning Nix pays forward.

The learning curve is real. The documentation, while improving, still has gaps. The six-month release cycle is a genuine concern for enterprises accustomed to decade-long support windows. And the FHS incompatibility will frustrate you with certain vendor software.

But the core proposition holds: your production infrastructure, declared in a Git repository, reproducible across machines and time, atomically deployable, and instantly rollback-able. For teams willing to invest in learning the Nix model, that proposition is increasingly difficult to refuse.

Once you internalize the model, infrastructure becomes genuinely tractable. Problems have answers you can find by reading code.

-- px dynamics, "Everything in Git: Running a Trading Signal Platform on NixOS"