For years, Docker relied exclusively on iptables to manage firewall rules for container networking. When you published a port or created a bridge network, Docker inserted NAT and filter rules through the iptables binary -- which, on modern distributions, was quietly translating those rules into the nftables kernel subsystem through the iptables-nft compatibility shim. That indirection layer worked, but it created friction: flushing your nftables ruleset wiped Docker's translated rules, restarting the nftables service broke container connectivity, and managing custom firewall rules alongside Docker's auto-generated chains required careful choreography.
Docker Engine 29 changed this by introducing experimental support for creating nftables rules directly, without the iptables translation layer. The daemon now writes its own nftables tables and chains, giving administrators a cleaner integration path on distributions that have moved past iptables entirely. As Docker's engineering team noted in their Engine 29 release announcement, nftables will eventually become the default backend and iptables support will be deprecated. This guide covers the full configuration workflow, from prerequisites through production-hardened rule sets.
Why the Shift from iptables to nftables
Every major Linux distribution has adopted nftables as the default firewall framework. Debian 10+, Ubuntu 20.04+, RHEL 8+, and Fedora all ship with nftables enabled out of the box, and have since their 2019--2020 release cycles; Kubernetes production deployments have further accelerated adoption in cloud-native environments. The reasons are straightforward: nftables offers atomic rule updates (an entire ruleset can be replaced in a single kernel operation), a unified interface for IPv4, IPv6, ARP, and bridge rules under a single nft command, native support for sets and maps without external tools like ipset, and significantly better performance with large rulesets.
The iptables compatibility layer (iptables-nft) bridged the gap by translating legacy iptables commands into nftables rules behind the scenes. You can check which backend your system uses by looking at the version output:
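The check is a single command (the version number shown is illustrative; yours will differ):

```shell
# Report the iptables binary's backend: (nf_tables) or (legacy)
$ iptables --version
iptables v1.8.9 (nf_tables)
```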
If the output includes (nf_tables), your system is already translating iptables calls into nftables. If it says (legacy), you are running the older iptables kernel module. Docker's native nftables backend bypasses this translation entirely, creating rules in dedicated nftables tables that the daemon owns and manages.
Docker's nftables support is marked experimental as of Engine 29. Configuration options, behavior, and the structure of auto-generated rules may change in future releases. Overlay network rules used by Docker Swarm have not yet been migrated, so the nftables backend cannot be used with Swarm mode.
"nftables will become the default firewall backend and iptables support will be deprecated."
-- Docker Engineering, Docker Engine v29: Foundational Updates for the Future. The team also noted that Swarm support and efficiency improvements around port sets are planned before that transition completes.
Prerequisites
Before enabling the nftables backend, three conditions need to be in place: Docker Engine 29 or later must be installed, the nft user-space utility must be available, and IP forwarding must be enabled on the host.
The IP forwarding requirement is a critical behavioral difference from the iptables backend. When Docker runs with iptables, it silently enables net.ipv4.ip_forward and net.ipv6.conf.all.forwarding at daemon startup, and then sets the default forwarding policy to drop so that the host does not act as an open router. With the nftables backend, Docker will not enable IP forwarding itself. Instead, it will report an error if forwarding is needed but not active. You must enable it explicitly:
# Enable IPv4 forwarding immediately
$ sudo sysctl -w net.ipv4.ip_forward=1

# Enable IPv6 forwarding immediately
$ sudo sysctl -w net.ipv6.conf.all.forwarding=1

# Make persistent across reboots
$ cat <<EOF | sudo tee /etc/sysctl.d/99-docker-forwarding.conf
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
EOF
Enabling IP forwarding on a host with multiple network interfaces turns the machine into a router. Without firewall rules to restrict it, traffic could be forwarded between interfaces in ways you do not intend. On a multi-homed host, add nftables rules to block unwanted forwarding between non-Docker interfaces before enabling forwarding. With the iptables backend, Docker automatically set a DROP policy on forwarded traffic to prevent the host from acting as an open router; with nftables, you are responsible for equivalent rules. If you prefer Docker not to set that drop policy even when using iptables, you can use "ip-forward-no-drop": true in daemon.json.
If you need Docker to start without forwarding enabled -- for testing, or because forwarding is managed by an external system -- you can suppress Docker's forwarding check with "ip-forward": false in daemon.json, or by passing --ip-forward=false on the command line. Docker will then start and create networks even when it detects that forwarding is disabled. In production, this option should not be used as a substitute for properly enabling forwarding before the daemon starts.
Also verify that no conflicting firewall framework is active. If you are using ufw, disable it before proceeding, as it manages iptables rules that can conflict with Docker's nftables tables:
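On a host where ufw is active, that looks like this (disabling ufw also prevents it from loading its rules at boot):

```shell
# Check whether ufw is active
$ sudo ufw status

# Disable it before enabling Docker's nftables backend
$ sudo ufw disable
```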
If you are running firewalld, you do not need to disable it. Docker's nftables backend is designed to coexist with firewalld. Docker still creates a firewalld zone called docker with an ACCEPT target and inserts its bridge interfaces into it, and it creates a forwarding policy called docker-forwarding. The key difference from the iptables backend is that Docker now creates its nftables rules directly, rather than through firewalld's deprecated direct interface. Your existing firewalld zones and policies remain in effect alongside Docker's tables. That said, be aware that firewalld may also generate nftables rules, and you should verify there are no priority or policy conflicts with your custom tables after enabling the Docker nftables backend.
Ensure the nftables service is enabled and running:
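On a systemd-based distribution:

```shell
# Enable at boot and start immediately
$ sudo systemctl enable --now nftables

# Confirm it is active
$ systemctl status nftables --no-pager
```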
Enabling the nftables Firewall Backend
The nftables backend is enabled through Docker's daemon configuration. Edit or create /etc/docker/daemon.json:
{
"firewall-backend": "nftables"
}
Restart the Docker daemon to apply the change:
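```shell
$ sudo systemctl restart docker
```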
Alternatively, you can pass the option directly on the command line for testing without modifying the configuration file:
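Assuming the flag follows dockerd's usual convention of mirroring the daemon.json key (verify with dockerd --help on your version):

```shell
# Stop the managed service first so the socket is free
$ sudo systemctl stop docker

# Run the daemon in the foreground with the nftables backend
$ sudo dockerd --firewall-backend=nftables
```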
After restarting, verify that Docker has created its nftables tables:
$ sudo nft list tables
table ip docker-bridges
table ip6 docker-bridges
If you see ip docker-bridges and ip6 docker-bridges in the output, Docker is creating native nftables rules. Each bridge network you create will add further chains and rules within these tables.
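One way to watch this happen (the network name demo-net is arbitrary):

```shell
# Base chains only, before creating a network
$ sudo nft list chains ip docker-bridges

# Creating a bridge network adds chains for its bridge interface
$ docker network create demo-net
$ sudo nft list chains ip docker-bridges

# Clean up; the network's chains are removed again
$ docker network rm demo-net
```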
If Docker was previously running with the iptables backend, switching to nftables will cause Docker to delete its iptables chains and rules on restart and replace them with nftables equivalents. If you had custom rules in the iptables DOCKER-USER chain, they will no longer take effect. See the migration section below.
How Docker's nftables Tables Work
Understanding the table structure Docker creates is essential for writing rules that coexist with it. For bridge networks, Docker creates two tables: ip docker-bridges (IPv4) and ip6 docker-bridges (IPv6). Each table contains a set of base chains, and additional chains are created for each bridge network.
Docker considers these tables its own property. The rules within them change as you create and destroy networks, start and stop containers, and publish or unpublish ports. You should never modify Docker's tables directly -- changes you make will likely be overwritten the next time Docker reconciles its state.
To inspect what Docker has created:
# List all rules in Docker's IPv4 bridge table
$ sudo nft list table ip docker-bridges

# List only the chain names and their hooks/priorities
$ sudo nft list chains ip docker-bridges
Docker's base chains attach to specific Netfilter hooks (prerouting, forward, postrouting) with well-known priority values. According to Docker's documentation, Docker uses well-known priority values for each of its base chains, and you can set your chain priority relative to Docker's. In practice, Docker's forward chain has been observed to use priority -100 in the current implementation. However, the official documentation explicitly states that the internal structure of Docker's tables is subject to change between releases during the experimental period. To be safe, treat the specific numeric values as observed behavior rather than a published contract, and verify with sudo nft list chains ip docker-bridges on your own host. This means Docker's forwarding rules evaluate before any chain you create at the default filter priority of 0.
For DNS resolution, Docker also creates nftables rules inside the container's own network namespace, not just on the host. These per-container rules handle the internal DNS forwarding that allows containers to resolve other container names.
Understanding Chain Priority
Chain priority is the single concept that trips up nearly every administrator configuring nftables alongside Docker. In nftables, when multiple chains attach to the same hook (for example, the forward hook), the kernel evaluates them in order of ascending priority number. A chain with priority -200 runs before a chain with priority -100, which runs before a chain with priority 0.
There is a critical and widely misunderstood detail about how nftables verdicts work across multiple chains. Unlike iptables, an accept verdict in nftables is not final at the hook level. According to Docker's own documentation: when a packet is accepted in one base chain, it still traverses all other base chains attached to the same hook. A packet is only truly accepted if it passes through every base chain without being dropped. This means Docker accepting a packet in its forward chain does not prevent a chain you control from dropping it. The real problem runs in the opposite direction: if Docker's chain issues a drop verdict for a packet, that drop is final. No subsequent chain can override it. The nftables wiki confirms this asymmetry: drops take immediate effect with no further rules or chains evaluated, while accept verdicts only guarantee termination of that specific base chain's processing.
The nft(8) man page makes this even more explicit: a drop verdict immediately ends evaluation of the entire ruleset, and no further chains of any hook are consulted. It is therefore not possible to have a drop verdict changed to an accept in a later chain. Conversely, an accept verdict ends evaluation of the current base chain, but the packet advances to the next base chain -- meaning a packet is accepted if and only if no matching rule or base chain policy issues a drop.
"an accept verdict... isn't necessarily final."
-- nftables wiki, Configuring chains. The wiki goes on to clarify that when a packet is accepted in one base chain, any later-priority chain attached to the same hook still evaluates it. A drop verdict, by contrast, is final -- no subsequent chain gets a chance to override it.
This distinction matters for firewall design. A drop rule in a Docker chain will kill the packet regardless of what your custom chains do. But if you want to block traffic that Docker would otherwise allow, you can do so with a custom chain at any priority -- because Docker's accept does not shield the packet from your drop rules.
To use chain priority as an extra guarantee that your rules run first, create your chains with a priority lower than Docker's:
"an 'accept' rule is not final. It terminates processing for its base chain."
-- Docker Engine documentation, Docker with nftables. The documentation continues to explain that the accepted packet is still processed by any other base chains that share the same hook, which may drop it. This is a deliberate design choice: nftables allows each table to own its chains independently.
# Create a table for your custom rules
$ sudo nft add table inet firewall

# Add a forward chain with priority -200 (runs before Docker's -100)
$ sudo nft add chain inet firewall forward_filter \
    '{ type filter hook forward priority -200; policy accept; }'

# Block external access to a specific container's internal port
# Note: tcp dport matches the post-DNAT port (container's port).
# To match on the published host port instead, use ct original proto-dst.
$ sudo nft add rule inet firewall forward_filter \
    tcp dport 3000 drop

# Or restrict to a specific container IP
$ sudo nft add rule inet firewall forward_filter \
    ip daddr 172.17.0.5 tcp dport 3000 drop
The inet family handles both IPv4 and IPv6 in a single table, which simplifies management compared to maintaining separate ip and ip6 tables.
Migrating from the DOCKER-USER Chain
With the iptables backend, Docker provided the DOCKER-USER chain as an insertion point for custom rules in the filter table's FORWARD chain. Rules in DOCKER-USER ran before Docker's own forwarding rules, giving administrators a place to restrict traffic without editing Docker's chains directly.
The nftables backend has no DOCKER-USER chain. The replacement approach uses the table and chain separation that nftables provides natively: create your own table, add base chains that attach to the same hooks as Docker's chains, and use priority to control the evaluation order.
If you had rules in DOCKER-USER that restricted access to published ports, you need to migrate them. Here is an example: suppose you had an iptables rule that only allowed your office IP range to reach port 8080:
# Old iptables approach (no longer works with nftables backend)
$ iptables -I DOCKER-USER -i eth0 -p tcp --dport 8080 \
    ! -s 203.0.113.0/24 -j DROP
The nftables equivalent uses a custom table with an appropriately prioritized chain:
# Create a dedicated table for Docker access control
$ sudo nft add table inet docker-access

# Forward chain that evaluates before Docker's chains
$ sudo nft add chain inet docker-access forward \
    '{ type filter hook forward priority -200; policy accept; }'

# Only allow office subnet to reach published host port 8080
# Use ct original proto-dst because DNAT has already rewritten the port
$ sudo nft add rule inet docker-access forward \
    iifname "eth0" ip daddr 172.16.0.0/12 ct original proto-dst 8080 \
    ip saddr != 203.0.113.0/24 drop
Using Firewall Marks to Override Docker's Drop Rules
Docker also supports a --bridge-accept-fwmark daemon option, which addresses a specific scenario: overriding Docker's own drop rules for bridged traffic. Remember that unlike accept, a drop verdict in Docker's chains is final -- no other chain can reverse it. The --bridge-accept-fwmark option provides an escape hatch. When configured, Docker will accept any forwarded packet carrying the specified firewall mark, regardless of its other drop rules. You mark acceptable packets in your own chain before Docker's rules evaluate, and Docker lets them through.
This is distinct from simply blocking traffic Docker would allow. That you can do with any custom chain at any priority. Firewall marks are needed specifically when you want to permit traffic that Docker's rules would otherwise drop.
The --bridge-accept-fwmark option also supports a bitmask, specified as mark/mask. For example, --bridge-accept-fwmark=0x1/0x3 tells Docker to accept packets whose mark, after masking with 0x3, equals 0x1 -- that is, bit 0 set and bit 1 clear. This is useful when other software on the host also sets firewall marks, allowing you to reserve specific bits for Docker's use without collisions.
# In daemon.json, add:
#   "bridge-accept-fwmark": "0x1"
# Or with a mask to avoid conflicts with other mark users:
#   "bridge-accept-fwmark": "0x1/0x3"

# Then in your nftables rules, mark packets you want Docker to accept
# The mark must be set in a chain with priority filter - 1 or lower
$ sudo nft add table inet my-marks
$ sudo nft add chain inet my-marks mark_allowed \
    '{ type filter hook forward priority filter - 1; policy accept; }'

# Mark traffic from the trusted subnet
$ sudo nft add rule inet my-marks mark_allowed \
    ip saddr 203.0.113.0/24 meta mark set 0x1
Matching Traffic on Its Original Port Before DNAT
A subtle problem when writing forwarding rules around Docker's published ports: by the time a packet reaches your forward chain, Docker has already applied DNAT in the prerouting hook. The destination IP and port have been rewritten to the container's internal address. If you write a forwarding rule that matches on port 8080, it will not match traffic that was originally destined for a host port that maps to container port 8080 -- because the packet's destination port has already been translated.
The solution is to use connection tracking's ct original keyword, which exposes the packet's address and port before DNAT was applied:
# This does NOT work: destination port has already been rewritten by DNAT
# iifname "eth0" tcp dport 8080 accept   -- won't match what you expect

# This DOES work: match on the port before DNAT translation
# Traffic originally directed to host port 8080
iifname "eth0" ip daddr 172.16.0.0/12 ct original proto-dst 8080 accept

# Practical use: only allow office subnet to reach the pre-DNAT port
iifname "eth0" ip saddr != 203.0.113.0/24 \
    ip daddr 172.16.0.0/12 ct original proto-dst 8080 drop
This technique is particularly useful when you want to allow or block access based on the published host port rather than the container's internal port. The ct original proto-dst expression is available because nftables connection tracking records the original addresses before any NAT transformation takes place. This expression is documented in the nft(8) man page under the ct expression section, and Docker's own nftables documentation references this approach as the correct way to match pre-DNAT ports in forwarding rules.
Making Custom Rules Persistent
Rules added with nft add on the command line are lost when the system reboots. There are two approaches to persistence, and the right one depends on whether your rules need to coexist with Docker's dynamic tables.
Approach 1: Separate nftables Configuration File
Place your custom rules in /etc/nftables.conf or a file included from it. The key constraint: do not use flush ruleset at the top of your configuration file if Docker is running. Flushing the entire ruleset wipes Docker's tables, and you will lose container connectivity until Docker restarts and recreates them.
Instead, flush only your own tables:
#!/usr/sbin/nft -f

# Ensure table exists before flushing (safe on first boot)
table inet docker-access {}

# Flush only our own table, not the entire ruleset
flush table inet docker-access

table inet docker-access {
    chain forward {
        type filter hook forward priority -200; policy accept;

        # Allow established and related connections
        ct state established,related accept

        # Drop invalid state packets
        ct state invalid drop

        # Restrict published host port 8080 to office subnet
        # Use ct original proto-dst because DNAT has already rewritten the port
        iifname "eth0" ip daddr 172.16.0.0/12 ct original proto-dst 8080 \
            ip saddr != 203.0.113.0/24 drop

        # Restrict port 5432 (PostgreSQL) -- block external access entirely
        # Use ct original proto-dst to match on the published host port
        iifname "eth0" ip daddr 172.16.0.0/12 ct original proto-dst 5432 drop
    }

    chain input {
        type filter hook input priority filter; policy drop;

        # Loopback
        iif lo accept

        # Established connections
        ct state established,related accept

        # SSH
        tcp dport 22 accept

        # HTTP and HTTPS
        tcp dport { 80, 443 } accept
    }
}
Never use flush ruleset in your nftables configuration when Docker is running. This command removes all tables -- including Docker's docker-bridges tables -- and breaks container networking instantly. Always flush only the specific tables you own.
Approach 2: systemd Unit That Runs After Docker
Docker recreates its nftables rules every time the daemon starts. If your rules depend on Docker's tables already existing (for example, if you insert rules into Docker's own chains), you need a systemd service that runs after Docker:
[Unit]
Description=Custom nftables rules for Docker
After=docker.service
Requires=docker.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/docker-firewall-rules.sh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
The companion script waits for Docker's tables to appear before applying rules:
#!/bin/bash
set -e

# Wait for Docker's nftables tables to exist
max_attempts=10
attempt=0
while ! nft list table ip docker-bridges >/dev/null 2>&1; do
    attempt=$(( attempt + 1 ))
    if [ "$attempt" -ge "$max_attempts" ]; then
        echo "Docker nftables tables not found after $max_attempts attempts"
        exit 1
    fi
    sleep 2
done

# Apply custom rules
nft -f /etc/nftables.d/docker-custom.conf
Enable the service:
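Assuming the unit file above was saved as /etc/systemd/system/docker-firewall-rules.service (the unit name is an assumption for illustration):

```shell
# Pick up the new unit file, then enable and start it
$ sudo systemctl daemon-reload
$ sudo systemctl enable --now docker-firewall-rules.service
```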
Handling nftables Service Restarts
A common operational headache: restarting the nftables systemd service flushes the entire ruleset and reloads from /etc/nftables.conf. If Docker's tables are wiped during this process, container networking breaks until Docker is restarted and recreates its rules.
The solution is to always restart Docker after reloading nftables:
# Reload nftables configuration
$ sudo systemctl restart nftables

# Verify Docker's tables still exist
$ sudo nft list tables | grep docker-bridges

# If Docker's tables are gone, restart Docker
$ sudo systemctl restart docker

# Verify Docker recreated its tables
$ sudo nft list tables
You can automate this with a systemd override on the nftables service that triggers a Docker restart after nftables reloads:
[Service]
ExecStartPost=/bin/systemctl restart docker
Complete Production Configuration
Here is a full working nftables configuration for a Docker host that serves web traffic through containers behind a reverse proxy, with SSH access restricted to a management subnet. The configuration assumes Docker's nftables backend is enabled and running.
#!/usr/sbin/nft -f

# Ensure table exists before flushing (safe on first boot)
table inet host-firewall {}

# Only flush our own table -- never flush ruleset
flush table inet host-firewall

table inet host-firewall {
    # Named set for management IPs
    set mgmt_nets {
        type ipv4_addr
        flags interval
        elements = { 10.0.0.0/8, 203.0.113.0/24 }
    }

    # Named set for blocked container ports (external access)
    set blocked_container_ports {
        type inet_service
        elements = { 5432, 6379, 27017, 9200 }
    }

    # Input chain: traffic destined for the host itself
    chain input {
        type filter hook input priority filter; policy drop;

        # Loopback always allowed
        iif lo accept

        # Established and related connections
        ct state established,related accept
        ct state invalid drop

        # ICMP for diagnostics
        ip protocol icmp accept
        ip6 nexthdr icmpv6 accept

        # SSH only from management networks
        tcp dport 22 ip saddr @mgmt_nets accept

        # HTTP and HTTPS (reverse proxy on the host)
        tcp dport { 80, 443 } accept
    }

    # Forward chain: runs before Docker's chains at priority -100
    chain forward_filter {
        type filter hook forward priority -200; policy accept;

        # Allow established and related
        ct state established,related accept
        ct state invalid drop

        # Block external access to database/cache ports in containers
        # Use ct original proto-dst to match published host ports before DNAT
        iifname "eth0" ip daddr 172.16.0.0/12 \
            ct original proto-dst @blocked_container_ports drop
    }

    # Output chain: allow all outbound
    chain output {
        type filter hook output priority filter; policy accept;
    }
}
This configuration uses named sets for both management IPs and blocked ports. Named sets are one of nftables' strongest features -- you can update them dynamically without reloading the entire ruleset:
# Add a new management IP without reloading
$ sudo nft add element inet host-firewall mgmt_nets { 198.51.100.0/25 }

# Block an additional container port
$ sudo nft add element inet host-firewall blocked_container_ports { 8888 }

# Remove a port from the blocked set
$ sudo nft delete element inet host-firewall blocked_container_ports { 9200 }
Using a Reverse Proxy Instead of Direct Port Publishing
A cleaner alternative to managing nftables rules around Docker's published ports is to avoid publishing ports externally altogether. Instead, bind container ports to 127.0.0.1 and place a reverse proxy (Nginx, Caddy, or Traefik) on the host to handle external traffic.
In your docker-compose.yml, bind ports to localhost only:
services:
  webapp:
    image: myapp:latest
    ports:
      - "127.0.0.1:3000:3000"
    restart: unless-stopped
Then configure Nginx on the host to proxy external traffic:
server {
    listen 80;
    listen [::]:80;
    server_name myapp.example.com;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
This approach sidesteps the entire chain priority problem. The container's port is never exposed to the network, so Docker does not create forwarding rules for it. Your nftables input chain controls access to the reverse proxy on ports 80 and 443, and you manage TLS termination, rate limiting, and access control at the proxy layer.
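A quick way to confirm the binding is loopback-only (port 3000 follows the Compose example above; the container name under Compose will vary):

```shell
# The listener should show 127.0.0.1:3000, not 0.0.0.0:3000 or [::]:3000
$ sudo ss -tlnp | grep ':3000'

# Docker's view of the published binding
$ docker port <container-name>
```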
Troubleshooting
Containers Cannot Reach the Internet
Check that IP forwarding is enabled and that NAT masquerading rules exist in Docker's tables:
# Verify forwarding is active
$ sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1

# Check Docker's NAT rules exist
$ sudo nft list table ip docker-bridges | grep masquerade

# Check for leftover iptables FORWARD DROP policy
$ sudo iptables -L FORWARD --line-numbers
If you previously ran Docker with the iptables backend, it may have set a DROP policy on the iptables FORWARD chain. That policy persists even after switching to nftables and will drop packets that Docker's nftables rules have accepted. This happens because iptables and nftables register independent hook functions at the same Netfilter hook points, and the kernel evaluates both rule sets for the same packet. A packet that Docker accepts in its nftables chains can still be dropped by a residual iptables DROP policy. Reset the iptables FORWARD policy to ACCEPT and verify no residual iptables rules remain:
# Reset DROP policy to ACCEPT on the FORWARD chain
$ sudo iptables -P FORWARD ACCEPT

# Flush any residual Docker iptables chains
$ sudo iptables -F DOCKER 2>/dev/null || true
$ sudo iptables -F DOCKER-USER 2>/dev/null || true
$ sudo iptables -t nat -F DOCKER 2>/dev/null || true

# Confirm the FORWARD chain policy is now ACCEPT
$ sudo iptables -L FORWARD --line-numbers
Published Ports Not Accessible
Verify that Docker's DNAT rules were created for the published port:
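For a port published on, say, host port 8080, search Docker's bridge table for the corresponding dnat rule:

```shell
# All DNAT rules Docker has created for published ports
$ sudo nft list table ip docker-bridges | grep -i dnat

# Narrow to a specific published port
$ sudo nft list table ip docker-bridges | grep 8080
```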
If DNAT rules are missing, restart the Docker daemon. If they exist but the port is still unreachable, check whether your custom nftables chains are dropping the traffic at an earlier (numerically lower) priority.
Rules Disappear After Docker Restart
Docker recreates its nftables tables every time the daemon starts. Any rules you manually inserted into Docker's own tables will be lost. This is expected behavior -- Docker owns its tables. Store your rules in a separate table and use the systemd service approach described above to reapply them after Docker starts.
Using the nftables Trace Facility
For real-time debugging of which chains and rules a packet actually hits, nftables provides a built-in trace mechanism. This is far more useful than guessing at priority interactions because it shows you exactly which chains evaluate the packet and in what order across your entire ruleset.
Enable tracing on a specific chain with a temporary rule, then monitor the output:
# Step 1: Enable the nfnetlink_log module if not already loaded
$ sudo modprobe nfnetlink_log

# Step 2: Insert a trace rule at the top of your custom chain
$ sudo nft insert rule inet host-firewall forward_filter \
    meta nftrace set 1

# Step 3: Start monitoring trace output in one terminal
$ sudo nft monitor trace

# Step 4: Generate test traffic in another terminal
#   e.g. curl from an external host to a container port

# Step 5: Remove the trace rule when done
$ sudo nft delete rule inet host-firewall forward_filter handle <handle>
#   Get the handle number from:
#   sudo nft -a list chain inet host-firewall forward_filter
The trace output shows each packet's path through every base chain on every hook it traverses, with the verdict at each step. This is the definitive way to confirm whether Docker's accept verdict in one chain still allows your chain to drop the packet, or to identify which chain is issuing a final drop you did not expect.
Trace rules match every packet, so remove them as soon as you are done debugging. On a busy host, leaving a trace rule active will flood your terminal and have a measurable performance impact. The handle number is the reliable identifier for deletion -- get it with sudo nft -a list chain <family> <table> <chain>.
What Happens at the Kernel Level
Understanding why accept is not final and drop is requires looking one layer below nftables, at the Netfilter hook mechanism itself. Each nftables base chain registers a hook function with one of the kernel's Netfilter hooks (prerouting, input, forward, output, postrouting). When a packet reaches a hook, the kernel calls every registered hook function in priority order. Each hook function returns a verdict: NF_ACCEPT or NF_DROP.
When a hook function returns NF_ACCEPT, the kernel moves to the next registered hook function at that same hook point. The packet continues its journey only if every hook function at that hook returns NF_ACCEPT. When any hook function returns NF_DROP, the kernel immediately discards the packet and does not call any remaining hook functions. This is not an nftables design decision -- it is how Netfilter itself works in the kernel, and it explains why multiple nftables base chains attached to the same hook behave the way they do.
In practical terms, when Docker creates a base chain at the forward hook with priority -100, and you create your own base chain at the same hook with priority -200, the kernel sees two hook functions registered at the forward hook. Your chain (priority -200) runs first. If your chain accepts the packet, Docker's chain (priority -100) still runs. If Docker's chain also accepts, the packet is truly accepted. But if either chain drops, the packet is gone -- no recovery possible.
This is also why connection tracking hooks matter for rule design. The connection tracking system registers its own hook functions at priority -200 in prerouting and at priority INT_MAX in postrouting (for conntrack confirmation). DNAT transformations happen at the conntrack prerouting hook. By the time any forward-hook chain sees the packet, the destination address and port have already been rewritten. That is why ct original proto-dst is necessary to match the original port -- it queries the connection tracking entry, which recorded the packet's state before DNAT applied.
The connection tracking priority values (-200 for prerouting, INT_MAX for postrouting confirmation) are defined in the kernel source as NF_IP_PRI_CONNTRACK and NF_IP_PRI_CONNTRACK_CONFIRM. These are not configurable. A prerouting chain registered at a priority below -200 would run before connection tracking has seen the packet, so ct expressions would not match there. Forward chains are unaffected: conntrack and DNAT have already completed at prerouting, so ct original proto-dst works correctly in forward chains at any priority.
The Complete Kernel Priority Map
One of the hardest details to find when configuring nftables alongside Docker is the full set of priority values that the kernel itself uses for its internal operations at each hook. These values are defined in include/uapi/linux/netfilter_ipv4.h and determine where your chains and Docker's chains sit relative to connection tracking, NAT, SELinux, and packet defragmentation. Knowing these values is the difference between guessing at chain priority and placing your chains with precision.
```c
// These are the kernel-defined priority values for IPv4 Netfilter hooks.
// Your nftables chains and Docker's chains are interleaved with these.
enum nf_ip_hook_priorities {
    NF_IP_PRI_FIRST             = INT_MIN,     // Absolute first evaluation
    NF_IP_PRI_RAW_BEFORE_DEFRAG = -450,        // Before defragmentation
    NF_IP_PRI_CONNTRACK_DEFRAG  = -400,        // Defragmentation for conntrack
    NF_IP_PRI_RAW               = -300,        // nftables "raw" priority
    NF_IP_PRI_SELINUX_FIRST     = -225,        // SELinux first hook
    NF_IP_PRI_CONNTRACK         = -200,        // Connection tracking (DNAT here)
    NF_IP_PRI_MANGLE            = -150,        // nftables "mangle" priority
    NF_IP_PRI_NAT_DST           = -100,        // Destination NAT
    NF_IP_PRI_FILTER            = 0,           // nftables "filter" priority
    NF_IP_PRI_SECURITY          = 50,          // Security frameworks
    NF_IP_PRI_NAT_SRC           = 100,         // Source NAT (masquerade)
    NF_IP_PRI_SELINUX_LAST      = 225,         // SELinux last hook
    NF_IP_PRI_CONNTRACK_HELPER  = INT_MAX - 2, // Conntrack helpers (ALGs)
    NF_IP_PRI_NAT_SEQ_ADJUST    = INT_MAX - 1, // NAT sequence adjustment
    NF_IP_PRI_CONNTRACK_CONFIRM = INT_MAX,     // Conntrack confirmation
};
```
Several practical implications follow from this map. Docker's forward chain at priority -100 sits exactly at NF_IP_PRI_NAT_DST, which is after connection tracking (-200) has already processed the packet and after any mangle-priority chains (-150) have run. If you place your chain at priority -200, you are sharing the priority level with conntrack itself. While this works in the forward hook (because conntrack's DNAT work happens at prerouting, not forward), it is worth understanding that at the prerouting hook, a chain at priority -200 would execute alongside connection tracking.
For NAT type chains, nftables enforces a hard lower bound: nat chains must use a priority greater than -200. This restriction exists because connection tracking hooks at that priority, and NAT depends on conntrack being active. If you need to add custom DNAT or SNAT rules alongside Docker's, your nat chains must respect this boundary.
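A custom DNAT chain that respects this bound might look like the following sketch. The table name, interface, addresses, and ports are illustrative assumptions, not values Docker uses:

```nft
table ip my-nat {
    chain prerouting {
        # "dstnat" is the named priority -100, so "dstnat - 10" = -110.
        # Any priority of -200 or below would be rejected for a
        # nat-type chain, because it would race connection tracking.
        type nat hook prerouting priority dstnat - 10; policy accept;
        iifname "eth0" tcp dport 8443 dnat to 192.0.2.10:443
    }
}
```

Running at -110 places this chain after conntrack (-200) but just before Docker's own DNAT rules at -100, so it wins for the ports it matches.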
The nftables user-space tools also allow you to specify priority using keyword names instead of raw numbers (since nftables 0.9.6). So priority filter means 0, priority raw means -300, and priority mangle means -150. You can also use offsets: priority filter - 1 means -1, and priority mangle + 10 means -140. Docker's documentation uses this offset syntax when describing where to place firewall mark chains for the --bridge-accept-fwmark option.
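A chain that sets a firewall mark for use with `--bridge-accept-fwmark` might be sketched as follows. The mark value `0xf00` and the source network are illustrative; the mark must match whatever value you pass to the daemon:

```nft
table inet accept-marks {
    chain mark-trusted {
        # "mangle" is the named priority -150. Setting the mark at
        # prerouting guarantees it is visible to every later
        # forward-hook chain, including Docker's.
        type filter hook prerouting priority mangle; policy accept;
        # 0xf00 is an assumed example value, matching a daemon
        # started with --bridge-accept-fwmark=0xf00.
        ip saddr 192.0.2.0/24 meta mark set 0xf00
    }
}
```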
"if all hook functions of this hook return NF_ACCEPT, then the packet finally continues."
-- Thermalcircle, Nftables - Packet flow and Netfilter hooks in detail. The analysis confirms that NF_DROP causes immediate deletion with no further hook functions or network stack traversal, while NF_ACCEPT only advances the packet to the next registered hook function at that same hook point.
Why Docker Uses the ip Family Instead of inet
Docker creates its tables in the ip and ip6 address families rather than the unified inet family. This is a deliberate choice. The inet family, while convenient for administrators writing dual-stack rules, does not support all the same features as the protocol-specific families. In particular, some conntrack expressions and NAT behaviors differ between the families. By using separate ip and ip6 tables, Docker ensures maximum compatibility with all kernel versions that support nftables and avoids subtle behavioral differences in NAT rule evaluation that can arise with the inet family.
When you write your own custom chains, you can use the inet family to handle both IPv4 and IPv6 in a single table. The kernel evaluates inet chains for both protocol versions, and they coexist with Docker's protocol-specific tables at the same hooks. Just be aware that if you need to match IPv4-specific expressions (like ip saddr) in an inet chain, those expressions will only match IPv4 packets and will be silently skipped for IPv6 packets. This is usually the desired behavior, but it means you cannot write a single rule that matches both ip saddr and ip6 saddr -- you need separate rules or a verdict map.
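In practice that means a dual-stack policy in an `inet` table needs one rule per protocol family, as in this illustrative sketch (table name and addresses are assumptions):

```nft
table inet my-filter {
    chain forward {
        type filter hook forward priority -200; policy accept;
        # Each rule matches only its own family; packets of the
        # other family skip it silently.
        ip saddr 203.0.113.0/24 drop
        ip6 saddr 2001:db8:bad::/48 drop
    }
}
```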
What Comes Next: Docker's nftables Roadmap
Docker's nftables backend is explicitly transitional. The Docker engineering team has stated publicly that nftables will become the default firewall backend in a future release and that iptables support will be deprecated. Several specific improvements are planned before that transition:
- Swarm overlay network support. The overlay network rules used by Docker Swarm have not yet been migrated from iptables. Until this lands, nftables cannot be enabled on Swarm nodes. This is the primary blocker for production adoption in multi-host deployments.
- Efficiency improvements. Docker's current nftables rules do not yet exploit nftables features like port sets to their full extent. Future releases will consolidate rules to reduce the number of chains and entries created per bridge network.
- Stabilization of the rule structure. The internal structure of Docker's docker-bridges tables is explicitly unstable between releases during the experimental period. Once the backend is promoted out of experimental status, the table structure will be documented and stabilized.
Because the table structure is subject to change during the experimental period, avoid writing scripts that parse or depend on the specific chain names or rule positions inside Docker's docker-bridges tables. Write your own custom rules in your own tables and use priority and interface matching to interact with Docker's networking, not internal chain structure.
Security Fixes in Docker Engine 29
While the nftables backend is the headline networking change in Docker Engine 29, it is worth being aware of the security fixes that shipped in the 29.x patch series, as they affect hosts running this version. Docker Engine 29.3.1, released on March 25, 2026, addressed several CVEs. Notable among them: CVE-2026-34040 (CVSS 8.8) fixed an authorization bypass in AuthZ plugins where API request bodies exceeding 1 MB were silently dropped before reaching the plugin, allowing specially crafted requests to bypass security policies; this was an incomplete fix for the earlier CVE-2024-41110. CVE-2026-33997 fixed privilege escalation via a partial validation bypass in docker plugin install. CVE-2026-33748 and CVE-2026-33747 fixed BuildKit vulnerabilities involving path traversal in Git URL fragments and untrusted frontends writing files outside the state directory. These are separate from the nftables backend itself, but administrators enabling nftables on a production host should ensure they are running Docker Engine 29.3.1 or later rather than pinning to 29.0.
The CVE-2026-34040 vulnerability is particularly worth understanding in detail because of its simplicity and its implications for automated systems. The flaw resides in the AuthZRequest function in pkg/authorization/authz.go. When the Docker daemon receives an API request, it attempts to buffer the request body using a drainBody function before forwarding it to authorization plugins for inspection. In vulnerable versions, if the request body exceeds an internal 1 MB threshold, the daemon silently truncates the body -- sending an empty payload to the AuthZ plugin while retaining and processing the full original request. The plugin, seeing no body to inspect, defaults to allowing the request. The daemon then creates whatever container the attacker specified, including privileged containers with host filesystem mounts.
The attack requires only a single HTTP request: the attacker appends a dummy padding field to a normal container-creation payload to push it past the 1 MB limit. No exploit code, no special tools, no elevated privileges beyond basic Docker API access are needed. Cyera Research, which discovered the vulnerability, demonstrated that AI coding agents running inside Docker-based sandboxes can trigger this bypass without explicit instruction -- an agent that encounters a permissions error when trying to access host resources may independently construct a padded API request as a workaround, since the bypass is just standard HTTP with extra padding.
"an attacker could make the Docker daemon forward the request... without the body."
-- Docker Engine maintainers, Docker Engine v29 Release Notes, describing CVE-2026-34040. The fix in 29.3.1 implements a fail-closed mechanism that increases the body size threshold to 4 MiB and explicitly rejects oversized requests rather than silently truncating them.
The patched version (29.3.1, commit e89edb19ad7d) changed the buffer inspection logic to use a Peek operation that checks whether the payload exceeds 4 MiB. If it does, the request is rejected outright rather than forwarded with an empty body. This is the fail-closed approach: if the body cannot be fully buffered and inspected, the request is denied. If you cannot patch immediately, the recommended mitigations are to restrict Docker API access to trusted users only, avoid relying on AuthZ plugins that inspect request bodies for security decisions, run Docker in rootless mode (where even a privileged container's root maps to an unprivileged host UID), or add a reverse proxy with a 512 KB body size limit in front of the Docker API socket.
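The reverse-proxy mitigation might be sketched as follows, assuming nginx. The listen address, socket path, and exact limit are illustrative; the 512 KB figure follows the mitigation guidance above:

```nginx
server {
    listen 127.0.0.1:2375;
    # Reject request bodies larger than 512 KB before they ever
    # reach the Docker API socket.
    client_max_body_size 512k;
    location / {
        proxy_pass http://unix:/var/run/docker.sock:/;
        proxy_http_version 1.1;
    }
}
```

Clients would then point at 127.0.0.1:2375 instead of the socket directly; oversized payloads are refused with 413 before the daemon sees them.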
Docker Desktop users on Windows and macOS should additionally be aware of CVE-2025-9074 (CVSS 9.3), patched in Docker Desktop 4.44.3 in August 2025. That vulnerability allowed any container to access the Docker Engine API at an internal address without authentication, enabling full host compromise on Windows. That vulnerability does not affect Docker Engine running natively on Linux hosts, which is the target environment for the nftables backend.
For tracking the current status, watch the Moby project issue tracker and the Docker Engine release notes. The Docker documentation page Docker with nftables is the authoritative reference and is updated with each Engine release.
Wrapping Up
Docker's native nftables backend removes the iptables compatibility layer that has been a source of operational friction on modern Linux distributions. The configuration is straightforward -- a single key in daemon.json -- but the real complexity lies in understanding how Docker's auto-generated tables and chains interact with your custom firewall rules.
The essential principles to keep in mind:

- Docker creates and owns the docker-bridges tables; never modify them directly.
- Unlike iptables, an accept verdict in nftables is not final: your custom chains can still drop packets that Docker's chains have accepted. A drop verdict, however, is final; if you need to override a Docker drop, use --bridge-accept-fwmark. At the kernel level, this behavior traces directly to how Netfilter hook functions return NF_ACCEPT (continue to the next hook function) versus NF_DROP (discard immediately with no further evaluation).
- When writing forwarding rules around published ports, match on ct original proto-dst rather than the destination port directly, because DNAT runs in prerouting before your forward chain sees the packet; the connection tracking entry preserves the original port for your rules to query.
- Never flush the entire ruleset while Docker is running; flush only your own tables.
- Always restart Docker after reloading the nftables service to ensure Docker's rules are recreated.
- If you are running firewalld, you do not need to disable it; the nftables backend coexists with firewalld's zones and policies.
- Ensure you are running Docker Engine 29.3.1 or later to pick up the security fixes for CVE-2026-34040 and related vulnerabilities.
This backend is explicitly on its way to becoming the default. Docker has stated that nftables will replace iptables as the standard and that iptables support will eventually be deprecated. Getting familiar with it now, while it is still experimental, means you will be ahead of the transition rather than scrambling to migrate when iptables support is removed.
How to Configure nftables with Docker
Step 1: Verify prerequisites and enable IP forwarding
Confirm that Docker Engine 29 or later is installed, that the nftables user-space tools are present, and that IPv4 forwarding is enabled in sysctl. Unlike the iptables backend, Docker will not enable IP forwarding automatically when using nftables -- it will report an error if forwarding is needed but not already active.
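Persistently enabling forwarding might look like the following sysctl fragment. The file name is an illustrative convention, not a requirement:

```ini
# /etc/sysctl.d/99-docker-forwarding.conf -- illustrative file name.
# Apply without rebooting: sysctl --system
net.ipv4.ip_forward = 1
# Only needed if containers publish ports over IPv6:
net.ipv6.conf.all.forwarding = 1
```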
Step 2: Enable the nftables firewall backend in daemon.json
Open or create /etc/docker/daemon.json and add the firewall-backend key set to nftables. Restart the Docker daemon with systemctl restart docker, then verify the new tables exist by running nft list tables and checking for ip docker-bridges and ip6 docker-bridges.
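The resulting daemon.json is minimal; if the file already exists, add the key alongside your other settings rather than replacing the file:

```json
{
    "firewall-backend": "nftables"
}
```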
Step 3: Migrate DOCKER-USER rules and add custom chains
The nftables backend has no DOCKER-USER chain. Instead, create a separate nftables table with base chains that hook into the same points as Docker's chains. Set your chain priority sufficiently low (for example, -200) to ensure your rules evaluate before Docker's chains. Verify Docker's actual chain priorities with: sudo nft list chains ip docker-bridges. Place your custom table in /etc/nftables.conf or a file loaded by a systemd unit that runs after docker.service.
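An illustrative replacement for the old DOCKER-USER workflow might look like this. The table name, interface, and subnet are assumptions; only the hook, priority, and flush-by-name pattern are the point:

```nft
#!/usr/sbin/nft -f
# Declare before flushing so the flush is idempotent on first load.
# Never "flush ruleset" here: Docker owns the docker-bridges tables.
table inet my-docker-user
flush table inet my-docker-user

table inet my-docker-user {
    chain forward {
        # Priority -200 evaluates before Docker's chains at -100.
        type filter hook forward priority -200; policy accept;
        ct state established,related accept
        # Example policy: block one external subnet from reaching
        # any container via the assumed uplink interface.
        iifname "eth0" ip saddr 198.51.100.0/24 drop
    }
}
```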
Step 4: Make custom rules persistent and handle service restarts
Never use flush ruleset in your nftables configuration when Docker is running -- this removes Docker's tables and breaks container networking. Instead, flush only your own tables by name. To handle nftables service reloads, create a systemd override that restarts Docker after nftables reloads, or use a separate docker-firewall.service unit with After=docker.service to reapply rules each time Docker starts.
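A separate unit for this might be sketched as follows; the unit name and rule-file path are illustrative:

```ini
# /etc/systemd/system/docker-firewall.service -- illustrative unit
[Unit]
Description=Reapply custom nftables rules after Docker starts
After=docker.service
PartOf=docker.service

[Service]
Type=oneshot
RemainAfterExit=yes
# Loads only our own table; never "flush ruleset" here.
ExecStart=/usr/sbin/nft -f /etc/nftables.d/my-docker-user.nft

[Install]
WantedBy=docker.service
```

PartOf= plus WantedBy=docker.service ties the unit's lifecycle to Docker's, so the custom rules are reloaded whenever the daemon restarts and recreates its own tables.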
Frequently Asked Questions
Do I need to change my Compose files or container configuration?
No. The firewall backend is configured at the daemon level in /etc/docker/daemon.json, not in individual container specifications. Existing docker-compose.yml files work identically whether Docker uses iptables or nftables.
Why are my custom rules not taking effect?
This is usually a chain priority misunderstanding combined with a misread of how nftables verdicts work. An accept verdict in Docker's chains is not final: the packet still traverses your chains, so your drop rules can take effect. However, a drop verdict in Docker's chains is final and cannot be overridden by another chain. If your custom rules are not firing, check whether Docker's chains contain a drop for your traffic (which you cannot override with a plain accept rule), or confirm your chain hook and interface matcher are correct. If you need to override a Docker drop, use the --bridge-accept-fwmark option instead.
Does the nftables backend work with Docker Swarm?
Not yet. As of Docker Engine 29, the overlay network rules used by Swarm have not been migrated from iptables. The nftables backend cannot be enabled when the Docker daemon is running in Swarm mode. Single-host bridge networks are fully supported. Swarm support is on Docker's stated roadmap for a future release.
Does an accept verdict in Docker's chains prevent my custom chains from dropping a packet?
No. Unlike iptables, an accept verdict in nftables terminates processing only within that specific base chain. The packet still traverses all other base chains attached to the same hook. Your custom chain's drop rules will still evaluate even after Docker's chain has accepted the packet. What cannot be overridden is a drop verdict: if Docker's chain drops a packet, that verdict is final. To override a Docker drop, use the --bridge-accept-fwmark daemon option.
Why is container traffic still blocked after switching from the iptables backend?
When Docker ran with the iptables backend, it set a DROP policy on the iptables FORWARD chain. That policy persists after switching to nftables because the Linux kernel evaluates both rulesets independently. Packets that Docker's nftables rules accept can still be dropped by the residual iptables DROP policy. Reset the FORWARD policy to ACCEPT with: sudo iptables -P FORWARD ACCEPT. Also flush any residual DOCKER and DOCKER-USER chains left over from the previous backend.
Can I use the nftables backend alongside firewalld?
Yes. When Docker's nftables backend is enabled on a host running firewalld, Docker still sets up firewalld zones and policies for its interfaces: it creates a zone called docker with an ACCEPT target, inserts all bridge interfaces into it, and creates a docker-forwarding policy. However, Docker creates its nftables rules directly rather than through firewalld's deprecated direct interface. The key point: Docker's nftables tables and firewalld's nftables rules coexist because nftables uses separate tables with independent base chains. You do not need to disable firewalld to use Docker's nftables backend.
What does the --ip-forward=false option do?
With the nftables backend, Docker will not enable IP forwarding itself, unlike the iptables backend. If IP forwarding is disabled, Docker reports an error at daemon startup. The --ip-forward=false option (or "ip-forward": false in daemon.json) disables this check, allowing Docker to start and create networks even when it determines forwarding is disabled. This is useful for testing or for hosts where forwarding is managed by a separate configuration system. In production, IP forwarding should be explicitly enabled before the Docker daemon starts.
Why do rules for published ports need ct original proto-dst instead of matching the destination port?
DNAT is applied by the connection tracking system at the prerouting Netfilter hook, which runs before any forward hook chain. By the time your forward chain sees the packet, the destination address has already been rewritten to the container's internal IP and port. However, the connection tracking subsystem records the original destination in the conntrack entry. The ct original proto-dst expression reads from that conntrack entry rather than from the live packet headers, so it returns the pre-DNAT port regardless of when your forward chain evaluates.