You deploy a container, publish a port with -p 8080:80, then add a drop rule in your nftables configuration to block external access to port 8080. You reload the ruleset, run a port scan from another machine, and the port is still wide open. This is one of the more frustrating experiences in Linux administration, and it happens to experienced sysadmins regularly. Docker manipulates firewall rules in ways that circumvent the rules you write, and understanding why requires following a packet through the entire Docker daemon's networking architecture and the Netfilter pipeline.
The instinct of every experienced administrator is the same: if I drop a packet in my firewall, it is dropped. That mental model is correct for traditional services. It is wrong for Docker. The reason it is wrong has nothing to do with your rules being misconfigured -- the rules are syntactically correct, logically sound, and would work perfectly if the packets still looked the way you think they look. The problem is that Docker has already rewritten the packet before your rules evaluate it, and the packet your rule is looking for no longer exists. Every fix in this guide comes down to one insight: you are not filtering the packet you think you are filtering.
The Problem: Docker Publishes Ports Behind Your Back
When you run docker run -p 8080:80 nginx, Docker does not simply open port 8080 on the host. It inserts rules into the nat table's PREROUTING chain that use Destination Network Address Translation (DNAT) to rewrite incoming packets. A packet arriving at the host on port 8080 has its destination address rewritten to the container's internal IP and port -- something like 172.17.0.2:80 -- before the packet ever reaches the filter table's INPUT or FORWARD chains.
This is the root of the bypass. Your nftables rules in the input hook that drop traffic to port 8080 never fire, because the packet's destination has already been rewritten to a container address. By the time the packet reaches the forward hook, its destination port is 80 on 172.17.0.2, not 8080 on your host. The packet no longer matches the rule you wrote.
This behavior is not a bug. Docker's NAT-based port publishing is working as designed. The problem is that administrators often assume their filter rules apply to the original packet headers, when in reality DNAT rewrites those headers before filtering occurs.
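You can watch this rewrite happen by listing the nat table. With the default iptables backend, Docker's port publishing lives in the nat table's DOCKER chain, reached from PREROUTING. The addresses below are representative -- your container IP, interface names, and counters will differ:

```shell
# List Docker's DNAT rules in the nat table
$ sudo iptables -t nat -nvL DOCKER
# Representative output for a container published with -p 8080:80:
#   Chain DOCKER (2 references)
#    pkts bytes target  prot opt in       out  source     destination
#       0     0 RETURN  all  --  docker0  *    0.0.0.0/0  0.0.0.0/0
#      12   720 DNAT    tcp  --  !docker0 *    0.0.0.0/0  0.0.0.0/0   tcp dpt:8080 to:172.17.0.2:80
```

That final DNAT target is the rule that rewrites the packet before any of your filter rules run.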
Why This Is a Security Problem, Not Just a Configuration Annoyance
Docker's firewall bypass is not an abstract inconvenience. Every port Docker publishes without effective firewall filtering is an exposed attack surface. In April 2026, CVE-2026-34040 (CVSS 8.8) demonstrated how an attacker with Docker API access could bypass authorization plugins entirely using an oversized HTTP request body, creating privileged containers with full host filesystem access. That vulnerability's disclosure from Cyera Research Labs reinforced a pattern: Docker's authorization mechanisms have been the target of repeated bypasses since CVE-2024-41110 (CVSS 10.0), and each fix has left edge cases exploitable.
Any agent that can read Docker API documentation can construct it.
-- Cyera Research Labs, CVE-2026-34040 Disclosure
The MITRE ATT&CK framework for Containers (technique T1609, Container Administration Command) documents how adversaries exploit exposed Docker APIs for lateral movement, cryptomining deployment, and container escape. An accidentally exposed database port -- the exact scenario that Docker's firewall bypass enables -- gives attackers a direct path to credential harvesting, data exfiltration, or pivoting deeper into the infrastructure. The Doki malware campaign demonstrated this pattern in production: attackers used exposed Docker APIs to pull legitimate images, then executed malicious payloads from within containers that had been spun up without the host administrator's knowledge.
If your Docker host exposes ports 2375 (unencrypted Docker API) or 2376 (TLS Docker API) to any untrusted network, you have a critical vulnerability regardless of your nftables configuration. Attackers routinely scan for these ports and can create privileged containers, mount the host filesystem, and achieve full host compromise in seconds. Always bind the Docker daemon to a Unix socket with restricted file permissions (root:docker, 0660) and never expose the TCP API without mutual TLS authentication and network segmentation.
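A quick audit on a host you manage takes two commands (the socket path and permissions shown assume a stock systemd-based install):

```shell
# Is the Docker API listening on a TCP port? Ideally this prints nothing.
$ sudo ss -tlnp | grep -E ':2375|:2376'

# Confirm the daemon is bound to the Unix socket with restricted permissions
$ ls -l /var/run/docker.sock
# srw-rw---- 1 root docker 0 ... /var/run/docker.sock
```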
Understanding the Netfilter pipeline is therefore not just a sysadmin skill -- it is a security control. Every rule you write to restrict container traffic is a compensating control against the class of attacks that begin with unauthorized access to a published port.
How Netfilter Processes Packets
To understand why Docker traffic escapes your rules, you need to know the order in which Netfilter hooks execute. When a packet arrives at a Linux host, it passes through hooks in this sequence: prerouting, then a routing decision, then either input (if the packet is destined for the host) or forward (if the packet is being routed elsewhere), and finally postrouting.
Docker's DNAT rules live in the prerouting hook. The routing decision happens after prerouting. Because DNAT changes the destination address to a container IP on an internal bridge interface, the kernel's routing decision sends the packet down the forward path instead of the input path. Your host-level firewall rules in the input hook never see the packet at all.
# Incoming packet destined for host:8080
#
# 1. PREROUTING hook (nat table)
#    Docker's DNAT rule rewrites destination:
#    host:8080 --> 172.17.0.2:80
#
# 2. Routing decision
#    Destination is now 172.17.0.2 (docker0 bridge)
#    Packet is routed to FORWARD, not INPUT
#
# 3. FORWARD hook (filter table)
#    Docker's ACCEPT rules allow the packet through
#    Your drop rule for port 8080 does not match
#    (destination is now port 80 on 172.17.0.2)
#
# 4. POSTROUTING hook (nat table)
#    Masquerading for return traffic
#
# Result: packet reaches the container despite your drop rule
This is the fundamental disconnect. Administrators write rules that match on the original destination port, but the packet headers have already been rewritten by the time those rules are evaluated.
To be truly accepted, a packet must pass through every chain without being dropped.
-- Adapted from dzx.fr nftables analysis
The iptables Backend: DOCKER-USER Chain
With Docker's default iptables backend, Docker creates several custom chains in the filter table's FORWARD path. The packet traversal order in the FORWARD chain is: DOCKER-USER, then DOCKER-ISOLATION-STAGE-1, then connection tracking rules, then the DOCKER chain itself.
The DOCKER-USER chain is specifically designed as an insertion point for administrator rules. Any rules you place here execute before Docker's own accept rules in the DOCKER chain. This is where you can drop or restrict traffic to container-published ports.
# Always insert the established/related rule first.
# With an explicit position (-I CHAIN N), each rule lands exactly where
# specified, so the order below is the final order in the chain.
$ sudo iptables -I DOCKER-USER -p tcp -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Block external access to a specific container's internal port
# Note: match on the container's internal port, not the host port
$ sudo iptables -I DOCKER-USER 2 -i eth0 -p tcp --dport 5432 -j DROP

# Allow only 192.0.2.0/24 to reach published container ports
$ sudo iptables -I DOCKER-USER 3 -i eth0 ! -s 192.0.2.0/24 -j DROP
There is a critical detail here. Because DNAT has already occurred by the time packets reach DOCKER-USER, the destination port in the packet header is the container's internal port, not the host's published port. If you mapped -p 8080:5432, you need to match on port 5432, not 8080. To filter based on the original host port, use conntrack's original destination match: -m conntrack --ctorigdstport 8080.
When packets arrive to the DOCKER-USER chain, they have already passed through DNAT.
-- Docker Documentation, Docker with iptables
On modern Linux distributions, the iptables command often points to iptables-nft, which translates iptables syntax into nftables rules behind the scenes. Even when you think you are writing iptables rules, the kernel may be running nftables. Check with iptables --version -- if the output includes (nf_tables), you are using the iptables-nft compatibility layer.
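The check looks like this (version numbers are illustrative):

```shell
$ iptables --version
# iptables v1.8.10 (nf_tables)   <-- iptables-nft compatibility layer
# iptables v1.8.10 (legacy)      <-- classic iptables
```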
The Native nftables Backend (Docker 29+)
Docker Engine 29.0.0 introduced experimental native nftables support, enabled with "firewall-backend": "nftables" in /etc/docker/daemon.json. With this backend, Docker creates its own nftables tables directly -- ip docker-bridges and ip6 docker-bridges -- instead of going through the iptables compatibility layer.
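Enabling the backend is a one-key change to /etc/docker/daemon.json (a minimal sketch -- merge this with any keys already in the file rather than replacing it, then restart the daemon with sudo systemctl restart docker):

```json
{
  "firewall-backend": "nftables"
}
```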
This is where the firewall bypass problem gets worse. With native nftables, there is no DOCKER-USER chain. The Docker documentation states that rules should be added "in separate tables, with base chains" using appropriate priority values. Administrators must build their own filtering infrastructure from scratch.
Do not modify Docker's tables directly as the modifications are likely to be lost.
-- Docker Documentation, Firewall with nftables
There is an address family mismatch trap that catches experienced administrators. Docker's native nftables tables use the ip and ip6 families (separate tables for each protocol). If you create your custom filtering chain in the inet family (which handles both IPv4 and IPv6), that works correctly -- inet chains see packets from both protocols. However, if you mistakenly create your chain in the ip family, IPv6 traffic to your containers bypasses your rules entirely. Always use inet for your custom chains unless you have a specific reason not to.
You will find guides that recommend creating custom Docker filtering chains in the ip family to match Docker's own table family. The reasoning is that since Docker creates ip docker-bridges and ip6 docker-bridges as separate tables, your filter chains should use the same families. This reasoning is understandable but unnecessary and can create gaps. Chains in the inet family see packets from both IPv4 and IPv6 at the same hook point, so a single inet chain covers what would otherwise require two separate ip and ip6 chains. The inet family is the nftables project's recommended approach for dual-stack filtering, and all code examples in this guide use it for that reason. The only scenario where ip-family chains are required is when you need to use family-specific features that are not available in inet, which does not apply to Docker traffic filtering.
Another behavioral difference that bites administrators during migration: with the iptables backend, Docker enables IP forwarding (net.ipv4.ip_forward) on the host automatically and sets the iptables FORWARD chain policy to DROP. With the nftables backend, Docker does not enable IP forwarding itself -- it reports an error if forwarding is needed but not enabled. And critically, Docker's nftables backend does not set a default drop policy on the forward hook. This means that if you are migrating from iptables to nftables and you relied on Docker's implicit FORWARD DROP policy as a safety net, that net is gone. You must create your own default-drop chain explicitly.
There is also a daemon restart subtlety documented in the Docker Engine issue tracker. With iptables, the FORWARD policy persists across daemon restarts because iptables uses shared chains. With nftables, Docker's filter-FORWARD policy belongs to Docker's own table, which gets completely reconstructed on restart. After restart, the policy resets to accept because Docker sees that IP forwarding is already enabled (from the previous run) and skips setting a drop policy. The related "ip-forward-no-drop": true daemon option serves a different purpose: it tells Docker not to set the FORWARD policy to DROP even on first start, which is useful if you manage your own forwarding policies. Understanding both the restart behavior and this option is essential when migrating between backends.
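If you manage forwarding policy yourself, the option pairs naturally with the backend setting. A sketch of /etc/docker/daemon.json (merge with existing keys):

```json
{
  "firewall-backend": "nftables",
  "ip-forward-no-drop": true
}
```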
The filter-FORWARD policy belongs to Docker's own table, which gets completely reset on restart.
-- Docker Engine Issue #50566
# Docker's native nftables backend creates two separate tables:
# ip docker-bridges (IPv4) and ip6 docker-bridges (IPv6)
# The structure below is a simplified representation.
table ip docker-bridges {
    chain forward {
        type filter hook forward priority -100; policy accept;
        ct state established,related accept
        iifname "docker0" accept
        oifname "docker0" accept
    }
}
That priority -100 is the critical detail. Standard nftables filter chains default to priority 0. Because nftables evaluates chains at the same hook in order of ascending numeric priority, Docker's chain at -100 runs before your chain at 0, so Docker's broad accept rules and accept policy have already had their say by the time your rules execute.
A common misconception about nftables is that when one chain accepts a packet, other chains at the same hook are skipped. This is not how it works. An accept verdict in one chain allows the packet to continue to other chains at the same hook -- so a drop rule at priority 0 can override an accept at priority -100. A drop verdict in any chain is immediately terminal and stops all further processing. The real challenge with Docker is not chain priority ordering -- it is that DNAT has already rewritten the destination port by the time any forward-hook chain evaluates the packet. Your drop rule for port 8080 does not match because the packet header now says port 80. This is why conntrack matching (ct original proto-dst) or a lower-priority chain is needed.
Understanding nftables Chain Priority
nftables chain priority is an integer value assigned to base chains. Chains at the same hook are processed in order from lowest numeric value to highest. Some well-known priority names and their numeric equivalents for the ip/inet families are: raw (-300), mangle (-150), dstnat (-100), filter (0), security (50), srcnat (100).
Docker's native nftables backend uses priority -100 for its forward chain -- the same value as the dstnat priority keyword. Your standard filter chain at priority 0 processes packets after Docker's chain has already had its say. This is why a simple drop rule in a priority 0 chain fails to block Docker traffic.
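To see exactly which chains sit on the forward hook and at what priority, list the chain declarations. The table names below are illustrative -- what you see depends on your Docker version and backend:

```shell
$ sudo nft list chains
# table ip docker-bridges {
#     chain forward {
#         type filter hook forward priority -100; policy accept;
#     }
# }
# table inet docker-filter {
#     chain forward_early {
#         type filter hook forward priority -200; policy accept;
#     }
# }
```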
There is a subtle but important point about how Netfilter processes verdicts across multiple chains at the same hook that many guides get wrong. When a chain issues an accept verdict, the packet is not immediately forwarded -- it continues to traverse all remaining chains at the same hook. Only after passing through every chain without being dropped is the packet considered accepted. A drop verdict, by contrast, is immediately terminal: the packet is destroyed and no further chains see it. This means that in theory, a drop rule at priority 0 can override an accept at priority -100. The practical problem is that Docker's chain has a policy accept statement, which means any packet not explicitly handled by a rule in Docker's chain is accepted by the chain's default policy -- and broad interface-name matches like oifname "docker0" accept catch almost everything headed for containers. The real risk is not the priority ordering itself, but the breadth of Docker's accept rules combined with DNAT rewriting.
You will find many guides, forum posts, and Stack Overflow answers claiming that once Docker accepts a packet at priority -100, a drop rule at priority 0 cannot override it. This is incorrect. The nftables wiki, the nft(8) man page, and the Netfilter kernel source all confirm that an accept verdict ends evaluation of the current base chain only, while a drop verdict is globally terminal across all chains. A drop at priority 0 absolutely can kill a packet that was accepted at priority -100. The confusion likely stems from iptables behavior, where ACCEPT in one chain is final for that table's traversal. We recommend the lower-priority approach (priority -200) in this guide not because priority 0 cannot work, but because it eliminates any dependency on correctly matching post-DNAT port numbers and removes the risk of subtle rule-ordering mistakes that come from trying to filter packets whose headers have already been rewritten.
# Chain evaluation order at the forward hook:
#
#   Priority -200 --> your custom chain (runs first)
#   Priority -100 --> Docker's forward chain
#   Priority    0 --> standard filter chain (runs last)
#
# A DROP verdict at any priority stops the packet immediately.
# An ACCEPT verdict allows the packet to continue to the next chain.
#
# If Docker ACCEPTs at -100 and you DROP at 0:
#   The packet is accepted by Docker's chain, continues to priority 0,
#   and IS dropped by your rule -- a drop at 0 CAN override an accept at -100.
#
# However, if your chain at 0 has "policy accept" (not drop) and no
# matching drop rule, the packet passes through unimpeded.
# The safest approach: drop at a priority LOWER than Docker's.
The safest strategy is to create your filtering chain with a priority lower (more negative) than Docker's. At priority -200, your chain runs before Docker's chain at -100, and a drop verdict at your chain kills the packet before Docker ever sees it.
Consider a quick scenario. You run docker run -p 9090:3000 myapp and add this nftables rule to block external access: iifname "eth0" tcp dport 9090 drop (in a priority 0 chain). External clients can still reach port 9090. Why?

By the time any forward-hook chain evaluates the packet, DNAT has already rewritten the destination to the container's internal port 3000, so a rule matching dport 9090 never fires. To match the port the client actually targeted, use conntrack's original tuple: ct original proto-dst 9090.

Fix 1: Create a Higher-Priority Forward Chain
The most straightforward fix is to add a custom nftables table with a base chain on the forward hook at a priority that beats Docker. Here is a complete example:
table inet docker-filter {
    chain forward_early {
        type filter hook forward priority -200; policy accept;

        # Drop invalid packets early
        ct state invalid drop

        # Allow established and related connections
        ct state established,related accept

        # Block external access to port 3000 on all containers
        iifname "eth0" tcp dport 3000 drop

        # Block external access to a specific container IP
        iifname "eth0" ip daddr 172.17.0.5 tcp dport 5432 drop

        # Allow only trusted subnet to reach container port 8080
        iifname "eth0" tcp dport 8080 ip saddr != 10.0.0.0/8 drop
    }
}
Verify your chain is in place and at the correct priority:

$ sudo nft list table inet docker-filter
There is one important caveat with this approach: because DNAT has already occurred in the prerouting hook (which runs before forward regardless of priority), the destination port you see in the forward hook is the container's internal port, not the published host port. If your mapping is -p 9090:80, you need to drop on port 80, not 9090, or use conntrack to reference the original destination.
Fix 2: Use Conntrack Original Destination
Conntrack maintains a record of the original destination address and port before DNAT rewrote them. You can match on these values in the forward hook using ct original proto-dst. This lets you write rules that reference the published host port -- the port the external client targeted -- rather than the container's internal port.
table inet docker-filter {
    chain forward_early {
        type filter hook forward priority -200; policy accept;

        # Drop invalid packets early
        ct state invalid drop

        # Allow established and related connections
        ct state established,related accept

        # Block external access to published port 9090
        # (regardless of what internal container port it maps to)
        # 172.16.0.0/12 covers Docker's default bridge address pools
        iifname "eth0" ip daddr 172.16.0.0/12 ct original proto-dst 9090 drop

        # Allow only trusted IPs to reach published port 8080
        iifname "eth0" ip daddr 172.16.0.0/12 ct original proto-dst 8080 \
            ip saddr != 10.0.0.0/8 drop
    }
}
The ct original proto-dst expression is especially useful in environments where container port mappings change frequently, or where the same internal port is published on different host ports across different containers. You always match on the port the outside world sees, not the container's internal wiring.
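You can confirm what conntrack recorded as the original destination with the conntrack tool (from the conntrack-tools package). The entry shown is illustrative -- addresses, ports, and timeouts will differ on your host:

```shell
# Show tracked TCP connections whose original destination port was 9090
$ sudo conntrack -L -p tcp --orig-port-dst 9090
# tcp 6 431999 ESTABLISHED src=203.0.113.7 dst=198.51.100.5 sport=52914 dport=9090
#   src=172.17.0.2 dst=203.0.113.7 sport=3000 dport=52914 [ASSURED]
```

The first tuple is the original (pre-DNAT) direction -- the one ct original proto-dst matches against; the second is the reply direction after rewriting.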
One detail the Docker documentation mentions but does not explain: conntrack matching can degrade performance under heavy connection load. Each conntrack lookup requires traversing a hash table in kernel memory. The default conntrack table size (net.netfilter.nf_conntrack_max) is typically 65536 entries on systems with 4 GB of RAM, calculated as total_memory / 16384 / sizeof(struct nf_conn). On Docker hosts handling thousands of concurrent connections -- particularly with microservices that make many short-lived HTTP calls between containers -- the table can fill up, causing new connections to be dropped with nf_conntrack: table full, dropping packet in kernel logs. Monitor the current count with cat /proc/sys/net/netfilter/nf_conntrack_count and increase the maximum with sysctl -w net.netfilter.nf_conntrack_max=262144 if you see it approaching capacity. For each doubling of the max, also double net.netfilter.nf_conntrack_buckets to maintain hash table efficiency.
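To make increased limits survive a reboot, persist them in a sysctl drop-in (the file path and values mirror the example above; tune to your workload, and note that on older kernels the bucket count is a module parameter, hashsize, rather than a writable sysctl):

```shell
$ cat /etc/sysctl.d/90-conntrack.conf
net.netfilter.nf_conntrack_max = 262144
net.netfilter.nf_conntrack_buckets = 65536

# Apply all sysctl drop-ins without rebooting
$ sudo sysctl --system
```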
Suppose you publish -p 8080:5432 for a PostgreSQL container and want to block external access using conntrack. A rule matching tcp dport 8080 never fires, because by the forward hook the destination port is already 5432. Matching tcp dport 5432 works but ties the rule to the container's internal wiring, and pairing it with a conntrack match on 5432 is redundant -- plain tcp dport already sees that port. The correct expression is ct original proto-dst 8080, which matches the pre-DNAT destination port that the external client targeted.

When building a default-drop firewall that coexists with Docker, use ct original proto-dst in a forward-hook chain to explicitly allow only the published ports you intend to expose. This makes your ruleset independent of Docker's internal IP assignments, which change every time containers restart.
Fix 3: Inject Rules into Docker's Own Chains
Instead of racing Docker on priority, you can insert rules directly into Docker's nftables chains. The nft insert command places a rule at the top of a chain, before Docker's own accept rules.
# Insert a drop rule at the top of Docker's forward chain
# Use 'nft list ruleset' first to confirm Docker's exact table/chain names
$ sudo nft insert rule ip docker-bridges forward tcp dport 3000 drop

# Block by bridge interface name (per-compose-stack filtering)
$ sudo nft insert rule ip docker-bridges forward \
    iifname "br-a1b2c3d4e5f6" tcp dport 8888 drop

# Add logging before dropping
$ sudo nft insert rule ip docker-bridges forward \
    tcp dport 3000 log prefix "DOCKER-BLOCK: " drop
This approach has a significant drawback: Docker recreates its nftables rules every time the daemon restarts. Your manually inserted rules vanish. Docker's documentation explicitly warns against modifying its tables directly, stating that Docker expects full ownership of its tables. If you use this method, you need a systemd service that re-injects your rules after every Docker restart.
[Unit]
Description=Re-inject nftables rules into Docker's chains
After=docker.service
Requires=docker.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/docker-firewall-rules.sh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
The corresponding script needs to wait for Docker's nftables table to exist before inserting rules. Docker does not create its tables synchronously with the systemd unit becoming active -- there is a race window of a few seconds. A polling loop that checks for the table is the practical workaround:
#!/bin/bash
set -e

# Wait for Docker's nftables table to exist
max_attempts=30
attempt=0
while ! nft list table ip docker-bridges >/dev/null 2>&1; do
    attempt=$((attempt + 1))
    if [ $attempt -ge $max_attempts ]; then
        echo "Docker nftables table not found after $max_attempts attempts"
        exit 1
    fi
    sleep 1
done

# Insert drop rules at the top of Docker's forward chain
nft insert rule ip docker-bridges forward tcp dport 3000 drop
nft insert rule ip docker-bridges forward tcp dport 5432 drop

echo "Docker firewall rules applied"
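Installing the pair looks like this, assuming the unit file is saved as /etc/systemd/system/docker-firewall.service (the unit name and script path are illustrative):

```shell
$ sudo chmod +x /usr/local/bin/docker-firewall-rules.sh
$ sudo systemctl daemon-reload
$ sudo systemctl enable --now docker-firewall.service

# Verify the oneshot ran successfully after the next Docker restart
$ systemctl status docker-firewall.service
```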
Docker considers its own nftables tables to be fully managed. Modifying them directly creates a fragile setup where a Docker daemon restart, upgrade, or network reconfiguration silently removes your security rules. The separate-table approach (Fix 1 or Fix 2) is more durable for production environments.
Fix 4: Firewall Marks with --bridge-accept-fwmark
Docker 29's nftables backend introduced a mechanism that almost no one covers: the --bridge-accept-fwmark daemon option. This is Docker's official replacement for the DOCKER-USER chain in nftables mode, and it works through Netfilter packet marking rather than chain priority ordering.
The concept is specific: in nftables, an accept verdict in one chain is not final -- the packet continues to other chains at the same hook, any of which can still drop it. Docker's nftables chains include drop rules for traffic that should not reach containers. To override a Docker drop rule without modifying Docker's tables, you mark the packet with a firewall mark (fwmark) in your own chain, and Docker's chain checks for that mark and accepts the packet instead of dropping it.
{
"firewall-backend": "nftables",
"bridge-accept-fwmark": 1
}
With this configuration, Docker's nftables chains will accept any packet carrying fwmark value 1. You then create your own chain at a priority lower than Docker's and selectively apply the mark to traffic you want to allow:
table inet my-docker-policy {
    chain forward_mark {
        # Must run before Docker's chain at -100 to set the mark in time
        type filter hook forward priority -150; policy accept;

        # Mark allowed traffic with fwmark 1
        iifname "eth0" ct original proto-dst 443 meta mark set 1
        iifname "eth0" ct original proto-dst 80 meta mark set 1

        # Traffic without the mark will be dropped by Docker's chain
    }
}
The --bridge-accept-fwmark option also accepts a mask for bit-level matching: "bridge-accept-fwmark": "0x1/0x3" matches only specific bits in the mark value, which is useful when other systems (such as traffic shaping or policy routing) also use fwmark bits. This is the cleanest integration point Docker provides for nftables-native firewall policies, and it avoids the priority race entirely.
Building a Default-Drop Firewall with Docker
Many administrators want a default-drop posture: block everything unless explicitly allowed. This is straightforward for the input hook, but Docker complicates the forward hook. The challenge is that Docker needs its own forwarding rules to function -- container-to-container traffic, container-to-internet masquerading, and inbound port publishing all depend on forwarding being allowed for specific flows.
One approach is to use a separate table with a default-drop policy on the forward hook at a lower priority than Docker, and then explicitly allow only the Docker traffic you want:
table inet firewall {
    chain forward {
        type filter hook forward priority -200; policy drop;

        # Drop invalid packets early
        ct state invalid drop

        # Allow established and related connections
        ct state established,related accept

        # Allow containers to reach the internet (outbound)
        iifname "docker0" accept
        iifname "br-*" accept

        # Allow inbound traffic to published port 443 only
        # 172.16.0.0/12 covers Docker's default bridge address pools
        iifname "eth0" ip daddr 172.16.0.0/12 ct original proto-dst 443 accept

        # Allow inbound to port 80
        iifname "eth0" ip daddr 172.16.0.0/12 ct original proto-dst 80 accept

        # Everything else forwarded from external interfaces is dropped
    }
}
With this configuration, only ports 80 and 443 are reachable from outside the host, regardless of what ports Docker publishes. If someone accidentally deploys a container with -p 5432:5432, the firewall blocks external access. This is a critical defense-in-depth measure: it protects against both accidental exposure and the scenario where an attacker compromises a CI/CD pipeline and deploys a container with attacker-controlled port mappings.
For environments subject to compliance requirements (PCI DSS, HIPAA, SOC 2), a default-drop forward policy with explicit allowlisting is often the only posture that satisfies auditors. Document your nftables rules alongside your Docker Compose files so that the relationship between published ports and firewall rules is auditable. Consider adding nftables counters to your rules (counter accept / counter drop) so you can demonstrate to auditors which ports are receiving traffic and which are being blocked -- run nft list ruleset periodically and archive the counter output.
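Adding counters is a one-word change per rule. A sketch of what the allow rules from the default-drop example look like with counters, plus one way to snapshot the evidence (the archive path is illustrative):

```shell
# Inside your forward chain, 'counter' records packets/bytes per rule:
#   iifname "eth0" ct original proto-dst 443 counter accept
#   iifname "eth0" ct original proto-dst 80  counter accept

# Archive the ruleset with live counter values for audit evidence
$ sudo nft list ruleset | tee /var/log/nft-audit-$(date +%F).txt
```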
A default-drop policy on the forward chain at a priority lower than Docker's will block all forwarded traffic unless you explicitly allow it. This includes inter-container communication on different bridge networks and traffic between Docker networks you may have created with docker network create. Test thoroughly in a staging environment before deploying to production.
Debugging: Tracing Packets Through the Hooks
When your rules are not working as expected, nftables packet tracing is invaluable. You can enable tracing on specific packets and watch them traverse every chain and rule in the pipeline.
# Enable tracing for packets to port 3000
$ sudo nft add rule inet docker-filter forward_early \
    tcp dport 3000 meta nftrace set 1

# In another terminal, watch the trace output
$ sudo nft monitor trace

# Trigger traffic to port 3000 from another machine
# The trace shows every chain and rule the packet hits, in order
The trace output shows the exact chain evaluation order: which table and chain processed the packet, what priority that chain has, and whether the verdict was accept, drop, or continue. This tells you definitively whether Docker's chain is accepting the packet before your drop rule fires.
For ongoing monitoring in production, logging is more practical than tracing:
# Log and drop traffic to port 3000
$ sudo nft add rule inet docker-filter forward_early \
    tcp dport 3000 log prefix "DOCKER-BLOCK-3000: " drop

# Watch the logs
$ sudo journalctl -k -f | grep DOCKER-BLOCK
Another useful diagnostic is simply listing the full ruleset to see every table, chain, and priority Docker has created:

$ sudo nft list ruleset
Look for tables named docker, docker-bridges, or anything in the ip filter / ip nat tables that was created by the iptables-nft compatibility layer. Pay close attention to the priority values on each base chain.
Persisting Rules Across Reboots and Docker Restarts
nftables rules loaded with nft -f do not survive a reboot by default. The standard approach is to save your rules in /etc/nftables.conf (or in drop-in files under /etc/nftables.d/) and ensure the nftables systemd service is enabled.
Never use flush ruleset in your nftables configuration file when Docker is running. This command destroys every nftables table on the system, including Docker's own tables. Docker does not detect that its rules have been removed and will not recreate them until the daemon restarts. The result is that all container networking silently breaks -- port publishing, inter-container communication, and outbound masquerading all stop working with no error messages. Instead, flush only your own tables: flush table inet docker-filter or delete table inet docker-filter before reloading your rules.
Many nftables tutorials -- including examples on the nftables wiki itself -- begin their configuration files with flush ruleset. This is perfectly safe on systems that are not running Docker or other software that creates its own nftables tables. The problem is specific to Docker (and similar tools like Kubernetes, libvirt, or Podman) that dynamically manage their own nftables tables and expect those tables to persist. If you see a guide that recommends flush ruleset at the top of /etc/nftables.conf, it is not wrong in the general case -- it is wrong specifically when Docker is running. We recommend flush table inet your-table-name as the safe alternative because it limits the destruction to your own rules and leaves Docker's networking intact.
# Enable the nftables service to load rules at boot
$ sudo systemctl enable nftables

# Verify your rules file is valid before saving
$ sudo nft -c -f /etc/nftables.conf

# Reload the ruleset without rebooting
$ sudo systemctl reload nftables
There is a timing issue to be aware of. Docker recreates its own nftables tables when the daemon starts. If Docker starts before your nftables rules are loaded, Docker's rules may reference chains or tables that do not exist yet. If your nftables rules load first and Docker starts later, Docker will add its own tables alongside yours. The safest approach is to use separate tables (as shown in the examples above) so that your rules and Docker's rules are completely independent and load order does not matter.
After any Docker daemon restart or upgrade, run nft list ruleset and verify that your custom tables and chains still exist with the correct priority values. Docker does not touch tables it did not create, so separate tables survive Docker restarts -- but it is worth confirming.
The Simplest Fix: Bind to Localhost
Before investing effort in complex nftables rulesets, consider whether you can avoid the problem entirely. If a container's port should only be accessible from the host itself -- for example, a database that a reverse proxy connects to -- bind it to the loopback interface:
```shell
# Only accessible from the host, not from the network
$ docker run -d -p 127.0.0.1:5432:5432 postgres:16
```

```yaml
# In docker-compose.yml:
services:
  postgres:
    image: postgres:16-alpine
    ports:
      - "127.0.0.1:5432:5432"
  redis:
    image: redis:7-alpine
    ports:
      - "127.0.0.1:6379:6379"
```
When you bind to 127.0.0.1, Docker creates DNAT rules that only match packets originating from the loopback interface. External traffic never matches, so it never gets forwarded to the container. No nftables surgery required.
The Userland Proxy and route_localnet
There is a subtlety here that even experienced administrators overlook. Docker has two mechanisms for port forwarding: the userland proxy (docker-proxy) and hairpin NAT via kernel iptables/nftables rules. By default, Docker uses the userland proxy -- a separate process per published port that copies bytes between sockets in userspace. You can see these processes with ps aux | grep docker-proxy. Each one consumes a file descriptor, memory, and CPU time for every connection it handles. Performance benchmarks show that the userland proxy yields roughly 42 Gbps throughput on a 10-connection iperf3 test, compared to 54 Gbps with hairpin NAT and 70 Gbps with host networking -- roughly a 20% penalty relative to hairpin NAT, and 40% relative to host networking.
Setting "userland-proxy": false in daemon.json disables the proxy and switches to hairpin NAT. However, this triggers a security-relevant kernel sysctl change: Docker sets net.ipv4.conf.docker0.route_localnet=1 on each bridge interface. This sysctl disables the kernel's martian packet filtering for 127.0.0.0/8 addresses on that interface, which is normally a security boundary. A related Kubernetes vulnerability (CVE-2020-8558) demonstrated that when route_localnet=1 is set on a LAN-facing interface, neighboring hosts on the same Layer 2 network can route traffic to 127.0.0.1 on the Docker host, potentially reaching localhost-bound services. Docker mitigates this by only setting route_localnet=1 on its own bridge interfaces (not the host's external interfaces), but it is worth understanding that this sysctl change exists and persists even if you switch back to "userland-proxy": true -- Docker does not revert the sysctl until the host reboots.
The CIS Docker Benchmark (control 2.15) recommends disabling the userland proxy where hairpin NAT is available, citing reduced attack surface. If you disable it, verify that route_localnet is only set on Docker bridge interfaces and not on your host's external-facing interfaces by running sysctl -a | grep route_localnet.
For services that should be exposed to the internet, place them behind a reverse proxy (Nginx, Caddy, HAProxy) running on the host. The proxy listens on the public interface and forwards to the localhost-bound container. This pattern gives you full control over TLS termination, rate limiting, and access control at the proxy layer, without fighting Docker's firewall management.
Never mount the Docker socket (/var/run/docker.sock) into a container in production. A compromised container with socket access can create new privileged containers, mount the host root filesystem, and achieve full host compromise -- this is the primary container escape vector documented by MITRE ATT&CK (T1611, Escape to Host). If you need container builds inside containers, use rootless build tools like Kaniko or Buildah instead of Docker-in-Docker with socket mounting.
What About Disabling Docker's Firewall Management?
Docker's daemon configuration accepts "iptables": false (which also applies to the nftables backend despite the name). This prevents Docker from creating firewall rules, giving you complete control. However, Docker's documentation warns that this will break container networking for many users.
This option is not appropriate for most users, it is likely to break networking.
-- Docker Documentation, Packet filtering and firewalls
Without Docker's rules, the following stop working: port publishing (-p flag) produces no DNAT rules, so published ports are unreachable; outbound masquerading is gone, so containers cannot reach the internet; and network isolation between Docker networks is not enforced.
If you disable Docker's firewall management, you must manually create all of the NAT, forwarding, and masquerading rules yourself. This is an option for teams that manage large-scale container infrastructure and have dedicated networking expertise, but it is not practical for the majority of deployments.
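To make the scale of that undertaking concrete, here is a minimal sketch of the NAT rules you would need for just one bridge and one published port -- assuming the default docker0 bridge, the 172.17.0.0/16 subnet, and a single container at 172.17.0.2. A real deployment needs the equivalent for every network and every published port, plus isolation and forwarding rules:

```nft
table ip docker-manual-nat {
    chain prerouting {
        type nat hook prerouting priority dstnat; policy accept;
        # Replicate Docker's DNAT: host port 8080 -> container 172.17.0.2:80
        iifname != "docker0" tcp dport 8080 dnat to 172.17.0.2:80
    }
    chain postrouting {
        type nat hook postrouting priority srcnat; policy accept;
        # Replicate Docker's outbound masquerading for the bridge subnet
        ip saddr 172.17.0.0/16 oifname != "docker0" masquerade
    }
}
```

And this is only the NAT half: you would still owe the forward-hook accept rules, inter-network isolation, and hairpin handling that Docker normally maintains for you.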
A number of popular tutorials -- particularly those written before Docker 29's native nftables backend existed -- recommend setting "iptables": false and manually managing all firewall rules as the "correct" way to use Docker with nftables. Some even recommend switching all containers to host networking mode to eliminate the firewall interaction entirely. While this approach does give you total control, it also means you are responsible for replicating all of Docker's NAT, masquerading, isolation, and forwarding rules -- a substantial undertaking that is easy to get wrong. With Docker 29's native nftables backend and the --bridge-accept-fwmark option, there are now cleaner integration paths that let Docker manage its own networking while giving you explicit control over access policies. We recommend working with Docker's firewall management using the priority-chain or fwmark approaches described in this guide, rather than disabling it entirely.
Docker with UFW and firewalld
The bypass problem is not unique to hand-written nftables rules. UFW (Uncomplicated Firewall) and firewalld suffer from the same issue. Docker routes container traffic through the nat table's PREROUTING chain, which rewrites packet destinations before UFW's filter chains ever evaluate them. Administrators using UFW frequently discover that their carefully configured deny rules have no effect on Docker-published ports.
firewalld has slightly better integration. Docker creates a firewalld zone called docker with target ACCEPT, and inserts all Docker bridge interfaces into that zone. It also creates a forwarding policy called docker-forwarding that allows traffic from any zone to the docker zone. This means Docker-published ports bypass zone-based restrictions the same way they bypass nftables filter rules.
The fixes are the same regardless of which frontend you use: either control access at the chain-priority level in nftables, use conntrack to match original destinations, bind sensitive ports to localhost, or place services behind a reverse proxy.
The IPv6 Dimension
Everything discussed so far focuses on IPv4, but Docker also manages IPv6 firewall rules -- and the IPv6 story is different in important ways. With the native nftables backend, Docker creates a separate ip6 docker-bridges table alongside the IPv4 ip docker-bridges table. If your custom filtering chain uses the inet family (which handles both IPv4 and IPv6 in a single ruleset), your rules automatically cover both protocols. If you mistakenly create your chain in the ip family only, IPv6 traffic to containers passes through unfiltered.
The bigger issue is that IPv6 with Docker can operate without NAT at all. When Docker is configured with routed IPv6 prefixes using gateway_mode_ipv6: "routed" in the network driver options, containers receive globally routable IPv6 addresses. There is no DNAT rewriting in the prerouting hook because the containers are directly addressable. This means your filter rules in the forward hook see the original destination addresses and ports -- the DNAT bypass problem does not apply to routed IPv6. However, it also means that every port a container listens on is directly reachable from the internet over IPv6 unless you explicitly block it. Many administrators who carefully locked down IPv4 access discover that their containers have been globally reachable over IPv6 the entire time.
If your host has a globally routable IPv6 prefix and Docker containers receive addresses from that prefix, every container port is reachable from the internet over IPv6 by default -- regardless of your IPv4 firewall rules. Add explicit nftables rules in the forward hook to filter IPv6 traffic to Docker bridge interfaces, and include ICMPv6 rules that allow essential neighbor discovery and path MTU messages while blocking everything else.
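A hedged sketch of such rules, assuming a delegated container prefix of 2001:db8:1::/64 (a documentation prefix -- substitute your own) and that only ports 80 and 443 should be reachable:

```nft
table ip6 docker-v6-guard {
    chain forward {
        type filter hook forward priority -200; policy accept;

        # Allow return traffic to containers
        ct state established,related accept

        # Essential ICMPv6: neighbor discovery and path MTU discovery
        icmpv6 type { nd-neighbor-solicit, nd-neighbor-advert,
                      packet-too-big, time-exceeded, parameter-problem } accept

        # Allow only the ports you intend to expose over IPv6
        ip6 daddr 2001:db8:1::/64 tcp dport { 80, 443 } accept

        # Drop everything else destined for the container prefix
        ip6 daddr 2001:db8:1::/64 drop
    }
}
```

Note the final rule drops by destination prefix rather than by interface name, so it keeps working even when Compose recreates bridges with new names.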
Docker Compose and Per-Stack Filtering
Docker Compose creates a dedicated bridge network for each stack, named something like br-a1b2c3d4e5f6. These bridge interface names are generated from the network ID and change every time you docker compose down && docker compose up. This makes it impractical to write nftables rules that reference specific bridge names if your Compose stacks are recreated frequently.
There are two approaches to handle this. The first is to use ct original proto-dst matching (Fix 2), which is independent of interface names and matches on the published host port regardless of which bridge network the container sits on. The second is to create named Docker networks with predictable bridge names by setting the com.docker.network.bridge.name driver option in your Compose file:
```yaml
networks:
  appnet:
    driver: bridge
    driver_opts:
      com.docker.network.bridge.name: br-myapp

services:
  web:
    image: nginx:alpine
    ports:
      - "8080:80"
    networks:
      - appnet
```
With a stable bridge name like br-myapp, you can write nftables rules that target traffic to or from that specific Compose stack by interface name. This is especially useful in multi-tenant environments where different stacks have different access policies.
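As a sketch of what a per-stack policy might look like -- assuming the br-myapp name from the Compose file above and an example admin subnet of 203.0.113.0/24 (a documentation range; use your own):

```nft
table inet compose-policy {
    chain forward {
        type filter hook forward priority -200; policy accept;
        ct state established,related accept

        # Only the admin subnet may reach this stack's published port 8080,
        # matched on the pre-DNAT (original) destination port
        oifname "br-myapp" ct original protocol tcp \
            ct original proto-dst 8080 ip saddr 203.0.113.0/24 accept
        oifname "br-myapp" ct original protocol tcp \
            ct original proto-dst 8080 drop
    }
}
```

Because the match is on oifname plus the conntrack original port, the rule survives container restarts and IP reassignments within the stack.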
The firewall backend configuration ("firewall-backend": "nftables" in daemon.json) is a daemon-level setting. Docker Compose files require no changes when switching between iptables and nftables backends. Your existing Compose configurations work identically with either backend -- the difference is entirely in how Docker creates its kernel-level packet filtering rules.
Rate Limiting Published Ports with nftables Meters
A defensive technique that almost no Docker-nftables guide covers is using nftables meters (formerly called dynamic sets) to rate-limit connections to published container ports. Unlike static drop rules, meters track per-source-IP connection rates in kernel memory and only trigger when a threshold is exceeded. This gives you brute-force protection at the firewall level, before traffic ever reaches the container application.
```nft
table inet docker-ratelimit {
    chain forward_limit {
        type filter hook forward priority -199; policy accept;

        # Allow established connections without rate checking
        ct state established,related accept

        # Rate limit new connections to published port 8080: max 25/second
        # per source IP. The log statement must be part of the drop rule --
        # a packet dropped here would never reach a later logging rule.
        iifname "eth0" ct original proto-dst 8080 ct state new \
            meter ratelimit-8080 { ip saddr limit rate over 25/second burst 50 packets } \
            log prefix "DOCKER-RATELIMIT-8080: " counter drop

        # Rate limit SSH-forwarded port: max 5/minute per source IP
        iifname "eth0" ct original proto-dst 2222 ct state new \
            meter ratelimit-ssh { ip saddr limit rate over 5/minute burst 10 packets } \
            counter drop
    }
}
```
Meters are more efficient than conntrack-based rate limiting because they use nftables' native set infrastructure with per-element timeouts. Each meter entry automatically expires after the evaluation window, so there is no manual cleanup. You can inspect active meter entries with nft list meter inet docker-ratelimit ratelimit-8080 to see which source IPs are currently being rate-limited. For environments running API containers that face the internet directly, this is a critical layer of defense against volumetric attacks that would otherwise overwhelm container-level rate limiters like Nginx's limit_req.
One operational detail: meters consume kernel memory proportional to the number of tracked source IPs. Each entry is approximately 64 bytes. A meter tracking 100,000 unique source IPs uses roughly 6.4 MB of kernel memory -- negligible on modern hosts but worth monitoring if you are under sustained DDoS with millions of spoofed sources. The default meter size is unlimited; to cap it, add size 65535 after the meter name.
Swarm Mode and Overlay Networks
Docker's nftables backend has one significant limitation as of Docker Engine 29: overlay networks are not supported. The Docker documentation states that nftables support is experimental, and overlay network rules have not been migrated from iptables. This means you cannot enable the nftables backend when the Docker daemon is running in Swarm mode. If your infrastructure uses Docker Swarm for orchestration, you are limited to the iptables backend and the DOCKER-USER chain approach for firewall filtering.
For environments that need both Swarm and strict firewall control, the practical path is to use the iptables backend with DOCKER-USER rules for now, and plan a migration to the nftables backend once overlay network support leaves experimental status. If you are running standalone Docker hosts or using Kubernetes for orchestration (which has its own network policy framework), the nftables backend is a viable choice today.
Verifying Your Rules From Outside
The single most important step after configuring any Docker firewall rules is external verification. Running nft list ruleset on the host tells you what rules exist, but it does not tell you whether they are effective. The only reliable test is to generate traffic from an external machine and observe the result.
```shell
# From an external machine -- scan specific ports
$ nmap -p 80,443,5432,8080,3000 your-docker-host

# Check if a specific port responds to a TCP handshake
$ nc -zv your-docker-host 5432

# On the Docker host -- watch for packets hitting your drop rules
$ sudo tcpdump -i eth0 port 5432 -nn

# Check which ports are actually listening on all interfaces
$ sudo ss -tlnp | grep -v 127.0.0.1
```
The ss -tlnp command is particularly revealing. If it shows a container process listening on 0.0.0.0:5432, that port is bound to all interfaces and is a candidate for firewall filtering. If it shows 127.0.0.1:5432, the port is already protected by the localhost binding and does not require nftables rules. Run this check after every deployment to identify which processes are using exposed ports before an attacker does.
Wrapping Up
- Find out which backend you are running: check your version with docker info --format '{{.ServerVersion}}' and look in /etc/docker/daemon.json for "firewall-backend": "nftables". If it is not there, you are on iptables.
- On the iptables backend, put your rules in the DOCKER-USER chain and use -m conntrack --ctorigdstport to match on original host ports. This is Docker's officially supported mechanism and survives daemon restarts.
- On the nftables backend, set "bridge-accept-fwmark": 1 in daemon.json. Create a chain at priority -1 that marks allowed traffic with meta mark set 1. Docker's chain accepts marked packets and drops the rest. This avoids the priority race and is Docker's official nftables integration point.
- For compliance environments, build a forward chain with policy drop. Explicitly allow established connections, container outbound traffic, and only the published ports you want reachable. Use ct original proto-dst for port matching. This is the strictest posture.
- For rules that survive stack recreation, use ct original proto-dst in a priority -200 chain to match on the published host port. This is independent of container IPs and bridge names, which change on every docker compose down/up. Combine with named bridge networks (com.docker.network.bridge.name) for per-stack filtering when needed.
- For the simplest setup, keep policy accept and add specific drop rules for the ports you want blocked. Simple, stable, and your rules survive Docker restarts because they live in a separate table.

Docker's firewall bypass is not a bug -- it is a consequence of how NAT and Netfilter hooks interact. Docker uses DNAT in the prerouting hook to redirect traffic to containers, which changes packet destinations before your filter rules ever see them. With the native nftables backend, Docker creates chains at priority -100 that accept container traffic before standard filter chains at priority 0 can intervene.
The practical solutions come down to a few approaches: create nftables chains at a priority lower than Docker's (such as -200) and place your drop rules there; use conntrack's original destination matching to filter on pre-DNAT port numbers; bind sensitive services to localhost and use a reverse proxy for public-facing containers; or, in the iptables backend, use the DOCKER-USER chain that Docker provides specifically for this purpose.
Whichever approach you choose, verify your rules from outside the host. A port scan from an external machine is the only reliable confirmation that your firewall is doing what you think it is doing.
From a security perspective, treat Docker's firewall management as an untrusted component in your defense-in-depth stack. Docker's job is to make containers work; your job is to make sure they work only as intended. Keep the Docker Engine patched (the CVE-2026-34040 authorization bypass and earlier CVE-2024-41110 demonstrate that Docker's own security boundaries are regularly challenged), restrict Docker API access to Unix sockets with tight permissions, run periodic external port scans against your hosts, and audit your nftables ruleset after every Docker upgrade. The intersection of container orchestration and host-level firewalling is where security assumptions break down -- and where attackers look first.
How to Prevent Docker from Bypassing nftables Rules
Step 1: Inspect Docker's nftables chains and priorities
Run nft list ruleset to see every table, chain, and priority value Docker has created. Identify the priority used by Docker's forward chain (commonly -100 with the native nftables backend) so you know what value your custom chain must beat.
Step 2: Create a custom forward chain with a lower priority
Create a new nftables table and add a base chain on the forward hook with a priority lower than Docker's. For example, use priority -200 if Docker uses -100. This ensures your chain is evaluated before Docker's accept rules.
Step 3: Add drop or accept rules in the custom chain
Insert rules in your custom forward chain that drop traffic you want to block or accept traffic you want to allow. Use ct original proto-dst to match on pre-DNAT destination ports when filtering traffic bound for Docker containers.
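Steps 2 and 3 together look roughly like the following sketch. The table and chain names are arbitrary examples, priority -200 assumes Docker is at -100, and port 5432 stands in for whichever published port you want to block:

```nft
table inet admin-guard {
    # The inet family covers both IPv4 and IPv6 in one ruleset
    chain forward_early {
        # -200 is numerically lower than Docker's -100, so this chain
        # is evaluated before Docker's accept rules
        type filter hook forward priority -200; policy accept;

        # Keep established flows working
        ct state established,related accept

        # Drop traffic to published host port 5432, matched on its
        # pre-DNAT (original) destination value
        ct original protocol tcp ct original proto-dst 5432 drop
    }
}
```

Because the drop verdict is final across the whole forward hook, Docker's later accept at priority -100 never gets a chance to override it.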
Step 4: Persist the ruleset and verify after Docker restarts
Save your nftables configuration to /etc/nftables.conf or a drop-in file so it loads at boot. Docker recreates its own rules every time the daemon restarts, so verify your custom chains still have correct priority ordering by running nft list ruleset after each Docker restart.
Frequently Asked Questions
Why does Docker bypass my nftables firewall rules?
Docker bypasses nftables rules because it creates its own tables and chains with specific priority values. When Docker publishes a port, it inserts DNAT rules in the prerouting hook that rewrite packet destinations before your filter chains ever see the traffic. Additionally, Docker's forward chain often runs at a lower numeric priority (such as -100) than your standard filter chain (priority 0), meaning Docker's accept verdict executes first.
How do I block a Docker-published port using nftables?
To block a Docker-published port with nftables, create a custom chain on the forward hook with a lower numeric priority than Docker uses. For example, if Docker uses priority -100, set your chain to priority -200. Then add drop rules in that chain targeting the ports you want to block. Alternatively, use the conntrack original destination match (ct original proto-dst) to filter based on the pre-DNAT port number.
What is the difference between the iptables DOCKER-USER chain and the nftables approach?
With the iptables backend, Docker provides a DOCKER-USER chain where administrators can insert custom rules that run before Docker's own accept rules. With Docker's native nftables backend (introduced experimentally in Docker 29), there is no DOCKER-USER chain. Instead, administrators must create separate nftables tables with base chains that use appropriate priority values to execute before Docker's chains.
Can I use conntrack to filter Docker traffic based on original destination ports?
Yes. Because Docker's DNAT rewrites the destination address and port before the forward hook, you can use conntrack's original destination fields to match on the pre-DNAT values. In nftables, use the expression ct original proto-dst to match the port number that the client originally targeted. This lets you write forward-hook rules that reference published host ports rather than internal container ports.
Does disabling Docker's iptables or nftables management fix the bypass problem?
Setting iptables to false in the Docker daemon configuration prevents Docker from creating firewall rules, but this is not recommended for many environments. Without Docker's rules, port publishing, outbound masquerading, and network isolation all stop working. You would need to manually recreate all the NAT, forwarding, and isolation rules yourself, which is complex and error-prone.
Does Docker's nftables firewall bypass also affect IPv6 traffic?
It depends on your IPv6 configuration. With NAT-based IPv6 (the default), the same DNAT bypass applies to IPv6 just as it does to IPv4. With routed IPv6 prefixes, there is no DNAT -- containers receive globally routable addresses and every port is directly reachable from the internet. In routed mode, the bypass problem does not apply, but the exposure risk is far greater because there is no NAT layer providing implicit isolation. You must add explicit nftables forward-hook rules to filter IPv6 traffic to container subnets.
Can I use Docker's nftables backend with Swarm mode?
No. As of Docker Engine 29, the native nftables backend does not support overlay networks. Docker's documentation explicitly states that nftables cannot be enabled when the daemon is running in Swarm mode. Swarm deployments must use the iptables backend and the DOCKER-USER chain for firewall filtering until overlay network support is migrated to nftables.
Sources and References
Technical details in this guide are drawn from official documentation and verified sources.
- Docker Documentation - Firewall with nftables -- native nftables backend configuration, chain priorities, DOCKER-USER migration, and IP forwarding behavior
- Docker Documentation - Packet filtering and firewalls -- DOCKER-USER chain, iptables management, firewalld and UFW interaction, ip-forward-no-drop option
- Netfilter Project - nftables wiki -- chain priority values, hook evaluation order, conntrack expressions
- Docker Engine Issue #50566 -- nftables filter-FORWARD policy reset behavior on daemon restart
- dzx.fr - Nftables, Docker, and a default drop policy -- Netfilter accept/drop verdict semantics across multiple chains
- Cyera Research - CVE-2026-34040 Docker Authorization Bypass -- oversized HTTP body AuthZ bypass, exploit timeline, and remediation
- MITRE ATT&CK - Containers Matrix -- container-specific tactics and techniques including T1059.013 (Container CLI/API) and T1611 (Escape to Host)
- Docker Documentation - Docker with iptables -- DOCKER-USER chain traversal order, DOCKER-ISOLATION chains, conntrack extension for original destination matching
- MITRE ATT&CK - T1059.013 Container CLI/API -- adversary abuse of Docker CLI and REST API for execution, lateral movement, and container escape