Every container you start gets a network stack -- its own interfaces, routing table, and firewall rules -- and something has to build that stack before your process sees its first packet. That something is the Docker daemon. Understanding how dockerd constructs and manages networking is essential for anyone running containers in production, because the moment something goes wrong -- a subnet collision, a firewall rule that silently eats traffic, an overlay tunnel that drops packets between hosts -- you need to know exactly which layer owns the problem.
This article walks through the daemon's networking architecture from the ground up: the Container Network Model that structures everything, the Linux primitives the daemon uses under the hood, the built-in network drivers and how they differ, the iptables chains Docker creates, and the daemon.json knobs you can turn to change the defaults. If you want a broader conceptual orientation before getting into daemon internals, the companion guide to Docker networking covers that layer.
The Container Network Model
Docker's networking is built on the Container Network Model (CNM), an abstraction that separates the concerns of network provisioning from the container runtime itself. The CNM defines three building blocks that every Docker network uses, regardless of the underlying driver.
A sandbox is an isolated network namespace. Each container gets its own sandbox, which contains the container's interfaces, routing table, and DNS configuration. The sandbox ensures that one container's network state cannot interfere with another's. A network is a group of endpoints that can communicate directly with each other. Networks are implemented by drivers -- bridge, overlay, macvlan, and so on. An endpoint is a virtual network interface that connects a sandbox to a network. A container can have multiple endpoints, each joining a different network.
This three-layer design is what allows a single container to be connected to a private internal bridge and a routable macvlan network simultaneously. The daemon creates and manages all three components through its network subsystem, originally implemented as a separate open-source library called libnetwork. As the official libnetwork design document states, it implements the Container Network Model to provide a native Go implementation for connecting containers, with the goal of delivering a robust networking abstraction that works consistently across drivers. The libnetwork repository was later merged back into the main Moby monorepo under moby/moby/libnetwork, where it continues to receive active development alongside the rest of the Docker Engine codebase. Go consumers import from github.com/moby/moby; the path is a source tree location, not a standalone module.
The NetworkController object that orchestrates all of this lives inside the dockerd process itself — not in a separate service. When you run docker network create with the bridge driver, libnetwork creates a Network object, records it in its state database, and provisions the Linux bridge interface immediately — you can confirm this with ip link show type bridge right after the command returns. However, no veth pairs, no iptables rules, and no container-specific kernel state exist yet. The driver only creates those resources when CreateEndpoint is subsequently called — that is, when a container actually joins the network. This two-phase model is why docker network create is nearly instantaneous, while the first docker run --network on a new network takes slightly longer.
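You can watch the two-phase model in action from a shell. The network and container names below (demo-net, probe) are illustrative, and the exact interface names will differ on your host:

```shell
# Phase 1: network create provisions only the Linux bridge -- returns almost instantly
$ docker network create --driver bridge demo-net
$ ip link show type bridge
# a new br-<id> interface appears alongside docker0, but no veth pairs yet

# Phase 2: the first container join triggers CreateEndpoint --
# only now does a veth pair appear and the per-network iptables rules get installed
$ docker run -d --name probe --network demo-net alpine sleep 300
$ ip link show type veth
# vethXXXXXXX@if... master br-<id> ...
```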
The CNM is Docker's networking model. Kubernetes uses a different model called CNI (Container Network Interface). If you run Docker under Kubernetes, the CNI plugin -- not Docker's built-in drivers -- handles pod networking. Docker's network drivers remain relevant for standalone Docker hosts and Docker Compose deployments.
Linux Primitives Under the Hood
Docker does not implement its own TCP/IP stack. It orchestrates existing Linux kernel features to build container networks. Understanding which primitives are in play makes troubleshooting far more tractable.
Network namespaces provide the isolation. Each container runs in its own network namespace, which gives it a private set of interfaces, a private routing table, and private iptables rules. The host namespace and the container namespace are separate worlds, connected only by the interfaces Docker creates. Docker deliberately stores container namespace references under /var/run/docker/netns/ rather than the conventional /var/run/netns/ path that ip netns list reads. This is why running ip netns list on a host with dozens of containers shows nothing. To inspect a container's namespace directly, either symlink the paths or use nsenter --net=$(docker inspect --format '{{.NetworkSettings.SandboxKey}}' <container>).
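A worked example of the nsenter approach, assuming a container named web (the namespace filename shown is illustrative):

```shell
# SandboxKey points at the namespace file under /var/run/docker/netns/
$ docker inspect --format '{{.NetworkSettings.SandboxKey}}' web
/var/run/docker/netns/4f8a2b1c9d0e

# Enter only the network namespace and inspect it with host-side tooling
$ sudo nsenter --net=/var/run/docker/netns/4f8a2b1c9d0e ip addr show
$ sudo nsenter --net=/var/run/docker/netns/4f8a2b1c9d0e ss -tlnp
```

This is often more convenient than docker exec because the container image does not need to ship ip, ss, or any other diagnostic tools.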
Virtual Ethernet pairs (veth) are the connectors. A veth pair acts like a virtual cable with two ends -- one end lives in the container's namespace (typically named eth0 inside the container), and the other end lives in the host namespace and is plugged into a bridge or directly into a parent interface. Every bridge-connected container has one veth pair. One detail that surprises many engineers: on the common fast path, transmitting a frame across a veth pair involves no memory copy. The kernel performs a pointer swap between the two ends — the sk_buff pointer is handed off across the namespace boundary rather than duplicated. This is why veth latency is measured in hundreds of nanoseconds rather than microseconds.
Linux bridges act as virtual switches. When Docker creates a bridge network, it creates a Linux bridge interface (the default one is named docker0) and attaches the host-side ends of the veth pairs to it. Containers on the same bridge can communicate at layer 2 because the bridge forwards frames between its attached ports. Docker disables Spanning Tree Protocol (STP) on every bridge it creates. STP is designed to prevent forwarding loops in networks with redundant physical links, but at default timers it introduces a 30-second port state transition delay (15 seconds in listening state, then 15 seconds in learning state, before forwarding begins) that would make container startup unacceptably slow. Since Docker's bridges are point-to-point topologies with no physical redundancy, loop prevention is unnecessary and STP is off by default. You can verify this with ip link show docker0 or, on systems where the legacy brctl utility is still installed, brctl show docker0 — either will confirm STP is disabled.
iptables / nftables handle NAT and firewall rules. Docker inserts rules to masquerade outbound container traffic (so it appears to come from the host's IP) and to forward inbound traffic to published ports. These rules are what make -p 8080:80 work.
VXLAN tunnels underpin overlay networks. When containers on different hosts need to communicate, Docker encapsulates their layer 2 frames inside UDP packets (port 4789 by default) and sends them across the physical network. The receiving host decapsulates the frame and delivers it to the destination container's namespace.
You can inspect the veth pairs connecting containers to bridges by running ip link show on the host. Each vethXXXXXX interface is the host-side end of a container's virtual cable. To find which container owns it, compare the interface index shown inside the container (ip link show eth0) with the peer index on the host.
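The interface-index trick looks like this in practice (indexes and names below are illustrative). Inside the container, eth0's `@ifN` suffix names the peer's index on the host:

```shell
# Inside the container: eth0@if43 means the host-side peer has ifindex 43
$ docker exec api ip link show eth0
42: eth0@if43: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...

# On the host: index 43 is the matching veth end, attached to the bridge
$ ip link | grep '^43:'
43: veth1a2b3c4@if42: <BROADCAST,MULTICAST,UP,LOWER_UP> ... master br-0f1e2d3c
```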
Bridge Networks
The bridge driver is Docker's default and the one you will use for the vast majority of single-host deployments. When the Docker daemon starts for the first time, it creates a built-in network called bridge that uses a Linux bridge interface named docker0. Any container started without an explicit --network flag attaches to this default bridge.
Default Bridge vs. User-Defined Bridge
The default bridge and user-defined bridges behave differently in several important ways. The default bridge does not provide automatic DNS resolution between containers. If container A wants to reach container B on the default bridge, it must use B's IP address or rely on the deprecated --link flag. User-defined bridges run Docker's embedded DNS server at 127.0.0.11, which resolves container names and aliases automatically.
User-defined bridges also provide better network isolation. Containers on different user-defined networks cannot communicate unless you explicitly connect them to a shared network. On the default bridge, all containers share the same broadcast domain with no additional segmentation.
# Create a bridge network with an explicit subnet
$ docker network create \
    --driver bridge \
    --subnet 10.20.0.0/24 \
    --gateway 10.20.0.1 \
    app-net

# Run two containers on the same network
$ docker run -d --name api --network app-net nginx
$ docker run -d --name worker --network app-net alpine sleep 3600

# DNS resolution works automatically
$ docker exec worker ping -c 2 api
PING api (10.20.0.2): 56 data bytes
64 bytes from 10.20.0.2: seq=0 ttl=64 time=0.087 ms
You leave containers on the default docker0 bridge and try to scale to a second service? Container A has no way to resolve “api” by name — it must hard-code an IP that changes every restart. Compose dependency ordering stops working correctly. Adding a second host later means rewriting all your service discovery logic. User-defined bridges are not a style preference; they are the architectural prerequisite for everything that follows.
How the Daemon Allocates Subnets
When you create a network without specifying --subnet, the daemon selects an available range from its default address pools. These pools are configured in /etc/docker/daemon.json and default to two ranges: 172.16.0.0/12 (from which Docker allocates /16 subnets, covering 172.16.0.0 through 172.31.255.255) and 192.168.0.0/16 (from which Docker allocates /20 subnets). If these ranges overlap with your corporate network or VPN subnets -- a common problem in enterprise environments -- containers will experience routing conflicts.
// Override default address pools to avoid conflicts
{
  "default-address-pools": [
    { "base": "10.200.0.0/16", "size": 24 },
    { "base": "10.201.0.0/16", "size": 24 }
  ]
}
After editing daemon.json, restart the daemon with systemctl restart docker. Only newly created networks will use the updated pools -- existing networks retain their original subnets.
Restarting the Docker daemon will briefly interrupt all running containers unless you have "live-restore": true in your daemon.json. On production hosts, plan daemon restarts during maintenance windows or use live restore to keep containers running through the restart.
Iptables and Firewall Rules
How the Docker daemon interacts with the host's firewall trips up many administrators. Docker inserts iptables rules automatically for bridge networks, and these rules have significant implications for host security.
What Docker Creates
When Docker starts and creates a bridge network, it inserts rules into both the nat and filter tables of iptables. In the nat table, Docker creates a chain named DOCKER and adds rules for masquerading (source NAT) and port mapping (destination NAT). In the filter table, Docker creates several custom chains: DOCKER-USER, DOCKER-FORWARD, DOCKER, and DOCKER-INGRESS. The DOCKER-INGRESS chain handles Swarm routing mesh rules. Note that if you are running a pre-29 engine, you will also see DOCKER-ISOLATION-STAGE-1 and DOCKER-ISOLATION-STAGE-2 chains — these were removed in Docker Engine 29 (moby/moby#49981) and their inter-network isolation logic was folded into the updated DOCKER-FORWARD chain.
When using the iptables backend, the daemon also enables IP forwarding on the host by setting net.ipv4.ip_forward = 1 and net.ipv6.conf.all.forwarding = 1. When it does this, it sets the default policy of the iptables FORWARD chain to DROP, meaning all forwarded traffic is blocked unless an explicit rule allows it. Docker then adds its own accept rules for container traffic. The nftables backend does not enable IP forwarding automatically — it will report an error at daemon startup if forwarding is not already enabled on the host. In either case, Docker's enabling of ip_forward is not written to /etc/sysctl.d/ and does not survive a reboot without a persistent sysctl entry. The broader landscape of Linux kernel tuning for high-traffic servers covers the full set of sysctl parameters worth understanding in production environments.
$ sudo iptables -t nat -L -n
Chain PREROUTING (policy ACCEPT)
target      prot opt in      out       source           destination
DOCKER      all  --  *       *         0.0.0.0/0        0.0.0.0/0      ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT)
target      prot opt in      out       source           destination
MASQUERADE  all  --  *       !docker0  172.17.0.0/16    0.0.0.0/0

Chain DOCKER (2 references)
target      prot opt in      out       source           destination
RETURN      all  --  docker0 *         0.0.0.0/0        0.0.0.0/0
The MASQUERADE rule in the POSTROUTING chain is what gives containers internet access. It rewrites the source IP of outbound packets from the container's private address to the host's address. The DOCKER chain in PREROUTING handles inbound port mapping -- when you publish a port with -p 8080:80, a DNAT rule here rewrites the destination to the container's internal IP and port.
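You can see the DNAT rule appear the moment a port is published. The container name and internal IP below are illustrative:

```shell
# Publish a port, then inspect the rule Docker adds to the nat DOCKER chain
$ docker run -d --name web -p 8080:80 nginx
$ sudo iptables -t nat -L DOCKER -n
Chain DOCKER (2 references)
target  prot opt source      destination
RETURN  all  --  0.0.0.0/0   0.0.0.0/0
DNAT    tcp  --  0.0.0.0/0   0.0.0.0/0    tcp dpt:8080 to:172.17.0.2:80
```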
You add a ufw allow 22/tcp rule and assume your database container port is safe? Nothing breaks visibly — your ufw rules look correct, your firewall status shows the port blocked — but the port is reachable from anywhere. Docker's DNAT rule fires in the PREROUTING chain, before the packet ever reaches the INPUT chain that ufw manages. The firewall never sees it. This is the most common “how is my database exposed” production incident in Docker deployments.
The DOCKER-USER Chain
Docker processes its own firewall chains before any user-defined rules appended to the FORWARD chain. If you add iptables rules to FORWARD, they will be evaluated after Docker's rules and may never see certain packets. The correct place for custom firewall rules that need to interact with Docker traffic is the DOCKER-USER chain.
# Only allow 10.0.0.0/8 to reach containers
# Insert the ACCEPT first, then append the DROP -- order matters
$ sudo iptables -I DOCKER-USER -i eth0 -s 10.0.0.0/8 -j ACCEPT
$ sudo iptables -A DOCKER-USER -i eth0 -j DROP
Be aware that by the time packets reach the DOCKER-USER chain, destination NAT has already occurred. This means you can only match on the container's internal IP address and port, not the host's published port. If you need to filter before DNAT, you must work in the raw or mangle tables instead.
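One way around the post-DNAT limitation, without dropping into the raw or mangle tables, is the conntrack match: it can test the connection's original (pre-DNAT) destination port even though the packet itself has already been rewritten. For example, to block outside access to a port published as 8080 on eth0:

```shell
# Match on the pre-DNAT destination port via connection tracking state
$ sudo iptables -I DOCKER-USER -i eth0 -p tcp \
    -m conntrack --ctorigdstport 8080 --ctdir ORIGINAL \
    -j DROP
```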
Docker's iptables rules can bypass your host firewall. If you run ufw or firewalld and publish a port with -p, that port becomes reachable from the outside even if your firewall would normally block it. This happens because Docker's PREROUTING DNAT rule redirects the packet before the INPUT chain ever sees it. When using the iptables backend, always use DOCKER-USER for access control on published ports. Note that DOCKER-USER does not exist in Docker's nftables backend -- if you have switched to "firewall-backend": "nftables", you must use nftables base chains with appropriate priorities instead.
nftables Support
Docker also supports nftables as an alternative firewall backend, introduced as an experimental feature in Docker Engine 29.0.0. You can select between iptables and nftables using the firewall-backend option in daemon.json. However, there are important differences between the two backends. When using nftables, Docker creates rules directly in two dedicated tables -- ip docker-bridges and ip6 docker-bridges -- rather than inserting rules into the host's existing iptables chains. The DOCKER-USER chain does not exist in the nftables backend; custom filtering rules must instead be added in separate tables using nftables base chains with appropriate hook priorities. Additionally, nftables support is incompatible with Docker Swarm mode -- the overlay network rules required by Swarm have not yet been migrated from iptables. When running with the nftables backend, Docker also does not enable IP forwarding automatically; if forwarding is not already enabled on the host, the daemon will report an error at startup.
Host and None Networks
Not every container needs its own network namespace. Docker provides two special network modes that skip the bridge entirely.
The host driver removes network isolation between the container and the Docker host. The container shares the host's network namespace directly -- it sees the same interfaces, uses the same IP addresses, and binds to the same port space. There is no NAT, no bridge, and no veth pair. This mode is useful when a container needs maximum network performance or needs to bind to a large range of ports without the overhead of individual port mappings. The tradeoff is zero network isolation -- a process in the container can bind to any port on the host, and port conflicts between containers become your problem.
The none driver gives the container its own network namespace but does not configure any interfaces beyond the loopback device. The container has no external connectivity at all. This is useful for batch jobs that process local data and should have no network access, or for containers where you want to configure networking manually after creation.
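A quick check makes the none mode concrete -- the container sees nothing but loopback:

```shell
$ docker run --rm --network none alpine ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    inet 127.0.0.1/8 scope host lo
```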
Overlay Networks
Bridge networks are confined to a single host. When containers on different Docker hosts need to communicate directly, the overlay driver creates a distributed network that spans multiple daemon instances.
Overlay networks use VXLAN (Virtual Extensible LAN) to encapsulate container frames inside UDP packets. Each overlay network gets a unique VXLAN Network Identifier (VNI) — a 24-bit field that allows up to 16,777,216 distinct overlay segments to coexist on the same physical infrastructure. This is why VXLAN was developed for multi-tenant cloud environments: traditional VLANs are limited to a 12-bit identifier space of only 4,096 segments. Docker assigns VNIs starting at 4096 for Swarm overlay networks, which you can verify with docker network inspect <network> | grep vxlan_id.
The Docker daemons on each host maintain a mapping of which container IPs and MAC addresses belong to which host using NetworkDB, Docker's own distributed in-memory database backed by the memberlist library (the gossip protocol successor to Serf) running over TCP/UDP port 7946. When a container starts on any Swarm node, its IP-to-MAC mapping is gossiped to all other nodes via NetworkDB. Each daemon then pre-populates its kernel ARP table and forwarding database (FDB) with this information. This means that when a container sends a packet to another container on a different host, the local VXLAN tunnel endpoint already knows the remote host's IP without needing to broadcast an ARP request — the daemon has already answered it. This gossip-driven pre-population is what makes Docker overlay networks function without requiring an external key-value store like Consul or etcd in Swarm mode. When a container on host A sends a packet to a container on host B, the local daemon's VXLAN tunnel endpoint wraps the frame in a UDP packet, sends it to host B's tunnel endpoint on UDP port 4789, and the receiving daemon decapsulates the frame and delivers it to the target container's namespace.
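You can observe the pre-populated forwarding database directly on a Swarm node. Docker creates one namespace per overlay network under /var/run/docker/netns/, named with a 1- prefix; the ids, MAC addresses, and VXLAN interface name below are illustrative:

```shell
# Overlay network namespaces carry a "1-" prefix
$ sudo ls /var/run/docker/netns/
1-abcdef1234  4f8a2b1c9d0e

# Dump the FDB inside the overlay namespace: remote entries already point
# at the right host IP, courtesy of NetworkDB gossip -- no ARP flood needed
$ sudo nsenter --net=/var/run/docker/netns/1-abcdef1234 bridge fdb show dev vxlan0
02:42:0a:1e:00:03 master br0
02:42:0a:1e:00:04 dst 192.168.1.11 self permanent
```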
# Initialize Swarm on the first node
$ docker swarm init --advertise-addr 192.168.1.10

# Create an overlay network
$ docker network create \
    --driver overlay \
    --subnet 10.30.0.0/24 \
    --attachable \
    backend-overlay

# Deploy a service across the overlay
$ docker service create \
    --name api \
    --network backend-overlay \
    --replicas 3 \
    myapp:latest
The --attachable flag allows standalone containers (not just Swarm services) to connect to the overlay network. Without it, only services deployed through docker service create can use the network.
Overlay networks add encapsulation overhead. The VXLAN encapsulation header consumes exactly 50 bytes per packet: 14 bytes for the outer Ethernet frame, 20 bytes for the outer IPv4 header, 8 bytes for the UDP header, and 8 bytes for the VXLAN header itself. This reduces the effective MTU available to container payloads. If you see fragmentation errors or unexplained performance degradation, lower the MTU on the overlay network with --opt com.docker.network.driver.mtu=1450 (or underlay MTU minus 50). For deeper packet-level traffic control and shaping between container networks, tc and traffic shaping on Linux covers the full toolset. You must also ensure that UDP port 4789 is open between all Swarm nodes.
Overlay network traffic is unencrypted by default. Application data flowing between containers on different Swarm nodes traverses the physical network as plaintext inside UDP packets. If your underlay network is not fully trusted — cloud multi-tenant environments, co-location facilities, or any path that crosses untrusted infrastructure — you must enable encryption explicitly at network creation time with --opt encrypted. This enables IPsec encryption at the VXLAN layer. There is a measurable performance cost, so benchmark in your environment before enabling it in production.
Macvlan and IPvlan
Bridge and overlay networks introduce abstraction layers -- bridges, NAT, encapsulation -- between the container and the physical network. Sometimes you need containers to appear as real devices on the LAN, with their own addresses and no NAT in the path. That is what the macvlan and ipvlan drivers provide.
Macvlan
The macvlan driver assigns each container a unique MAC address and connects it directly to a parent interface on the host. From the perspective of the physical network, each container looks like a separate physical device plugged into the switch. There is no bridge, no NAT, and no port mapping. The container gets an IP address on the same subnet as the host's physical network.
$ docker network create -d macvlan \
    --subnet=192.168.1.0/24 \
    --gateway=192.168.1.1 \
    -o parent=eth0 \
    lan-net

$ docker run --rm -it \
    --network lan-net \
    --ip 192.168.1.50 \
    alpine sh
Macvlan requires the host NIC to operate in promiscuous mode so it will accept frames destined for MAC addresses other than its own. On physical infrastructure, the upstream switch port must also allow multiple MAC addresses — most managed switches enforce per-port MAC address limits or port security policies that will silently drop the extra addresses. On VMware or KVM environments, enabling promiscuous mode on the virtual switch is the relevant setting. There is also a kernel restriction: by default, the Docker host itself cannot communicate directly with macvlan containers. If you need host-to-container communication, create a macvlan sub-interface on the host or use a bridge network alongside the macvlan. If you need to trunk multiple networks over a single interface, Docker supports 802.1q VLAN sub-interfaces as the parent — for example, -o parent=eth0.100 creates the macvlan network over VLAN 100.
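The host-side sub-interface workaround looks roughly like this -- a sketch assuming the lan-net example above, with 192.168.1.60 as a free address you reserve for the host's shim interface:

```shell
# Create a macvlan sub-interface on the host, on the same parent as the network
$ sudo ip link add macvlan-shim link eth0 type macvlan mode bridge
$ sudo ip addr add 192.168.1.60/32 dev macvlan-shim
$ sudo ip link set macvlan-shim up

# Route traffic for the container through the shim instead of eth0 directly
$ sudo ip route add 192.168.1.50/32 dev macvlan-shim

# The host can now reach the macvlan container
$ ping -c 1 192.168.1.50
```

Because the kernel blocks direct traffic between a macvlan interface and its parent, the shim gives the host its own foothold on the macvlan segment.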
IPvlan
IPvlan is similar to macvlan but solves a key limitation: it shares the parent interface's MAC address across all containers. Each container gets its own IP address, but they all use the same MAC. This avoids MAC address exhaustion on the physical switch and works in environments where the number of allowed MAC addresses per port is restricted.
IPvlan supports two modes. L2 mode behaves like macvlan from a networking perspective -- containers are on the same broadcast domain as the parent interface. L3 mode removes all broadcast and multicast traffic, routing packets between endpoints instead. L3 mode is well suited for large-scale deployments where broadcast storms and bridging loops are concerns, because it eliminates the bridging domain entirely.
$ docker network create -d ipvlan \
    --subnet=10.50.0.0/24 \
    -o parent=eth0 \
    -o ipvlan_mode=l3 \
    routed-net
Docker does not create iptables rules for macvlan or ipvlan networks. Firewalling is entirely your responsibility. If you need access control, configure iptables rules manually or use upstream firewall infrastructure.
Starting with Docker Engine 29.0.0, macvlan and ipvlan L2 networks will no longer configure a default gateway automatically unless a --gateway flag is explicitly included in the IPAM configuration (moby/moby#50929). Always specify --gateway explicitly when creating these networks. Prior to Engine 29, Docker inferred the first host address in the subnet as the gateway; that behavior is now gone.
Embedded DNS and Service Discovery
On user-defined networks, Docker runs an embedded DNS server at 127.0.0.11 inside each container. This server resolves container names, network aliases, and service names (in Swarm mode) to their corresponding IP addresses. The embedded DNS server forwards any queries it cannot resolve -- external hostnames -- to the DNS servers configured on the host, or to custom servers specified with --dns at container creation.
The embedded DNS resolver does not actually listen on port 53 inside the container's namespace. Binding port 53 would conflict with any service the container itself tries to run on that port (BIND, dnsmasq, CoreDNS). Instead, dockerd binds the resolver to a random high-numbered port inside the container's network namespace and installs two iptables rules in that namespace — DOCKER_OUTPUT and DOCKER_POSTROUTING — that DNAT all outbound DNS traffic destined for 127.0.0.11:53 to the actual listener port, then SNAT the replies back to look like they came from port 53. This is why you cannot find the DNS server process with ps inside the container: the listener belongs to the dockerd process on the host, not a process inside the container. You can reveal the actual port with nsenter --net=$(docker inspect --format '{{.NetworkSettings.SandboxKey}}' <container>) ss -ulnp.
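You can confirm this yourself. Using the worker container from earlier (the port and pid in the output are illustrative):

```shell
# Find the real listener behind 127.0.0.11 -- note the process is dockerd
$ NS=$(docker inspect --format '{{.NetworkSettings.SandboxKey}}' worker)
$ sudo nsenter --net=$NS ss -ulnp
State  Recv-Q Send-Q Local Address:Port   Process
UNCONN 0      0         127.0.0.11:53657  users:(("dockerd",pid=812,fd=42))

# And the NAT rules that redirect 127.0.0.11:53 to that high port
$ sudo nsenter --net=$NS iptables -t nat -L DOCKER_OUTPUT -n
```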
Containers on the default bridge network do not use the embedded DNS server. They receive a copy of the host's /etc/resolv.conf and have no ability to resolve other containers by name. This is one of the strongest reasons to always use user-defined bridge networks rather than the default. When the embedded DNS server cannot reach upstream resolvers, you will see failures that can look like general connectivity problems -- Docker's failure to query external DNS is a common symptom worth understanding separately.
Your host runs Ubuntu 22.04+ with systemd-resolved and you do nothing special? Every container on a user-defined network can resolve other container names fine — but all external DNS queries silently fail. curl https://example.com hangs. Package managers time out. The container's /etc/resolv.conf points to 127.0.0.53, which is only reachable from the host's own namespace, not from inside Docker's embedded resolver. The failure is invisible until you need external connectivity.
The systemd-resolved Problem
On Ubuntu 22.04+, Debian 12+, and any distribution using systemd-resolved, there is a specific failure mode that catches many administrators off guard. By default, systemd-resolved sets the nameserver in /etc/resolv.conf to the loopback stub address 127.0.0.53. Docker reads /etc/resolv.conf when a container starts and uses whatever nameserver it finds there. But 127.0.0.53 is only reachable from within the host's own network namespace -- Docker's embedded DNS resolver runs in a separate namespace and cannot reach it. The result is that all external DNS queries silently fail inside the container.
There are two reliable fixes. The first is to point Docker at real upstream resolvers in daemon.json, bypassing /etc/resolv.conf entirely:
{
"dns": ["1.1.1.1", "8.8.8.8"]
}
The second fix -- preferred if you need containers to respect your host's full DNS configuration including split-horizon or search domains -- is to configure systemd-resolved to write a real upstream address to its non-stub resolver file, then symlink /etc/resolv.conf to that file:
# Replace the stub-only symlink with the full upstream resolver file
$ sudo ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf

# Verify: should show a real upstream IP, not 127.0.0.53
$ cat /etc/resolv.conf
nameserver 192.168.1.1
nameserver 8.8.8.8
After either fix, fully recreate any running containers -- they read /etc/resolv.conf at start time and retain that configuration for their lifetime. Restarting a container is not sufficient; it must be removed and re-created to pick up the corrected upstream.
$ docker exec worker cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0

$ docker exec worker nslookup api
Server:    127.0.0.11
Address 1: 127.0.0.11

Name:      api
Address 1: 10.20.0.2 api.app-net
The options ndots:0 line in the container's resolv.conf is intentional. ndots controls how many dots a hostname must contain before the resolver tries it as an absolute name first rather than appending search domains. Docker sets it to 0, which means every query — including short container names like api — is attempted as-is before any search domain suffix is tried. This is what makes bare container name resolution work reliably and quickly without spurious multi-suffix lookup attempts.
In Swarm mode, the embedded DNS server also supports virtual IP (VIP) load balancing, which is the default endpoint mode for services. When a client inside the cluster resolves a service name, DNS returns a single stable virtual IP assigned to that service. The kernel's IPVS module then distributes connections across the service's healthy tasks transparently — the application connects to one IP and never sees individual container addresses. As an alternative, services can be created with --endpoint-mode dnsrr (DNS round-robin), in which case DNS returns the IP addresses of all individual tasks directly and the client performs its own balancing. VIP mode is preferred for the vast majority of use cases because it provides a stable address that survives task restarts and rescheduling.
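The difference is easy to see from inside the cluster. Assuming the attachable backend-overlay network from earlier (service names and IPs here are illustrative):

```shell
# VIP mode (default): the service name resolves to one stable virtual IP
$ docker run --rm --network backend-overlay alpine nslookup api
# -> a single VIP, e.g. 10.30.0.2, regardless of how many replicas run

# DNS round-robin: the name resolves to every individual task IP
$ docker service create --name api-rr --endpoint-mode dnsrr \
    --network backend-overlay --replicas 3 myapp:latest
$ docker run --rm --network backend-overlay alpine nslookup api-rr
# -> one address per task, e.g. 10.30.0.5, 10.30.0.6, 10.30.0.7
```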
Daemon Configuration for Networking
The Docker daemon's network behavior is controlled through /etc/docker/daemon.json and command-line flags passed to dockerd. Here are the networking-relevant options that matter in production.
{
// Custom address pools to avoid conflicts
"default-address-pools": [
{ "base": "10.200.0.0/16", "size": 24 }
],
// Set the default bridge subnet explicitly
"bip": "10.199.0.1/24",
// Disable inter-container communication on the default bridge
"icc": false,
// Enable IPv6 on the default bridge
"ipv6": true,
"fixed-cidr-v6": "fd00:dead:beef::/48",
// Use nftables instead of iptables (experimental, Docker 29+, not compatible with Swarm)
"firewall-backend": "nftables",
// Enable IP masquerading (default: true)
"ip-masq": true,
// Enable Docker's firewall rule creation (applies to both backends; default: true)
"iptables": true,
// Same control for IPv6 rules -- managed independently from iptables (default: true)
"ip6tables": true,
// Keep containers running during daemon restart
"live-restore": true,
// Custom DNS servers for containers
"dns": ["8.8.8.8", "1.1.1.1"]
}
A few notes on these options. The bip setting controls only the default bridge (docker0). User-defined networks ignore it and draw from the default address pools instead. The icc flag (inter-container communication) defaults to true, which means all containers on the default bridge can freely communicate. Setting it to false forces containers on the default bridge to communicate only through published ports, adding a layer of isolation.
The iptables flag controls whether Docker manages firewall rules at all. Setting it to false will prevent Docker from creating NAT and filter rules, but doing so breaks outbound connectivity and port publishing for bridge networks unless you write equivalent rules yourself. This is almost never what you want in production.
Docker Engine 28 significantly refactored the iptables and ip6tables rules used to implement port publishing and network isolation. The Docker 28 release notes document this as a first step toward native nftables support. One practical consequence: if you downgrade from Docker 28 or 29 to an older daemon version without a reboot or manual rule flush, some of the new rule structures may conflict with what the older daemon expects. The cleanest path when downgrading is to reboot the host, or to flush Docker's rules before starting the older daemon — run iptables -F and ip6tables -F, and flush the NAT rules as well with iptables -t nat -F, since Docker's port-publishing rules live in the nat table.
Docker Engine 29 built on that refactoring to introduce the experimental --firewall-backend=nftables option. According to the Docker Engine v29 announcement (published November 11, 2025): "In a future release, nftables will become the default firewall backend and iptables support will be deprecated." For the current Docker 29 release, nftables remains opt-in and experimental. Additionally, when Docker Engine 29 is dynamically linked (rather than installed as a static binary), the daemon now requires libnftables as a runtime dependency (moby/moby#51033).
A second networking change in Engine 29 affects macvlan and ipvlan L2 networks: Docker will no longer configure a default gateway on these networks unless a --gateway is explicitly included in the IPAM configuration (moby/moby#50929). This was changed to address startup failures in networks with IPv6 auto-configuration enabled. If you relied on Docker inferring the gateway for macvlan or ipvlan networks, you must now specify it explicitly at network creation time.
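Under the Engine 29 behavior, the gateway must be spelled out at creation time. A sketch (the parent interface eth0 and the addresses are example values for a typical LAN):

```shell
# Create a macvlan network with an explicit gateway; as of Engine 29,
# Docker no longer infers one for macvlan/ipvlan L2 networks.
docker network create -d macvlan \
  -o parent=eth0 \
  --subnet 192.168.1.0/24 \
  --gateway 192.168.1.1 \
  macvlan-net
```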
The current production release as of April 2026 is Docker Engine 29.3.1. Security patches and bug fixes for the 29.x branch continue to be published on the official 29.x release notes page.
Debugging Docker Networking
If containers have no outbound connectivity even though Docker's NAT rules look correct, check with sysctl net.ipv4.ip_forward. If it returns 0, Docker's iptables rules for MASQUERADE are in place but the kernel is not forwarding packets. This can happen when a firewall service sets the FORWARD policy without re-enabling forwarding after Docker was stopped. Run sudo sysctl -w net.ipv4.ip_forward=1 and then sudo systemctl restart docker to restore the correct state.
Docker's embedded DNS resolver cannot reach 127.0.0.53 from inside a container namespace. Fix by either adding "dns": ["1.1.1.1", "8.8.8.8"] to /etc/docker/daemon.json, or symlinking /etc/resolv.conf to /run/systemd/resolve/resolv.conf. After either fix, fully recreate affected containers — they must be removed and re-started, not just restarted.
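One way to apply either fix, sketched with a hypothetical container named myapp:

```shell
# Fix A: daemon-wide upstream resolvers.
# Add to /etc/docker/daemon.json:  "dns": ["1.1.1.1", "8.8.8.8"]
sudo systemctl restart docker

# Fix B: point /etc/resolv.conf at the real upstream list that
# systemd-resolved maintains, instead of the 127.0.0.53 stub.
sudo ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf
sudo systemctl restart docker

# Either way, affected containers must be recreated, not just restarted:
docker rm -f myapp
docker run -d --name myapp nginx
```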
Run sudo iptables -t nat -L POSTROUTING -n -v and look for a MASQUERADE rule covering the container's subnet. If it is missing, Docker's NAT rules were flushed. Restart the Docker daemon with sudo systemctl restart docker to re-inject them. Add "live-restore": true to daemon.json to avoid interrupting containers on future restarts.
The default docker0 bridge does not run Docker's embedded DNS server. Containers on it receive a copy of the host's /etc/resolv.conf and cannot resolve each other by name. Create a user-defined bridge network with docker network create --driver bridge app-net and move your containers to it. In Compose, declare a named network under the networks: key and assign it to each service.
Container name resolution works, but the embedded DNS server at 127.0.0.11 cannot forward external queries. Check /etc/resolv.conf inside the container for a loopback address (127.0.0.53 = systemd-resolved, 127.0.0.1 = local resolver). Fix by adding explicit upstream servers to daemon.json: "dns": ["1.1.1.1", "8.8.8.8"].
Containers on different user-defined bridge networks are intentionally isolated and cannot resolve each other by name or communicate at all. Connect the container that needs access to the other network using docker network connect other-net container-name, or in Compose, add the service to both networks under its networks: key.
The 127.0.0.1:port:port binding restricts the port to the loopback interface. External hosts cannot reach it by design. If you need external access, either change the binding to -p port:port (all interfaces) or -p <host-ip>:port:port (a specific interface). If you want to keep it localhost-only and expose it to the network, place a reverse proxy (nginx, caddy) on the host in front of it.
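The three binding styles side by side (container names, ports, and the interface address 10.0.0.5 are example values):

```shell
# Loopback only: reachable from the host itself, invisible to the network.
docker run -d --name web-local -p 127.0.0.1:8080:80 nginx

# All interfaces: reachable from other hosts, firewall permitting.
docker run -d --name web-public -p 8080:80 nginx

# One specific interface, e.g. an internal NIC at 10.0.0.5.
docker run -d --name web-internal -p 10.0.0.5:8080:80 nginx
```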
The DNAT rule should always be present for a running container with a published port. Its absence means Docker's chains were cleared after the daemon last wrote them -- typically by a firewall service reload. Restart Docker (sudo systemctl restart docker) to re-inject the rules. Then fix the root cause: add a systemd drop-in with After=nftables.service firewalld.service so Docker starts after the firewall on every boot.
A rule in DOCKER-USER is dropping traffic before it reaches Docker's own accept rules. Check with sudo iptables -L DOCKER-USER -n -v --line-numbers and identify which rule is matching. Remember that by this point DNAT has already occurred -- match on the container's internal IP, not the host's published port. Use sudo iptables -D DOCKER-USER <line-number> to remove a specific rule.
The DNAT rule exists and no firewall rule is blocking, which means the port forwarding path is correct. The most common remaining cause is that the process inside the container is not listening on the expected port, or is bound to 127.0.0.1 inside the container rather than 0.0.0.0. Check with docker exec mycontainer ss -tlnp or docker exec mycontainer netstat -tlnp. If you need to identify which process on the host is holding a port, finding which process is using a port on Linux walks through the full toolkit.
When a firewall service reloads its ruleset atomically, Docker's iptables chains are wiped. Fix immediately: sudo systemctl restart docker re-injects the rules (with live-restore: true, containers keep running). Fix permanently: add a systemd drop-in at /etc/systemd/system/docker.service.d/firewall-ordering.conf with [Unit] / After=nftables.service firewalld.service so Docker's rules are written after each firewall reload. Reload systemd with sudo systemctl daemon-reload.
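A sketch of creating that drop-in (list only the firewall units actually present on your host):

```shell
# Order Docker after the firewall so its rules are written last on boot.
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/firewall-ordering.conf <<'EOF'
[Unit]
After=nftables.service firewalld.service
EOF
sudo systemctl daemon-reload
```

Note that unit ordering only helps at boot; after a live firewall reload you still need to restart Docker to re-inject its rules.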
Without "live-restore": true in daemon.json, restarting the Docker daemon stops all containers. Add "live-restore": true to /etc/docker/daemon.json so that containers survive daemon restarts. Then sudo systemctl restart docker to re-populate all iptables rules without stopping workloads.
VXLAN adds 50 bytes of overhead. If the overlay MTU is not set explicitly, Docker may auto-detect incorrectly, leading to packets being silently fragmented or dropped. Verify with docker run --rm --network <overlay-net> nicolaka/netshoot ping -M do -s 1400 -c 3 <peer-ip>. If you see Frag needed, lower the network's MTU. Recreate the overlay network with --opt com.docker.network.driver.mtu=<underlay-mtu-minus-50>.
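For example, on a standard 1500-byte Ethernet underlay (the network name my-overlay is a placeholder):

```shell
# VXLAN overhead is 50 bytes, so overlay MTU = underlay MTU - 50.
UNDERLAY_MTU=1500
OVERLAY_MTU=$((UNDERLAY_MTU - 50))   # 1450

docker network create -d overlay \
  --opt com.docker.network.driver.mtu=$OVERLAY_MTU \
  my-overlay
```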
Overlay networks require UDP 4789 (VXLAN data) and TCP 7946 (Swarm control plane gossip) to be open between all nodes. Verify with nc -vzu <peer-ip> 4789 and nc -vz <peer-ip> 7946 from each node. Add rules on the host firewall or cloud security group to allow these ports between all Swarm members. Also confirm that the host's network does not block multicast, which Swarm uses for peer discovery.
When container networking fails, a systematic approach beats guessing. Work from the inside out: start in the container, then check the bridge, then check the host's routing and firewall rules.
# 1. Check the container's network configuration
$ docker exec mycontainer ip addr show
$ docker exec mycontainer ip route show
$ docker exec mycontainer cat /etc/resolv.conf
# 2. Inspect the Docker network
$ docker network inspect app-net
# 3. Check the bridge and veth interfaces on the host
$ ip link show type bridge
$ bridge link show
# 4. Verify iptables rules
$ sudo iptables -t nat -L -n -v
$ sudo iptables -L DOCKER-USER -n -v
# 5. Check IP forwarding
$ sysctl net.ipv4.ip_forward
# 6. Test connectivity from inside a debug container
$ docker run --rm --network app-net nicolaka/netshoot \
    ping -c 3 api
The nicolaka/netshoot image is invaluable for network debugging. It ships with ping, dig, nslookup, traceroute, tcpdump, iperf3, curl, and dozens of other networking tools that are absent from slim production container images. If you need to capture traffic from a remote host for analysis in Wireshark, the technique of piping tcpdump output over SSH works cleanly alongside container debugging workflows.
Common failure patterns include: DNS resolution failures (check that the container is on a user-defined network, not the default bridge), subnet conflicts (compare docker network inspect output with ip route show on the host), missing iptables rules (happens when the daemon restarts but iptables was flushed externally), and VXLAN failures on overlay networks (verify that UDP port 4789 is open between Swarm nodes).
The best networking debug session is the one you never need. Define explicit subnets, use user-defined bridges, document your address allocation, and test connectivity as part of your deployment pipeline.
Wrapping Up
Docker's networking architecture is a composition of well-understood Linux primitives -- namespaces, veth pairs, bridges, iptables, VXLAN -- orchestrated by the daemon through the Container Network Model. The bridge driver handles single-host communication, overlays extend connectivity across hosts, and macvlan/ipvlan give containers direct access to the physical network without NAT.
The daemon manages all of this transparently for simple use cases, but production environments demand that you understand the layers involved. Subnet collisions, firewall rule conflicts, and MTU mismatches are all consequences of the daemon's networking decisions, and resolving them requires knowing what the daemon created, where it put the rules, and which configuration knobs control the behavior. Master the primitives, inspect the interfaces, read the iptables chains, and container networking stops being a black box.
How to Configure Docker Daemon Networking
Step 1: Configure the daemon address pools
Edit /etc/docker/daemon.json and define the default-address-pools array to control which CIDR ranges Docker uses when creating bridge networks automatically. Restart the Docker daemon with systemctl restart docker to apply the changes.
Step 2: Create an isolated user-defined bridge network
Run docker network create with the --driver bridge flag and explicit --subnet and --gateway options. Optionally pass --internal to block all outbound traffic from containers on the network.
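A sketch of both variants (the subnets and network names are example values):

```shell
# A routable bridge with explicit addressing.
docker network create --driver bridge \
  --subnet 10.200.5.0/24 \
  --gateway 10.200.5.1 \
  app-net

# The same idea, but with all outbound traffic blocked.
docker network create --driver bridge --internal \
  --subnet 10.200.6.0/24 \
  internal-net
```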
Step 3: Inspect and verify the network configuration
Use docker network inspect to confirm the subnet, gateway, and driver settings. Verify the corresponding Linux bridge interface and iptables rules with ip link show and iptables -t nat -L to confirm that masquerading and port forwarding are in place.
Step 4: Add custom firewall rules to DOCKER-USER
Insert iptables rules into the DOCKER-USER chain to restrict or allow traffic to published container ports. Rules in this chain are processed before Docker's own forwarding rules, making it the correct place for custom filtering without interfering with Docker's internal chains.
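As a concrete sketch (the source range 203.0.113.0/24 and host port 8080 are example values): because DNAT has already rewritten the destination by the time DOCKER-USER runs, match the original published port via conntrack rather than --dport.

```shell
# Drop traffic to a port published on host port 8080 unless it comes
# from the allowed source range.
sudo iptables -I DOCKER-USER -p tcp ! -s 203.0.113.0/24 \
  -m conntrack --ctorigdstport 8080 --ctdir ORIGINAL -j DROP

# Review what is in place, with line numbers for later removal.
sudo iptables -L DOCKER-USER -n -v --line-numbers
```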
Step 5: Migrate to the nftables backend on Docker Engine 29 or later
Set firewall-backend to nftables in /etc/docker/daemon.json to use Docker's experimental nftables support. Ensure IP forwarding is already enabled on the host before restarting the daemon, because the nftables backend will not enable it automatically. Migrate any custom rules from the iptables DOCKER-USER chain to nftables base chains with equivalent hook priorities. Do not enable the nftables backend if the host runs Docker in Swarm mode, as overlay network support for nftables is not yet available.
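A minimal migration check, sketched for a Docker 29 host that is not part of a Swarm:

```shell
# 1. Forwarding must already be on; the nftables backend will not enable it.
sysctl net.ipv4.ip_forward
# must print: net.ipv4.ip_forward = 1

# 2. Set "firewall-backend": "nftables" in /etc/docker/daemon.json, then:
sudo systemctl restart docker

# 3. Verify Docker's dedicated tables exist.
sudo nft list tables
# expect entries for 'table ip docker-bridges' and 'table ip6 docker-bridges'
```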
Frequently Asked Questions
Why does Docker manipulate iptables rules, and can I disable it?
Docker inserts iptables rules to implement NAT masquerading and port forwarding for bridge networks. Without these rules, containers cannot reach external hosts and published ports stop working. You can disable this behavior with the iptables: false option in daemon.json, but doing so will break outbound connectivity and port publishing for bridge-attached containers unless you write equivalent rules yourself.
What is the difference between the default bridge network and a user-defined bridge network?
The default bridge network (docker0) does not provide automatic DNS resolution between containers, so you must use IP addresses or legacy --link flags. User-defined bridge networks run an embedded DNS server at 127.0.0.11 that resolves container names and aliases automatically. User-defined bridges also offer better isolation because containers on different user-defined networks cannot communicate without explicit cross-connection.
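The difference is easy to demonstrate (container and network names are placeholders):

```shell
# On the default bridge, resolving another container by name fails.
docker run -d --name a busybox sleep 3600
docker run --rm busybox ping -c 1 a
# fails with "bad address 'a'"

# On a user-defined bridge, the embedded DNS at 127.0.0.11 resolves it.
docker network create demo-net
docker run -d --name b --network demo-net busybox sleep 3600
docker run --rm --network demo-net busybox ping -c 1 b
# succeeds
```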
When should I use an overlay network instead of a bridge network?
Use an overlay network when containers running on different Docker hosts need to communicate directly with each other. Overlay networks use VXLAN encapsulation to tunnel layer 2 frames across the underlay network, giving containers on separate hosts the appearance of sharing the same broadcast domain. Bridge networks are limited to a single Docker host. If all your containers run on the same machine, a bridge network is simpler and avoids the encapsulation overhead of VXLAN.
What changes when I switch Docker to the nftables firewall backend?
When you enable the nftables backend (available as an experimental feature since Docker Engine 29.0.0), Docker creates rules directly in two dedicated nftables tables called ip docker-bridges and ip6 docker-bridges instead of inserting rules into the host's iptables chains. The DOCKER-USER chain does not exist in the nftables backend; custom filtering rules must be added using nftables base chains with appropriate hook priorities. Additionally, Docker will not automatically enable IP forwarding on the host when running with the nftables backend. If IP forwarding is not already enabled, daemon startup or network creation will fail with an error. The nftables backend is also incompatible with Docker Swarm mode, because the overlay network rules required by Swarm have not yet been migrated from iptables.
Sources and Further Reading
The technical claims in this guide are grounded in official Docker and Moby project documentation, kernel documentation, and the libnetwork design specification. The following sources were consulted in the preparation of this guide and are provided for verification and further reading.
- Docker Docs: Docker with nftables — Official documentation covering the experimental nftables backend introduced in Docker Engine 29, including table names (ip docker-bridges, ip6 docker-bridges), IP forwarding requirements, and Swarm incompatibility.
- Docker Docs: Docker with iptables — Official reference for how Docker creates and manages iptables chains including DOCKER, DOCKER-USER, DOCKER-FORWARD, and DOCKER-INGRESS. Note: the DOCKER-ISOLATION-STAGE-1 and DOCKER-ISOLATION-STAGE-2 chains were removed in Docker Engine 29 (moby/moby#49981); their inter-network isolation logic was folded into the updated DOCKER-FORWARD chain.
- Moby Project: libnetwork CNM Design Document — The canonical specification for the Container Network Model, defining the Sandbox, Endpoint, and Network abstractions that all Docker network drivers implement.
- Docker Docs: Overlay Network Driver — Reference for overlay network creation, VXLAN configuration, encrypted overlay networks using IPsec ESP, and port requirements (UDP 4789 for VXLAN data, TCP/UDP 7946 for gossip).
- Docker Docs: Manage Swarm Service Networks — Covers IPVS-based VIP load balancing, the ingress overlay network, docker_gwbridge, and DNS round-robin endpoint mode.
- Docker Engine v29 Release Notes — Official changelog for Docker Engine 29, documenting nftables experimental support, containerd image store as the new default, and networking behavior changes.
- Docker Engine v28 Release Notes — Documents the extensive iptables rule refactoring in Docker 28 that laid the groundwork for native nftables support in Docker 29.
- Docker Blog: Docker Engine v29 — Foundational Updates for the Future (published November 11, 2025) — Explains the reasoning behind the nftables migration and states that nftables will become the default firewall backend in a future release, at which point iptables support will be deprecated.
- Docker Docs: Networking Overview — The canonical reference for Docker's network drivers, default address pool configuration, and IPv6 subnet allocation behavior. Also documents that support for unspecified addresses in --subnet was introduced in Docker 29.0.0.
- Docker Engine v29 Release Notes (29.0.0 through 29.3.1) — The current production release branch as of April 2026 is Docker Engine 29.3.1. This page covers all patch releases in the 29.x branch, including moby/moby#50929 (macvlan and ipvlan L2 networks no longer configure a default gateway unless explicitly specified), moby/moby#51515 (DNS resolution fix for non-Swarm-scoped networks after joining Swarm), and multiple security CVE fixes including CVE-2026-34040, CVE-2026-33997, and CVE-2026-33747.