Docker networking is one of those topics where most documentation tells you what the commands are, but not why packets go where they go, or why they sometimes refuse to go anywhere at all. Engineers who understand Docker at a surface level can spin up containers and expose ports -- but when service A cannot reach service B in a Compose stack, or when DNS resolution mysteriously fails in production Swarm, the guesswork begins. This article ends that guesswork.
We will cover the Linux primitives Docker builds on, what each network driver actually does under the hood, how the embedded DNS resolver works and when it breaks, and a complete troubleshooting playbook with real commands you can run right now.
The Linux Primitives Docker Builds On
Docker does not implement networking from scratch. It assembles existing Linux kernel features into a coherent abstraction. Understanding those features is the foundation for understanding everything Docker networking does.
Network namespaces are the core primitive. Network namespace support was introduced in Linux kernel 2.6.24 (2008). According to Docker's own security documentation, "namespace code has been exercised and scrutinized on a large number of production systems" since the 2.6.26 release in July 2008 -- the point at which the implementation was considered production-ready. The interface is CLONE_NEWNET, passed to the clone() or unshare() system calls. Each network namespace has its own independent view of network interfaces, routing tables, firewall rules, ARP tables, and socket state. When Docker creates a container, it creates a new network namespace for it. Inside that namespace, the only interfaces that exist are the ones Docker explicitly connects.
Virtual Ethernet pairs (veth pairs) are how namespaces communicate. A veth pair is exactly what it sounds like: two virtual network interfaces that are permanently connected to each other. Anything written into one end comes out the other, regardless of which namespace each end lives in. Docker creates a veth pair for each container, places one end inside the container's namespace (where it appears as eth0) and the other end in the root namespace, then attaches that root-namespace end to a bridge.
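Docker performs these steps through libnetwork, but the same plumbing can be assembled by hand with iproute2, which makes a useful mental model. A minimal sketch, run as root; the namespace name, interface names, and 172.30.0.0/24 addressing are made up for illustration:

```shell
# Create a namespace and a veth pair, then split the pair across namespaces
$ sudo ip netns add demo-ns
$ sudo ip link add veth-host type veth peer name veth-ctr
$ sudo ip link set veth-ctr netns demo-ns

# Inside the namespace: rename to eth0 (Docker-style), address it, bring it up
$ sudo ip netns exec demo-ns ip link set veth-ctr name eth0
$ sudo ip netns exec demo-ns ip addr add 172.30.0.2/24 dev eth0
$ sudo ip netns exec demo-ns ip link set eth0 up

# Bring up the host-side end; attaching it to a bridge completes the picture
$ sudo ip link set veth-host up
```

Everything Docker adds on top -- IPAM, DNS, iptables -- sits above this exact arrangement.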
Linux bridges operate like a virtual L2 switch. The Docker bridge driver creates a bridge interface (default: docker0) and plugs the host-side veth ends into it. Frames arriving on any port get forwarded to other ports according to MAC address tables, just like a physical switch.
iptables and Netfilter handle NAT and port exposure. When you publish a port with -p 8080:80, Docker writes iptables rules in the DOCKER chain to DNAT incoming traffic on host port 8080 to the container's IP on port 80. The MASQUERADE rule handles outbound NAT so containers can reach the internet through the host's IP.
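The resulting rules look roughly like the following. This is a hand-written approximation, not Docker's verbatim output; the container IP, subnet, and bridge name will differ on your host:

```shell
# Approximation of what Docker writes for `-p 8080:80`
# with a container at 172.17.0.2 on docker0

# DNAT: traffic arriving on host port 8080 (not from the bridge itself)
# is rewritten to the container's IP and port
$ sudo iptables -t nat -A DOCKER ! -i docker0 -p tcp --dport 8080 \
    -j DNAT --to-destination 172.17.0.2:80

# MASQUERADE: outbound container traffic leaves with the host's IP
$ sudo iptables -t nat -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 \
    -j MASQUERADE
```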
The iptables chain set Docker creates has evolved significantly across versions. As of Docker Engine 28.0.1 and later (the 28.0 line shipped in February 2025), Docker creates the following chains in the filter table: DOCKER, DOCKER-USER, DOCKER-FORWARD, DOCKER-ISOLATION-STAGE-1, DOCKER-ISOLATION-STAGE-2, and DOCKER-INGRESS. The DOCKER-FORWARD chain was introduced in Docker Engine 28.0.1 via moby/moby#49518, which moved most of Docker's rules out of the filter-table FORWARD chain so that Docker's rule appends no longer disrupt other applications. The FORWARD chain now unconditionally jumps to DOCKER-USER, DOCKER-FORWARD, and DOCKER-INGRESS. The two isolation-stage chains work together to block traffic between separate user-defined bridge networks.
On pre-28 Docker installations, DOCKER-FORWARD did not exist. If you are auditing a host and do not see it, check your Docker Engine version before concluding it is absent.
The userland proxy (docker-proxy) is a detail most documentation omits. By default, Docker runs a docker-proxy process for each published port. This process handles one specific edge case that iptables alone cannot: connections from the host itself to a published port via the loopback interface (127.0.0.1). When "userland-proxy": false is set in daemon.json, Docker switches to a pure iptables approach and enables net.ipv4.conf.<bridge>.route_localnet=1 to handle loopback routing. The default is true. Each proxy process is visible as /usr/bin/docker-proxy -proto tcp -host-ip ... -host-port ... -container-ip ... -container-port ... in ps aux, one per published port.
"Docker uses Linux bridge networks, iptables rules, and virtual Ethernet devices to provide container networking. The networking subsystem of Docker is pluggable, using drivers." -- Docker official documentation, docs.docker.com/engine/network/
This is the foundation. Now let us look at what each driver builds on top of it.
Bridge Networks: The Default and Its Nuances
The bridge driver is what you get when you run a container without specifying a network. There are two distinct cases here that behave very differently, and conflating them is a common source of confusion.
The Default Bridge (docker0)
When Docker installs, it creates a bridge called docker0 on the host, typically with the subnet 172.17.0.0/16. Every container started without a --network flag joins this bridge and receives an IP from that range.
The critical limitation of the default bridge: containers cannot resolve each other by name. The embedded DNS resolver does not serve the default bridge. If container A wants to talk to container B, it must know container B's IP address directly, or you must use the legacy --link flag (which is deprecated and works via /etc/hosts injection, not DNS).
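You can see the difference in under a minute. A quick demonstration -- image and network names are examples; the failing lookup on the default bridge is the point:

```shell
# Default bridge: name resolution fails
$ docker run -d --name web-default nginx:alpine
$ docker run --rm alpine ping -c 1 web-default
ping: bad address 'web-default'

# User-defined bridge: the same lookup succeeds
$ docker network create demo-net
$ docker run -d --name web-named --network demo-net nginx:alpine
$ docker run --rm --network demo-net alpine ping -c 1 web-named
```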
Never rely on the default docker0 bridge for multi-container applications. The absence of automatic DNS, combined with dynamically assigned IPs that change on container restart, makes it fragile in practice. Always create user-defined bridge networks for anything beyond a single throwaway container.
User-Defined Bridge Networks
When you create a network with docker network create, Docker creates a new bridge (not docker0) and, critically, enables the embedded DNS resolver for that network. Containers on this network can resolve each other by container name or by network alias. This is what makes Docker Compose work.
# Create a user-defined bridge network
$ docker network create --driver bridge --subnet 192.168.100.0/24 myapp-net

# Inspect to verify bridge name and subnet
$ docker network inspect myapp-net

# Run two containers on this network
$ docker run -d --name api --network myapp-net myimage:latest
$ docker run -d --name db --network myapp-net postgres:16

# From 'api', db is reachable by name
$ docker exec api ping db
PING db (192.168.100.3) 56(84) bytes of data.
64 bytes from db.myapp-net (192.168.100.3): icmp_seq=1 ttl=64 time=0.089 ms
User-defined bridges also provide network-level isolation. Containers on myapp-net cannot communicate with containers on a different user-defined bridge unless you explicitly connect them with docker network connect. This is a meaningful security boundary: a compromised frontend container cannot directly reach a database container on a separate network.
What the Bridge Looks Like From the Host
You can inspect the actual Linux infrastructure Docker creates by looking at the host's network interfaces and bridge state:
# List all bridges on the host
$ ip link show type bridge
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether 02:42:c3:4a:8b:1f brd ff:ff:ff:ff:ff:ff
7: br-a1f2e3b4c5d6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
    link/ether 02:42:9a:3c:7f:2a brd ff:ff:ff:ff:ff:ff

# Show veth pairs -- each container has one end
$ ip link show type veth

# Show which veths are plugged into a specific bridge
$ bridge link show br-a1f2e3b4c5d6

# See the iptables rules Docker wrote
$ sudo iptables -t nat -L DOCKER --line-numbers -n -v
The bridge name Docker assigns is the network ID prefix (e.g., br-a1f2e3b4c5d6). Running bridge link show on it will enumerate the host-side veth ends of every container attached to that network.
Host Network Mode: When You Want Zero Overhead
The host driver is the simplest: the container shares the host's network namespace entirely. No veth pair, no bridge, no NAT. The container's processes bind directly to the host's network interfaces as if they were running outside Docker.
With host networking, nginx binds port 80 on the actual host. You do not need -p 80:80 -- in fact, port publishing flags are ignored in host mode because the concept does not apply. There is no NAT layer to traverse, so performance is essentially native. This makes host networking attractive for latency-sensitive workloads, monitoring agents that need visibility into all host ports, or containers that run eBPF programs or packet capture tools.
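In practice it looks like this -- container name is illustrative, and note the absence of any -p flag:

```shell
# nginx binds port 80 on the host's own stack; -p would be ignored
$ docker run -d --name edge --network host nginx:alpine

# The host itself now answers on port 80
$ curl -sI http://127.0.0.1:80 | head -1
HTTP/1.1 200 OK

# Inside the container you see the HOST's interfaces, not a veth/eth0 pair
$ docker exec edge ip addr show
```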
Host network mode eliminates all network isolation between the container and the host. A container with host networking and a privilege escalation vulnerability has a direct path to every port and interface on the machine. Use it deliberately and only when the performance or visibility requirements genuinely justify it. It should never be the default for web-facing application containers.
Host networking also means port conflicts become your problem. If two containers both try to bind port 8080 with host networking, the second one will fail to start. On a bridge network, each container gets its own IP and can each independently bind port 8080 -- the host only sees the mapped port you explicitly publish.
Overlay Networks: Multi-Host Routing in Swarm
Bridge networks are scoped to a single host. When you need containers on different machines to communicate as if they were on the same L2 segment, you need an overlay network. Docker's overlay driver implements VXLAN (Virtual Extensible LAN), a tunneling protocol standardized in RFC 7348 that encapsulates L2 Ethernet frames inside UDP packets.
The mechanics work like this: when container A on host-1 sends a frame to container B on host-2, the Docker overlay driver intercepts it at the VXLAN tunnel endpoint (VTEP), wraps it in a UDP packet destined for host-2's physical IP on port 4789, and sends it across the underlying network. Host-2's VTEP receives the UDP packet, strips the encapsulation, and delivers the inner Ethernet frame into container B's namespace.
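You can watch the encapsulation happen on the wire. A capture sketch -- the physical interface name eth0 is an assumption, so substitute your underlay interface:

```shell
# On either Swarm node, watch VXLAN traffic on the physical interface
$ sudo tcpdump -ni eth0 udp port 4789

# -e adds the outer Ethernet header; tcpdump decodes the inner frame too,
# so you see both the host-to-host UDP packet and the container-to-container
# traffic it carries
$ sudo tcpdump -eni eth0 udp port 4789 -c 5
```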
Overlay networks require a key-value store for coordination. In Docker Swarm mode, this is handled automatically by the Raft consensus log built into Swarm manager nodes. Older Docker releases could also back overlay networks with an external store such as etcd or Consul, but that mode has been removed from modern Docker Engine -- today, overlay networks effectively require Swarm mode. This is one reason Swarm (or Kubernetes, which uses its own CNI plugins for similar functionality) is the practical choice for multi-host Docker deployments.
# Initialize Swarm on manager node
$ docker swarm init --advertise-addr 10.0.0.1

# Create an attachable overlay network
# 'attachable' lets standalone containers join, not just services
$ docker network create \
    --driver overlay \
    --attachable \
    --subnet 10.20.0.0/16 \
    prod-overlay

# Deploy a service on this overlay
$ docker service create \
    --name api \
    --network prod-overlay \
    --replicas 3 \
    myapi:latest

# Verify network membership
$ docker network inspect prod-overlay --format '{{json .Containers}}'
A detail that frequently surprises engineers: overlay networks use a separate subnet for the ingress mesh routing. When you publish a port on a Swarm service, traffic arrives on any Swarm node (even ones without a task running) and gets routed to a healthy replica. According to Docker's official Swarm networking documentation, this routing is handled by IPVS (IP Virtual Server), a load balancing module built into the Linux kernel. You can inspect IPVS state with sudo ipvsadm -Ln inside the ingress sandbox namespace. This mesh is a separate internal overlay network called ingress, distinct from your application overlay. You can inspect it with docker network inspect ingress.
What docker network inspect ingress reveals that docker ps does not: the ingress network contains a hidden network namespace called ingress_sbox (surfaced in the network's containers section as ingress-sbox / gateway_ingress-sbox). This is not a container -- it is a pure network namespace that hosts the IPVS load balancing logic and the iptables rules that route published-port traffic into the overlay. When Swarm published-port routing breaks inexplicably, this is the namespace to inspect. You can enter it with:
# Enter the ingress_sbox namespace to inspect IPVS and iptables
$ docker run -it --rm \
    -v /var/run/docker/netns:/var/run/docker/netns \
    --privileged \
    nicolaka/netshoot \
    nsenter --net=/var/run/docker/netns/ingress_sbox sh

# Inside: inspect IPVS load balancing state
$ ipvsadm -Ln

# Inside: inspect the mangle rules that mark traffic for IPVS
$ iptables -t mangle -L -n -v
The Swarm routing mesh uses SNAT to masquerade the source IP of ingress traffic before forwarding it to a service replica. This means your application containers see the ingress network's IP as the source, not the original client IP. If your application requires real client IPs (for logging, rate limiting, or geolocation), you must either use mode=host port publishing (which disables the mesh and binds directly to the node) or place a reverse proxy in front that captures and forwards the X-Forwarded-For header before the SNAT occurs.
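The mode=host escape hatch looks like this. Service and image names are illustrative; note that the port is bound only on nodes actually running a task, so external load balancing becomes your responsibility:

```shell
# Publish directly on each task's node, bypassing the routing mesh.
# The application sees real client source IPs.
$ docker service create \
    --name api \
    --publish mode=host,target=80,published=8080 \
    --replicas 3 \
    myapi:latest
```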
VXLAN Firewall Requirements
Overlay networks fail silently when UDP port 4789 is blocked between hosts. Before assuming a Docker bug, always verify:
# On host-1, probe host-2 port 4789
# (UDP checks are best-effort: nc reports success unless an ICMP
# port-unreachable comes back, so treat "reachable" as a weak signal)
$ nc -zu 10.0.0.2 4789 && echo "4789 reachable" || echo "4789 BLOCKED"

# Also verify TCP 2377 (Swarm management) and TCP/UDP 7946 (gossip)
$ nc -z 10.0.0.2 2377 && echo "2377 ok"
$ nc -z 10.0.0.2 7946 && echo "7946 tcp ok"
$ nc -zu 10.0.0.2 7946 && echo "7946 udp ok"
How Docker DNS Actually Works
This section is the one that trips up many engineers, so it gets the most space. Container DNS is not magic -- it is a specific, inspectable piece of software running inside the Docker daemon.
The Embedded DNS Resolver
When a container starts on a user-defined network, Docker configures its /etc/resolv.conf to point to 127.0.0.11. This is a loopback address that routes to Docker's embedded DNS server, which runs as part of the Docker daemon process (not as a separate container). The details of how this works are more interesting than most documentation explains, and understanding them directly explains several common puzzles.
The DNS server does not actually listen on port 53. Instead, it binds to a randomly-chosen high-numbered port on 127.0.0.11. Docker then injects four iptables rules into the container's own network namespace to make this work transparently:
- DOCKER_OUTPUT chain: DNAT rules that redirect TCP and UDP traffic addressed to 127.0.0.11:53 to the actual listening port (e.g., 127.0.0.11:<random_port>)
- DOCKER_POSTROUTING chain: SNAT rules that rewrite the source port on replies back to :53 so the querying application sees a normal DNS response
You can observe this directly by entering the container's namespace and inspecting its iptables rules:
# From inside the container, iptables shows the DNAT mechanism
# (requires a privileged container or nsenter from the host)
$ iptables-save -t nat
*nat
:PREROUTING ACCEPT
:INPUT ACCEPT
:OUTPUT ACCEPT
:POSTROUTING ACCEPT
:DOCKER_OUTPUT - [0:0]
:DOCKER_POSTROUTING - [0:0]
-A OUTPUT -d 127.0.0.11/32 -j DOCKER_OUTPUT
-A POSTROUTING -d 127.0.0.11/32 -j DOCKER_POSTROUTING
-A DOCKER_OUTPUT -d 127.0.0.11/32 -p tcp --dport 53 -j DNAT --to-destination 127.0.0.11:38159
-A DOCKER_OUTPUT -d 127.0.0.11/32 -p udp --dport 53 -j DNAT --to-destination 127.0.0.11:57554
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p tcp --sport 38159 -j SNAT --to-source :53
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p udp --sport 57554 -j SNAT --to-source :53

# The DNS daemon listens on the random ports, not on 53
# From the host, nsenter lets you observe this in the container's namespace
$ PID=$(docker inspect api --format '{{.State.Pid}}')
$ nsenter -t $PID -n ss -tulnp
# Shows dockerd listening on 127.0.0.11:57554 (UDP) and 127.0.0.11:38159 (TCP)
# Nothing is on port 53 -- that is why ss inside a container shows nothing there
This design is deliberate. By avoiding port 53, Docker's DNS server does not conflict with any DNS server the container itself might want to run. It also explains why running a DNS server on port 53 inside a container works fine -- there is nothing already using port 53 from inside the namespace. The DNAT happens at the kernel level, before the socket layer, making it completely transparent to the application.
"Docker containers on user-defined networks can resolve container names to their IP addresses using Docker's embedded DNS server." -- Docker networking documentation, docs.docker.com/engine/network/drivers/bridge/
# What resolv.conf looks like on a user-defined network
$ docker exec api cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0

# On the default bridge, it falls back to host's DNS
$ docker run --rm alpine cat /etc/resolv.conf
nameserver 8.8.8.8
nameserver 8.8.4.4
The DNS server at 127.0.0.11 handles two classes of queries. For names it knows about -- container names, service names, and any network aliases -- it answers directly from its internal registry. For anything else (external hostnames like api.stripe.com), it forwards the query upstream to the host's configured DNS servers (read from /etc/resolv.conf on the host at container start time).
DNS Resolution Order and the ndots Problem
The ndots:0 option in Docker's generated resolv.conf has practical implications. The ndots value tells the resolver how many dots a name must contain before it tries resolving it as an absolute name first (rather than appending search domains). With ndots:0, every query is treated as absolute first -- this reduces DNS lookup latency because the resolver does not prepend search domains before trying the bare name. Note that the exact value can vary depending on your Docker version and network configuration; always verify with docker exec <container> cat /etc/resolv.conf in your environment rather than relying on a default assumption.
However, if your application explicitly sets its own resolv.conf or uses a custom DNS client library that overrides these settings, behavior can diverge unexpectedly. JVM-based applications in particular cache DNS results aggressively by default -- a container IP change will not be visible to a Java service for the duration of the JVM's DNS TTL (often 30 seconds to infinity, depending on the security manager configuration).
For JVM applications, set -Dsun.net.inetaddr.ttl=10 (or 0 to disable caching entirely, though this has latency costs) to avoid stale DNS caches after container restarts. Better still, use stable service names rather than direct container names, so that service mesh or load balancer logic absorbs the IP churn.
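One way to apply the flag without rebuilding the image is the JAVA_TOOL_OPTIONS environment variable, which HotSpot JVMs read at startup. Container, network, and image names here are illustrative:

```shell
# Cap the JVM's positive DNS cache at 10 seconds, process-wide
$ docker run -d --name svc --network myapp-net \
    -e JAVA_TOOL_OPTIONS="-Dsun.net.inetaddr.ttl=10" \
    myjvmapp:latest

# Alternatively, set networkaddress.cache.ttl in the JRE's
# java.security file to change the default for all applications
```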
Network Aliases: One Name, Many IPs
Beyond container names, Docker supports network aliases -- arbitrary DNS names you assign to a container on a specific network. Multiple containers can share the same alias, in which case Docker returns all their IPs in round-robin order. This is a primitive form of load balancing.
# Two containers sharing the alias 'backend'
$ docker run -d --name backend1 --network myapp-net --network-alias backend myimage
$ docker run -d --name backend2 --network myapp-net --network-alias backend myimage

# DNS query for 'backend' returns both IPs
$ docker exec frontend nslookup backend
Server:    127.0.0.11
Address 1: 127.0.0.11

Name:      backend
Address 1: 192.168.100.4 backend1.myapp-net
Address 2: 192.168.100.5 backend2.myapp-net
In Docker Compose, service names automatically become network aliases: when a service is scaled, the embedded DNS server returns all replica IPs in round-robin order, exactly like the shared-alias example above. Swarm mode goes further and assigns each service a VIP (Virtual IP) -- a stable virtual address that persists even as task replicas restart and get new IPs. DNS returns that single stable VIP rather than a list of task IPs, and IPVS distributes connections to healthy tasks behind it, which avoids stale-cache problems with clients that cache DNS aggressively.
Docker Compose Networking in Detail
Compose creates a user-defined bridge network for each project by default. The network name is <project-name>_default. Every service in the Compose file joins this network unless you specify otherwise. Services are reachable by their service name.
services:
  api:
    image: myapi:latest
    networks:
      - frontend
      - backend
  db:
    image: postgres:16
    networks:
      - backend
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    networks:
      - frontend

networks:
  frontend:
  backend:
This topology creates two isolated networks. The api service has a leg in both, acting as the only path between the internet-facing nginx and the db. The database is not reachable from nginx -- they share no network. This is real defense in depth, not just documentation theater. Verify it yourself:

$ docker compose exec nginx ping -c 1 db
That ping will fail with "Name or service not known" because nginx is not on the backend network and Docker's DNS resolver will not return an address for db to a container that has no route to it.
None and Macvlan: The Edge Cases
Two additional drivers round out the picture. The none driver creates a container with only a loopback interface -- no network connectivity at all. It is useful for batch processing jobs that should never make outbound network calls, or as a security measure for containers that handle sensitive data in isolation.
The macvlan driver is the most exotic. It creates a virtual interface with its own MAC address that appears as a separate physical device on your L2 network. The container gets an IP from your physical LAN's subnet, receives broadcast traffic, and responds to ARP like a real machine. Macvlan is used when containers need to appear as first-class network citizens -- for legacy applications that require a specific IP that cannot change, or for traffic inspection scenarios where the container must receive multicast or broadcast that would otherwise be blocked.
The classic macvlan gotcha: by default, the host itself cannot communicate with its macvlan containers. The Linux macvlan driver never forwards traffic between the parent interface and its own sub-interfaces, so packets between the host's IP and a container on the macvlan network are silently dropped. This is Linux kernel behavior, not a Docker limitation -- and it applies even though Docker's macvlan driver defaults to macvlan_mode=bridge, which lets containers on the same parent reach each other but never the parent interface itself. The standard workaround is to create an additional macvlan sub-interface on the host, assign it an address on the same subnet, and route traffic for the container range through it. This is a common point of failure when migrating legacy applications to macvlan.
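The workaround, sketched with iproute2. Run as root; eth0, the shim interface name, and all addresses are illustrative, with the /27 standing in for the range assigned to the macvlan network:

```shell
# Create a macvlan sub-interface on the host, on the same parent
$ sudo ip link add macvlan-shim link eth0 type macvlan mode bridge
$ sudo ip addr add 192.168.1.250/32 dev macvlan-shim
$ sudo ip link set macvlan-shim up

# Route the containers' address range via the shim instead of eth0,
# so host<->container traffic traverses two sub-interfaces (allowed)
# rather than parent<->sub-interface (dropped)
$ sudo ip route add 192.168.1.224/27 dev macvlan-shim
```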
The Troubleshooting Playbook
Connectivity failures between containers fall into a handful of distinct categories. Working through them systematically is far faster than random guessing.
Step 1: Verify Both Containers Are on the Same Network
# List all networks a container is connected to
$ docker inspect api --format '{{json .NetworkSettings.Networks}}' | python3 -m json.tool

# Or list containers on a specific network
$ docker network inspect myapp-net --format '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{"\n"}}{{end}}'
Step 2: Test Raw IP Connectivity
Before blaming DNS, test with a raw IP. If IP works but the name does not, you have a DNS problem. If neither works, you have a routing or firewall problem.
# Get the target container's IP
$ docker inspect db --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'
192.168.100.3

# Test raw IP from source container
$ docker exec api ping -c 3 192.168.100.3

# Test specific port with nc (install busybox-extras or use a debug image)
$ docker exec api nc -zv 192.168.100.3 5432
Connection to 192.168.100.3 5432 port [tcp/postgresql] succeeded!
Step 3: Diagnose DNS Failures
# Confirm which DNS server the container is using
$ docker exec api cat /etc/resolv.conf

# Query Docker's embedded resolver directly
$ docker exec api nslookup db 127.0.0.11

# If nslookup is not available, use a debug sidecar
$ docker run --rm --network myapp-net nicolaka/netshoot nslookup db

# Check if external DNS works (rules out upstream forwarding failure)
$ docker exec api nslookup google.com
The nicolaka/netshoot image is worth knowing. It is a purpose-built network troubleshooting container that includes tcpdump, dig, nmap, iperf3, ss, conntrack, and dozens of other tools. You can attach it to any network and run diagnostics without modifying your application containers.
Step 4: Check iptables Rules
If IP connectivity fails even with correct network membership, the problem is often in iptables. Docker writes rules that allow inter-container traffic within a bridge, but a restrictive host firewall can interfere.
# Show Docker's forward rules
$ sudo iptables -L DOCKER-USER -n -v
$ sudo iptables -L FORWARD -n -v

# Show NAT rules (published ports)
$ sudo iptables -t nat -L DOCKER -n -v

# If iptables shows nothing but Docker is running, check if nftables is active
# Docker has an nftables backend (set via 'firewall-backend' in daemon.json)
# On newer Ubuntu/Debian, iptables may be a shim over nftables
$ lsmod | grep ip_tables
$ sudo modprobe ip_tables

# Verify which backend Docker is using
$ sudo iptables --version
# "nf_tables" in version string = nftables shim; "legacy" = classic iptables
Never manually flush the DOCKER, DOCKER-FORWARD, DOCKER-ISOLATION-STAGE-1, or DOCKER-ISOLATION-STAGE-2 chains with iptables -F. Docker manages these chains and will not automatically re-add rules until the next container start or daemon restart. If you need custom firewall rules that coexist with Docker, use the DOCKER-USER chain -- Docker explicitly does not modify it, and on Docker Engine 28+ it is evaluated before the DOCKER-FORWARD chain on every packet. Be aware that the chain set changed in Docker Engine 28.0 and 28.0.1; if you maintain scripts that enumerate or test for specific Docker chains, verify them against your running Engine version with docker version.
Step 5: Capture Traffic to Confirm What Is Actually Happening
When the above steps produce no clear answer, packet capture on the host-side veth interface gives you ground truth. Every packet the container sends is visible on the host.
# Find the veth interface for a container
# Get the container's iflink value -- this IS the host-side veth's ifindex
# The N+1 shortcut is unreliable; use iflink instead
$ docker exec api cat /sys/class/net/eth0/iflink
13

# Find the host interface with that ifindex
$ ip link | grep -A1 "^13:"
13: veth3a7b2c1@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...

# Capture on the bridge interface to see all container traffic
$ sudo tcpdump -i br-a1f2e3b4c5d6 -n port 5432

# Or capture DNS specifically
$ sudo tcpdump -i br-a1f2e3b4c5d6 -n port 53
Common Failure Scenarios and Their Causes
Service name resolves, but connection is refused: The container is running but the application inside has not started yet, is bound to 127.0.0.1 instead of 0.0.0.0, or the port in your Compose file does not match what the application actually listens on. Check with docker exec <container> ss -tlnp.
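A loopback-only bind shows up clearly in the Local Address column. Illustrative output, not from a real session:

```shell
$ docker exec api ss -tlnp
State   Recv-Q  Send-Q  Local Address:Port   Peer Address:Port
LISTEN  0       511     127.0.0.1:8080       0.0.0.0:*    # unreachable from peers
LISTEN  0       511     0.0.0.0:5432         0.0.0.0:*    # reachable
```

The 8080 listener answers only connections originating inside the container's own namespace; other containers get connection refused even though DNS resolves.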
"Name or service not known" for service name: The source container is on the default bridge (no embedded DNS) or the target container is on a different user-defined network. Use docker network inspect to confirm both containers appear in the same network's container list.
Works locally, fails in CI/CD: CI runners often have restrictive inter-container routing. Check whether your CI environment uses Docker-in-Docker (DinD) -- DinD creates its own Docker daemon with its own network stack and its containers cannot directly reach the outer daemon's network. Use the service container pattern with explicit network configuration instead.
Overlay network intermittent failures in Swarm: Almost always UDP 4789 being dropped by a stateful firewall that is not tracking VXLAN sessions properly, or MTU mismatch. VXLAN adds overhead to every frame. For IPv4 underlays on a standard 1500-byte MTU network, the overhead is 50 bytes (14B outer Ethernet + 20B outer IP + 8B UDP + 8B VXLAN header), leaving 1450 bytes for the inner frame. If your underlay uses 802.1Q VLAN tagging, add 4 more bytes (54B total). For IPv6 underlays, the outer IP header is 40 bytes instead of 20, bringing total overhead to 70 bytes. Cloud environments (AWS, Azure, GCP) often use jumbo frames or different underlay MTUs, so always verify with ip link show on the host before setting overlay MTU. The safe formula: set overlay MTU to (physical underlay MTU - VXLAN overhead for your environment). Set this with --opt com.docker.network.driver.mtu=<value> when creating the overlay network.
# Create overlay with correct MTU for environments with 1500 byte physical MTU
$ docker network create \
    --driver overlay \
    --opt com.docker.network.driver.mtu=1450 \
    prod-overlay

# Verify MTU inside a container on this network
$ docker exec myservice ip link show eth0
eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue
Production Patterns Worth Knowing
A few practices separate production Docker networking from development experimentation.
Always name your networks explicitly. Auto-generated names like myproject_default make infrastructure-as-code declarations fragile and harder to reference in firewall rules or monitoring labels. Declare network names explicitly in Compose files.
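With the Compose top-level name key, the network's runtime name is pinned regardless of project name. A sketch; the names are illustrative:

```yaml
networks:
  frontend:
    name: myapp-frontend    # fixed name instead of <project>_frontend
  backend:
    name: myapp-backend
    driver: bridge
```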
Use external networks for cross-stack communication. When two Compose stacks need to communicate (e.g., a shared database and multiple application stacks), create the shared network externally once and reference it in each Compose file with external: true. This decouples the stacks' lifecycles from the shared network's lifecycle.
# Create the shared network once (not managed by any Compose file)
$ docker network create --driver bridge shared-services

# In stack-a/docker-compose.yml
networks:
  shared-services:
    external: true

# In stack-b/docker-compose.yml -- same reference
networks:
  shared-services:
    external: true
Disable ICC (Inter-Container Communication) on the default bridge if you are not using it. Setting "icc": false in Docker daemon configuration blocks all direct inter-container communication on the default docker0 bridge and forces explicit port mapping for any connectivity. This is a hardening measure that prevents lateral movement within a compromised container fleet.
An important precision point: the icc daemon flag only applies to the default bridge (docker0), not to user-defined networks. According to the Docker CIS Benchmark documentation, containers on user-defined networks have isolation controlled separately via the DOCKER-ISOLATION-STAGE-1 and DOCKER-ISOLATION-STAGE-2 iptables chains regardless of the icc setting. If you want to restrict communication on a user-defined network, set the com.docker.network.bridge.enable_icc=false option when creating that specific network, rather than expecting the global daemon flag to apply.
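The per-network equivalent looks like this; the network name is illustrative:

```shell
# Disable inter-container communication on one specific user-defined bridge
$ docker network create \
    --driver bridge \
    --opt com.docker.network.bridge.enable_icc=false \
    locked-down-net
# Containers on locked-down-net can still reach the gateway and published
# ports, but cannot talk to each other directly
```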
{
"icc": false,
"iptables": true,
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "3"
}
}
Understand userland-proxy and its trade-offs. By default, Docker runs a docker-proxy process for every published port ("userland-proxy": true in daemon.json). Each proxy is a userspace process -- visible as /usr/bin/docker-proxy in ps aux -- that handles the edge case of loopback connections from the host to a published port. Setting "userland-proxy": false eliminates these processes and uses pure iptables with net.ipv4.conf.<bridge>.route_localnet=1 instead. The tradeoff: fewer processes and modestly better throughput, at the cost of that kernel sysctl being set on your bridge interfaces. Both modes work correctly for the vast majority of workloads. If you have many published ports (dozens or more), disabling userland-proxy reduces process count and file descriptor usage noticeably.
Docker Engine 28 and 29: Networking Has Changed
If you are running Docker Engine 28 or later -- which as of early 2026 covers most recent installations -- the iptables landscape looks materially different from what older tutorials describe. Understanding these changes is essential for accurate troubleshooting and firewall management.
Docker Engine 28: Hardened Defaults and Restructured Chains
Docker Engine 28.0, released February 19, 2025, introduced the most significant restructuring of Docker's iptables rules since the project's early days. According to the Docker team's official release blog, the goals were threefold: harden security defaults, fix a long-standing vulnerability where unpublished container ports were reachable from other hosts on the same LAN, and restructure chains so that other applications using iptables were not disrupted by Docker's rule management.
"In Docker 28.0, we now explicitly drop unsolicited inbound traffic to each container's internal IP unless that port was explicitly published (-p or --publish). This doesn't affect local connections from the Docker host itself, but it does block remote LAN connections to unpublished ports." -- Docker Engineering Blog, February 28, 2025, docker.com/blog/docker-engine-28-hardening-container-networking-by-default/
The practical security consequence: on Docker Engine 28+, containers with unpublished ports are no longer reachable from remote hosts on the same physical network, even if net.ipv4.ip_forward is enabled and the host's FORWARD policy is ACCEPT. This closes a known vulnerability that had existed since Docker's initial design. If your infrastructure deliberately relied on direct LAN routing to container IPs without port publishing -- a pattern occasionally used in bare-metal clusters -- you must either publish the ports or create the network with gateway_mode_ipv4=nat-unprotected.
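For infrastructures that deliberately depend on direct LAN routing to container IPs, the opt-out mentioned above is a per-network driver option. A sketch (the network name lan-routable is a placeholder; the option is documented in Docker's Engine 28 material):

```shell
# Recreate the affected network with the unprotected gateway mode
# (Engine 28+ only). This restores pre-28 behavior for THIS network:
# unpublished container ports become reachable from the LAN again.
docker network create \
  --driver bridge \
  --opt com.docker.network.bridge.gateway_mode_ipv4=nat-unprotected \
  lan-routable
```

Treat this as a targeted escape hatch, not a default: it reopens exactly the exposure that Engine 28 was designed to close, so scope it to the one network that needs it.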
The initial 28.0.0 release caused widespread breakage due to dependency on the ip_set kernel modules, which are not available on all distributions. Docker 28.0.1 resolved this by replacing those rules and introducing the DOCKER-FORWARD chain via moby/moby#49518, which moved most bridge driver rules out of the FORWARD chain so that other applications appending rules to FORWARD would not have their execution order disrupted. The current chain structure on Docker Engine 28.0.1+ in the filter table is:
# Verify your running Engine version first
$ docker version --format '{{.Server.Version}}'

# Current chain structure in the filter table (Engine 28.0.1+)
# The FORWARD chain jumps unconditionally to these three:
Chain FORWARD (policy DROP)
DOCKER-USER     all -- 0.0.0.0/0 0.0.0.0/0
DOCKER-FORWARD  all -- 0.0.0.0/0 0.0.0.0/0
DOCKER-INGRESS  all -- 0.0.0.0/0 0.0.0.0/0

# DOCKER-FORWARD contains the per-bridge rules previously in FORWARD:
Chain DOCKER-FORWARD (1 references)
DOCKER-ISOLATION-STAGE-1 all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
DOCKER all -- 0.0.0.0/0 0.0.0.0/0 match-set docker-ext-bridges-v4 dst
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0

Chain DOCKER (N references)
ACCEPT tcp -- !br-xxx br-xxx 0.0.0.0/0 <container-ip> tcp dpt:<port>
DROP   all -- !br-xxx br-xxx 0.0.0.0/0 0.0.0.0/0   # blocks unpublished ports from external hosts
The key difference from pre-28 versions: each bridge now has a per-bridge DROP rule in the DOCKER chain that explicitly blocks inbound traffic from outside that bridge to unpublished ports. This rule did not exist before 28.0. If you upgrade an existing host and find previously-working access patterns broken, this DROP rule is the cause.
If you are upgrading an existing production host to Docker Engine 28+, audit your container access patterns before upgrading. Specifically, check whether any workloads or monitoring agents rely on reaching container IPs directly without port publishing. The docker.com/blog/docker-engine-28-hardening-container-networking-by-default/ post contains a checklist and opt-out instructions ("ip-forward-no-drop": true and gateway_mode_ipv4=nat-unprotected) if backward compatibility is required.
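The audit described above can be scripted on the host before the upgrade. A sketch of the kind of checks involved (the exact patterns that matter depend on your workloads):

```shell
# Pre-upgrade audit sketch: run on the Docker host before moving to 28+.

# 1. Confirm which Engine version is actually running
docker version --format '{{.Server.Version}}'

# 2. List containers with no published ports -- these are the ones whose
#    container IPs will stop being reachable from remote LAN hosts
docker ps --format '{{.Names}}\t{{.Ports}}' | awk -F'\t' '$2 == ""'

# 3. After upgrading, the new per-bridge DROP rules are visible here:
sudo iptables -L DOCKER -n -v | grep DROP
```

Cross-reference the output of step 2 against anything on your network (monitoring agents, service discovery, bare-metal routing) that connects to container IPs directly rather than through published ports.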
Docker Engine 29: Experimental nftables Support
Docker Engine 29.0, released November 10, 2025, introduced experimental opt-in support for nftables as the firewall backend. This is the first step toward eventually deprecating iptables in Docker. According to the release notes, nftables support can be enabled with "firewall-backend": "nftables" in daemon.json, but it carries important caveats.
"In this initial version, nftables support is 'experimental'. Please be cautious about deploying it in a production environment. Swarm support is planned for a future release. At present, it's not possible to enable Docker Engine's nftables support on a node with Swarm enabled." -- Docker Engineering Blog, November 11, 2025, docker.com/blog/docker-engine-version-29/
The key operational difference with nftables: there is no DOCKER-USER chain equivalent. Under nftables, Docker creates its rules in two dedicated tables -- ip docker-bridges and ip6 docker-bridges -- that it owns exclusively. If you have custom rules in the iptables DOCKER-USER chain, they must be migrated before switching backends, because nftables uses table isolation differently. The migration path is documented at docs.docker.com/engine/network/firewall-nftables/. For the nftables backend, you control rule ordering by creating your own table with a lower-priority base chain and using firewall marks (--bridge-accept-fwmark) rather than inserting into a shared chain.
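To see the table-isolation model described above in practice, you can enable the experimental backend and inspect the tables Docker creates. A sketch, assuming Engine 29+ on a non-Swarm node:

```shell
# /etc/docker/daemon.json -- opt in to the experimental backend, then
# restart dockerd:
#   { "firewall-backend": "nftables" }

# Docker's rules now live in two dedicated tables it owns exclusively:
sudo nft list table ip docker-bridges
sudo nft list table ip6 docker-bridges

# There is no DOCKER-USER equivalent here. Custom policy belongs in your
# own table, ordered via a lower-priority base chain, per the migration
# guide at docs.docker.com/engine/network/firewall-nftables/
sudo nft list ruleset | grep -A2 'table ip docker'
```

Do not edit the docker-bridges tables directly: Docker treats them as its own and may rewrite them at any time, which is precisely why the migration guide routes custom policy through a separate table.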
If you are running the iptables backend (the current default), none of this affects you yet. But understanding the direction of travel matters for capacity planning: iptables will eventually be deprecated as the Docker default, and building automation that hard-codes chain names without version awareness will create maintenance debt.
Putting It Together
Docker networking is not arbitrary. Every behavior -- why names resolve on user-defined bridges but not on the default one, why overlay traffic appears as UDP on port 4789, why published ports survive container restarts -- follows directly from the Linux primitives underneath. Once you have a clear mental model of namespaces, veth pairs, bridges, iptables, and the embedded DNS resolver, the layers Docker adds become transparent rather than mysterious.
That mental model must also track the platform's evolution. Docker Engine 28 materially changed the iptables chain structure and hardened default security by dropping unpublished-port access from remote hosts. Docker Engine 29 introduced the first experimental nftables backend, signaling where the platform is headed. Articles, blog posts, and Stack Overflow answers written before early 2025 that describe specific chain names or default forwarding behavior may simply be wrong on a modern installation -- not because the authors were careless, but because Docker changed the ground truth. Verifying iptables behavior always starts with docker version and sudo iptables -L -n -v on the actual host.
The troubleshooting workflow follows the same logic as always: start with what you know is true (network membership), eliminate variables one layer at a time (IP before DNS, DNS before application), and use observable data (tcpdump, iptables listings, netshoot) rather than inference. The tools are all there -- the gap is usually just knowing where to look, and knowing which version of Docker you are looking at.
Container networking is Linux networking. The abstraction Docker provides is real and useful, but it is not a black box. Every packet has a path you can trace.
Sources and Further Reading
The claims in this article are grounded in the following primary sources, all of which can be read in full at the linked URLs:
- Docker, Inc. "Networking overview." Docker Documentation. docs.docker.com/engine/network/
- Docker, Inc. "Bridge network driver." Docker Documentation. docs.docker.com/engine/network/drivers/bridge/
- Docker, Inc. "Overlay network driver." Docker Documentation. docs.docker.com/engine/network/drivers/overlay/
- Docker, Inc. "Manage swarm service networks." Docker Documentation. Describes IPVS-based load balancing in the ingress network. docs.docker.com/engine/swarm/networking/
- Docker, Inc. "Use Swarm mode routing mesh." Docker Documentation. docs.docker.com/engine/swarm/ingress/
- Docker, Inc. "Port publishing and mapping." Docker Documentation. Covers userland-proxy behavior and localhost port binding. docs.docker.com/engine/network/port-publishing/
- Docker, Inc. "dockerd reference." Docker Documentation. Documents the --userland-proxy default and behavior. docs.docker.com/reference/cli/dockerd/
- Docker, Inc. "Docker with iptables." Docker Documentation. Documents the current chain set, including DOCKER-FORWARD introduced in Engine 28.0.1. docs.docker.com/engine/network/firewall-iptables/
- Docker, Inc. "Docker with nftables." Docker Documentation. Documents the experimental nftables backend introduced in Engine 29.0.0. docs.docker.com/engine/network/firewall-nftables/
- Docker, Inc. "Docker Engine v28: Hardening Container Networking by Default." Docker Engineering Blog, February 28, 2025. docker.com/blog/docker-engine-28-hardening-container-networking-by-default/
- Docker, Inc. "Docker Engine v29." Docker Engineering Blog, November 11, 2025. docker.com/blog/docker-engine-version-29/
- Docker, Inc. "Engine v28 Release Notes." Docker Documentation. Documents moby/moby#49518 (DOCKER-FORWARD chain) and moby/moby#48724 (unpublished port DROP rule). docs.docker.com/engine/release-notes/28/
- Docker, Inc. "Engine v29 Release Notes." Docker Documentation. docs.docker.com/engine/release-notes/29/
- Docker, Inc. "Security -- Docker Engine." Documents kernel namespace maturity at Linux 2.6.26. docs.docker.com/engine/security/
- robmry. "Add chain DOCKER-FORWARD." moby/moby Pull Request #49518. GitHub. github.com/moby/moby/pull/49518
- Mahalingam, M. et al. "VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks." IETF RFC 7348. August 2014. datatracker.ietf.org/doc/html/rfc7348
- Linux Kernel Documentation. "Network Namespaces." kernel.org/doc/html/latest/networking/net_namespaces.html
- Linux Kernel Documentation. "Virtual Ethernet Device." kernel.org/doc/Documentation/networking/veth.txt
- Corbet, J. "Namespaces in operation, part 7: Network namespaces." LWN.net. Confirms network namespaces entered the kernel in 2.6.24. lwn.net/Articles/580893/
- Datadog Security. "CIS Docker Benchmark 2.1: Containers on the default network bridge should restrict network traffic." Documents that --icc applies only to the default bridge. docs.datadoghq.com/security/default_rules/cis-docker-1.2.0-2.1/
- moby/moby GitHub Issue #40428. "iptables rules that are set inside the container for DNS resolution." Documents the DOCKER_OUTPUT and DOCKER_POSTROUTING chains with per-container random DNS ports. github.com/moby/moby/issues/40428
- Chiu, HungWei. "Fun DNS Facts Learned from the KIND Environment." Medium. Explains the Docker DNS iptables DNAT mechanism and why no process appears on port 53 inside containers. hwchiu.medium.com/fun-dns-facts-learned-from-the-kind-environment-241e0ea8c6d4
- SUSE Engineering. "How Docker Swarm Container Networking Works -- Under the Hood." Documents the ingress-sbox namespace and IPVS internals. suse.com/c/docker-swarm-container-networking/
- Netshoot project. "A Docker + Kubernetes network trouble-shooting swiss-army container." GitHub. github.com/nicolaka/netshoot