docker networking part 2

Docker networking

Environment used in this example:

  • Host: Mac OS X 10.10.5
  • CoreOS: alpha (1000.0.0)
  • Docker: version 1.10.3

Part 2

Hello fellow Docker enthusiast! I am going to lay down some of the networking principles of Docker here, with concrete examples and detailed steps so you can see it all for yourself.

Throughout the article I am going to use CoreOS on Vagrant as the host for the Docker demos. The reason I am choosing CoreOS is that it comes with Docker pre-installed and takes about 5 minutes to start up, which makes it a perfect Docker playground.

You can choose to run it on a cloud provider (like Amazon EC2), on bare metal servers or on a virtualization provider (like Vagrant, which we will use in this example). You can find more details on how to quick-start with CoreOS here.

In the first part I showed how to set up CoreOS and start using Docker. We also looked into the default networks created by Docker and what they mean in terms of Linux configuration.

Now we will look into how to create our own networks.

Single host networks

Since Docker 1.9 it is quite straightforward to create network isolation for Docker containers. First let’s have a look at how to achieve this within a single host. We’ll create 2 bridge networks, connect 2 sample busybox containers to each of them and review the connectivity.

As we saw in the first part, the networks known to Docker can be listed using

docker network ls
NETWORK ID          NAME                DRIVER
8130448eb874        host                host
7aebb3000431        bridge              bridge
e8675e62aeb1        none                null

Using the same docker network command we can create new networks:

docker network create net1
6ff909d650dc708fb1d47c95490d0383652da6b70c1b9c1a8f87872a003b05c9

docker network create net2
1a6e26180e3dfc51c673b438b81b84a813f8a6e51dcf34b025e1f180337ddd69

Now let’s have a look at what happened on the host:

ifconfig
br-1a6e26180e3d: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.20.0.1  netmask 255.255.0.0  broadcast 0.0.0.0
        ether 02:42:8d:e6:67:43  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

br-6ff909d650dc: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.19.0.1  netmask 255.255.0.0  broadcast 0.0.0.0
        ether 02:42:2c:10:69:8d  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Two bridges have been created, one for each of the networks we just defined. To see the subnet attached to each network we can use

docker network inspect net1
[
    {
        "Name": "net1",
        "Id": "6ff909d650dc708fb1d47c95490d0383652da6b70c1b9c1a8f87872a003b05c9",
        "Scope": "local",
        "Driver": "bridge",
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.19.0.0/16",
                    "Gateway": "172.19.0.1/16"
                }
            ]
        },
        "Containers": {},
        "Options": {}
    }
]

One could also specify the desired subnet (alongside other configuration) when creating the network.
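For example, a bridge network with an explicit subnet and gateway could be created like this (a quick sketch; the name net3 and the 172.25.0.0/16 range are arbitrary examples, just make sure they don’t clash with networks that already exist on your host):

docker network create --driver bridge --subnet 172.25.0.0/16 --gateway 172.25.0.1 net3
docker network inspect net3 | grep Subnet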

In order to attach a container to the desired network, use the --net=${network_name} option:

docker run -itd --net=net1 --name=app1_1 busybox
26faf5ebe72a7d43313bf7e93f05942bf97e5fbae2d7dd62c66bd2676ee55b31

docker run -itd --net=net1 --name=app1_2 busybox
92d27bb0f1f7c834e22a9496d5b01933374e1b1ac68347350229a95fc32b500d

docker run -itd --net=net2 --name=app2_1 busybox
c3744dc93be6c42c4a2efee461529973b77b893da8975c5ccc4f4452b4fadc72

docker run -itd --net=net2 --name=app2_2 busybox
5e68d4aa0b2adf11bf8feb0f98b0c015dcaa3442a7e9206811f338b3cf1e4b1b

Each of the containers gets an IP address from the subnet of its respective network:

for i in `docker ps --format={{.Names}}` ; do echo $i; docker exec -it $i ifconfig | grep "inet addr" | grep -v "127.0.0.1" ; done
app2_2
          inet addr:172.20.0.3  Bcast:0.0.0.0  Mask:255.255.0.0
app2_1
          inet addr:172.20.0.2  Bcast:0.0.0.0  Mask:255.255.0.0
app1_2
          inet addr:172.19.0.3  Bcast:0.0.0.0  Mask:255.255.0.0
app1_1
          inet addr:172.19.0.2  Bcast:0.0.0.0  Mask:255.255.0.0
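As an alternative to exec’ing into each container, docker inspect can pull the address directly; a quick sketch where the Go template ranges over every network the container is attached to:

docker inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' app1_1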

Hence, we can ping app1_1 from app1_2, but not from app2_*:

docker exec -it app1_2 ping app1_1
PING app1_1 (172.19.0.2): 56 data bytes
64 bytes from 172.19.0.2: seq=0 ttl=64 time=0.087 ms
64 bytes from 172.19.0.2: seq=1 ttl=64 time=0.082 ms
^C
--- app1_1 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.082/0.084/0.087 ms
docker exec -it app2_1 ping app1_1
ping: bad address 'app1_1'
docker exec -it app2_1 ping 172.19.0.2
PING 172.19.0.2 (172.19.0.2): 56 data bytes

The last ping never gets a reply, since the two bridge networks are isolated from each other. One can also connect a running container to an existing network; in that case it is immediately assigned an IP from the range of that network:

docker network connect net1 app2_1
docker exec -it app2_1 ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:AC:14:00:02
          inet addr:172.20.0.2  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:acff:fe14:2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:35 errors:0 dropped:0 overruns:0 frame:0
          TX packets:46 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:5308 (5.1 KiB)  TX bytes:4300 (4.1 KiB)

eth1      Link encap:Ethernet  HWaddr 02:42:AC:13:00:04
          inet addr:172.19.0.4  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:acff:fe13:4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:508 (508.0 B)  TX bytes:598 (598.0 B)

Now app1_1 is reachable from app2_1:

docker exec -it app2_1 ping app1_1
PING app1_1 (172.19.0.2): 56 data bytes
64 bytes from 172.19.0.2: seq=0 ttl=64 time=0.041 ms
64 bytes from 172.19.0.2: seq=1 ttl=64 time=0.080 ms
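To undo this, the container can be detached from the network again; a quick sketch (after which the cross-network ping above stops working):

docker network disconnect net1 app2_1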

This is all good, but what we really need in real life is networking between containers deployed on different hosts. This is where the Docker overlay network comes into play.

multi-host overlay network

Underneath, the overlay network uses VXLAN technology, configured through the libnetwork library, an abstraction layer on top of it implemented by the Docker team.

In order to have connectivity between containers running on different hosts, the Docker overlay network requires a distributed key-value store to be configured. You can choose ZooKeeper, Consul or etcd. The chosen key-value store is accessed by Docker through the libkv library and is used to maintain information about the hosts connected to the network, such as IP address mappings, VXLAN ID allocation, and the mapping of each network and its allocated subnet to a VXLAN ID. For us the obvious choice is etcd, as CoreOS has native support for it.

In the first part I described how to get CoreOS configured in your environment. I will show the example using my Vagrant-based CoreOS instances, but the Docker-related steps should be similar regardless of your environment.

Basically, what we need is to configure each Docker daemon in the cluster to start with the --cluster-store and --cluster-advertise parameters. The first is used to access the key-value store, the second to advertise the machine on the network.

Here is the list of default protocol/port configurations for each of the stores, where STORAGE_IP is the IP address of your storage server (a full daemon invocation is sketched right after the list):

  • consul: --cluster-store=consul://$(STORAGE_IP):8500 --cluster-advertise=$(INTERFACE_TO_ADVERTISE):2376
  • zookeeper: --cluster-store=zk://$(STORAGE_IP):2181 --cluster-advertise=$(INTERFACE_TO_ADVERTISE):2376
  • etcd: --cluster-store=etcd://$(STORAGE_IP):2379 --cluster-advertise=$(HOST_IP):2375
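If you were starting the Docker daemon by hand (outside of systemd), the whole invocation would look roughly like this; a sketch using example addresses that match the Vagrant setup used below:

docker daemon -D \
  --cluster-store=etcd://172.17.8.101:2379 \
  --cluster-advertise=172.17.8.101:2375

On CoreOS we won’t invoke the daemon directly; instead we’ll pass these flags through a systemd drop-in, as shown in the next section.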

Let’s have a detailed look at the etcd solution, as it comes natively with CoreOS.

etcd as kv storage

First, make sure etcd is configured for your CoreOS instance. Here is an example of the user-data file located in the vagrant-coreos folder:

coreos:
   etcd2:
     advertise-client-urls: http://$public_ipv4:2379
     initial-advertise-peer-urls: http://$private_ipv4:2380
     listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
     listen-peer-urls: http://$private_ipv4:2380,http://$private_ipv4:7001
     discovery: https://discovery.etcd.io/baa4f37b77b673efe27ecc871438edae
   fleet:
     public-ip: $public_ipv4
   units:
   - name: etcd2.service
     command: start
   - name: fleet.service
     command: start
   - name: docker-tcp.socket
     command: start
     enable: true
     content: |
       [Unit]
       Description=Docker Socket for the API

       [Socket]
       ListenStream=2375
       Service=docker.service
       BindIPv6Only=both

       [Install]
       WantedBy=sockets.target
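
Once the VM is up, it’s worth checking that etcd2 is actually healthy before touching Docker; etcdctl ships with CoreOS (your member list will of course look different):

etcdctl cluster-health
etcdctl member list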

If you make changes in the user-data file, don’t forget to run vagrant reload. We’ll use the systemd drop-in configuration feature to configure the Docker daemon with the correct cluster-related options:

sudo mkdir -p /etc/systemd/system/docker.service.d/

We’ll set the DOCKER_OPTS environment variable by putting the following inside the /etc/systemd/system/docker.service.d/overlay.conf file (the file name is arbitrary):

[Service]
Environment="DOCKER_CGROUPS=--exec-opt native.cgroupdriver=systemd"
Environment="DOCKER_OPTS= --cluster-store=etcd://172.17.8.101:2379 --cluster-advertise=172.17.8.101:2380"

Make sure to replace the IP address with the one of your host. The official etcd ports are 2379 for client requests and 2380 for peer communication (both configurable through the user-data file), so --cluster-store points at etcd’s client port 2379. --cluster-advertise, on the other hand, advertises the Docker daemon itself, on port 2375 as exposed by the docker-tcp.socket unit above.

While we are at it, let’s also enable debug logging on the Docker daemon to get more visibility into what’s going on. For that, let’s override the docker.service configuration by copying and changing the default one:

sudo cp /usr/lib64/systemd/system/docker.service /etc/systemd/system/docker.service

and add the -D option to the ExecStart command in the [Service] section. This is how my docker.service file looks right now:

[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.com
After=docker.socket early-docker.target network.target
Requires=docker.socket early-docker.target

[Service]
Environment="DOCKER_CGROUPS=--exec-opt native.cgroupdriver=systemd"
EnvironmentFile=-/run/flannel_docker_opts.env
MountFlags=slave
LimitNOFILE=1048576
LimitNPROC=1048576
ExecStart=/usr/lib/coreos/dockerd daemon -D --host=fd:// $DOCKER_OPTS $DOCKER_CGROUPS $DOCKER_OPT_BIP $DOCKER_OPT_MTU $DOCKER_OPT_IPMASQ

[Install]
WantedBy=multi-user.target

Reload systemd to pick up the additional configuration:

sudo systemctl daemon-reload

and restart the service:

sudo systemctl restart docker

Make sure the configurations have been taken into account:

ps -ef  | grep docker
root       785     1  0 08:00 ?        00:00:04 docker daemon --host=fd:// --bridge=none --iptables=false --ip-masq=false --graph=/var/lib/early-docker --pidfile=/var/run/early-docker.pid --exec-opt native.cgroupdriver=systemd --selinux-enabled
root       817     1  0 08:00 ?        00:00:00 /usr/libexec/sdnotify-proxy /run/flannel/sd.sock /usr/bin/docker run --net=host --privileged=true --rm --volume=/run/flannel:/run/flannel --env=NOTIFY_SOCKET=/run/flannel/sd.sock --env=AWS_ACCESS_KEY_ID= --env=AWS_SECRET_ACCESS_KEY= --env-file=/run/flannel/options.env --volume=/usr/share/ca-certificates:/etc/ssl/certs:ro --volume=/etc/ssl/etcd:/etc/ssl/etcd:ro quay.io/coreos/flannel:0.5.5 /opt/bin/flanneld --ip-masq=true
root       822   817  0 08:00 ?        00:00:02 /usr/bin/docker run --net=host --privileged=true --rm --volume=/run/flannel:/run/flannel --env=NOTIFY_SOCKET=/run/flannel/sd.sock --env=AWS_ACCESS_KEY_ID= --env=AWS_SECRET_ACCESS_KEY= --env-file=/run/flannel/options.env --volume=/usr/share/ca-certificates:/etc/ssl/certs:ro --volume=/etc/ssl/etcd:/etc/ssl/etcd:ro quay.io/coreos/flannel:0.5.5 /opt/bin/flanneld --ip-masq=true
root      1014     1  3 15:30 ?        00:00:00 docker daemon -D --host=fd:// --cluster-store=etcd://172.17.8.101:2379 --cluster-advertise=172.17.8.101:2375 --exec-opt native.cgroupdriver=systemd --bip=10.1.64.1/24 --mtu=1472 --ip-masq=false --selinux-enabled
core      1270   990  0 15:30 pts/0    00:00:00 grep --colour=auto docker
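Another quick sanity check is docker info, which (assuming the flags were picked up) should report the cluster store and the advertise address:

docker info | grep -i cluster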

Repeat this on the core-02 host (and on every host you want to be part of the cluster).
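On core-02 the drop-in is identical except for the advertised address; a sketch assuming the default Vagrant addressing where core-02 gets 172.17.8.102 (--cluster-store may point at any etcd member, while --cluster-advertise must be the node’s own address):

[Service]
Environment="DOCKER_CGROUPS=--exec-opt native.cgroupdriver=systemd"
Environment="DOCKER_OPTS= --cluster-store=etcd://172.17.8.102:2379 --cluster-advertise=172.17.8.102:2375"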

Let’s have a look at what information is now stored in the key-value store for our hosts. For that we’ll use the etcdctl binary.

On both of the nodes you should see something like this:

etcdctl ls /docker/nodes
/docker/nodes/172.17.8.101:2375
/docker/nodes/172.17.8.102:2375

This is the list of nodes that have registered themselves using the --cluster-advertise option.

Now if we also have a look at the logs of the docker service by running journalctl -f -u docker.service, we should notice log lines like these:

Apr 08 18:19:26 core-01 dockerd[1480]: time="2016-04-08T18:19:26.371107042Z" level=info msg="2016/04/08 18:19:26 [INFO] serf: EventMemberJoin: core-02 172.17.8.102\n"
Apr 08 18:19:26 core-01 dockerd[1480]: time="2016-04-08T18:19:26.416311827Z" level=debug msg="2016/04/08 18:19:26 [DEBUG] serf: messageJoinType: core-02\n"
Apr 08 18:19:26 core-01 dockerd[1480]: time="2016-04-08T18:19:26.570811000Z" level=debug msg="2016/04/08 18:19:26 [DEBUG] serf: messageJoinType: core-02\n"
Apr 08 18:19:26 core-01 dockerd[1480]: time="2016-04-08T18:19:26.616466048Z" level=debug msg="2016/04/08 18:19:26 [DEBUG] serf: messageJoinType: core-02\n"
Apr 08 18:19:26 core-01 dockerd[1480]: time="2016-04-08T18:19:26.771490977Z" level=debug msg="2016/04/08 18:19:26 [DEBUG] serf: messageJoinType: core-02\n"
Apr 08 18:19:30 core-01 dockerd[1480]: time="2016-04-08T18:19:30.203106067Z" level=debug msg="Watch triggered with 2 nodes" discovery=etcd
Apr 08 18:19:31 core-01 dockerd[1480]: time="2016-04-08T18:19:31.280923485Z" level=debug msg="Watch triggered with 2 nodes" discovery=etcd

Serf is another technology used by Docker under the hood. It provides lightweight, eventually consistent node-to-node communication, including fast failure detection and support for custom events, and is based on a gossip protocol widely used in distributed systems.

Let’s now explore the Docker CLI usage for overlay network creation, which is the same regardless of which KV store is configured.

docker overlay network

Just like we did with the bridge networks, we will create a network using the docker network command, but this time with the overlay driver, the default choice for multi-host networking:

docker network create --driver overlay shared
e2ea3769008a97adefd1104745122e7391c1ca856135150f08480f3e5bcab86f

Now if we check the list of networks we should see:

docker network ls
NETWORK ID          NAME                DRIVER
e2ea3769008a        shared              overlay
e274d7b5d4c6        bridge              bridge
28e1aa76a332        none                null
46b84c4bf302        host                host

And the same shared network is now available on the second host (and on every host configured against the same KV store):

docker network ls
NETWORK ID          NAME                DRIVER
e2ea3769008a        shared              overlay
f3e1636089dd        none                null
560295a2ead7        host                host
03316fab2d4d        bridge              bridge
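This works because the overlay network’s metadata is stored in etcd. You can peek at it with etcdctl; the exact key layout under /docker is an internal detail that varies between Docker versions, so treat this as a rough probe:

etcdctl ls --recursive /docker | head -20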

Now let’s create containers connected to this network:

docker run -itd --name app1 --net shared busybox
f47be93a2d5f3c0ee11e9a0c73a97f2b40bf71f7a38c6d3ea020388ba4883a53

and on the second node:

docker run -itd --name app2 --net shared busybox
200d4165f9d8cc301dcca580b68ff61074e4c3b85af933368102e71cdbb67c59

Now let’s check the IP configuration of these containers:

docker exec -it app1 ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:0A:00:00:02
          inet addr:10.0.0.2  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::42:aff:fe00:2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:15 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1206 (1.1 KiB)  TX bytes:906 (906.0 B)

eth1      Link encap:Ethernet  HWaddr 02:42:AC:13:00:02
          inet addr:172.19.0.2  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:acff:fe13:2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:15 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1206 (1.1 KiB)  TX bytes:816 (816.0 B)

docker exec -it app2 ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:0A:00:00:03
          inet addr:10.0.0.3  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::42:aff:fe00:3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:15 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1206 (1.1 KiB)  TX bytes:906 (906.0 B)

eth1      Link encap:Ethernet  HWaddr 02:42:AC:13:00:02
          inet addr:172.19.0.2  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:acff:fe13:2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:12 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:960 (960.0 B)  TX bytes:738 (738.0 B)

As you can see, both containers get their eth0 address from the 10.0.0.0/24 subnet, which is indeed the one allocated to our shared network (the second interface, eth1, is attached to the local docker_gwbridge bridge and is used for external connectivity):

docker network inspect shared
[
    {
        "Name": "shared",
        "Id": "e2ea3769008a97adefd1104745122e7391c1ca856135150f08480f3e5bcab86f",
        "Scope": "global",
        "Driver": "overlay",
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "Gateway": "10.0.0.1/24"
                }
            ]
        },
        "Containers": {
            "200d4165f9d8cc301dcca580b68ff61074e4c3b85af933368102e71cdbb67c59": {
                "Name": "app2",
                "EndpointID": "cd1b33a9076f1a71a2c68985629e0eab55e33b02541b367fc87019f099c00850",
                "MacAddress": "02:42:0a:00:00:03",
                "IPv4Address": "10.0.0.3/24",
                "IPv6Address": ""
            }
        },
        "Options": {}
    }
]

Checking connectivity between containers:

docker exec -it app1 ping app2
PING app2 (10.0.0.3): 56 data bytes
64 bytes from 10.0.0.3: seq=0 ttl=64 time=0.993 ms
64 bytes from 10.0.0.3: seq=1 ttl=64 time=0.765 ms
^C
--- app2 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.765/0.879/0.993 ms

That’s it! We have created a network between Docker containers spread across different hosts with just a handful of commands.

vxlan details

If you want to see what Docker is doing underneath with the VXLAN configuration, be aware that each container gets its own network namespace. This means that to run the standard iproute2 commands, you need to enter the namespace of the container’s PID. Using nsenter and specifying the PID of the container running in the shared network:

sudo nsenter -t $(docker inspect --format {{.State.Pid}} app1) -n ip neigh
10.0.0.3 dev eth0 lladdr 02:42:0a:00:00:03 STALE

Or, for example, if you watch the output of

watch -n 1 "sudo nsenter -t $(docker inspect --format {{.State.Pid}} app1) -n ip -s -h link show eth0"

and ping the other container running on the core-02 host, you’ll see the packet counters on the eth0 interface increase:

docker exec -it app1 ping app2

sudo nsenter -t $(docker inspect --format {{.State.Pid}} app1) -n ip -s -h link show eth0
8: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:0a:00:00:02 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    5.88k      65       0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    5.58k      61       0       0       0       0
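The VXLAN device itself does not live inside the containers but in a separate namespace that libnetwork creates per overlay network under /var/run/docker/netns (the namespace name is derived from the network ID; the exact names below are an assumption and will differ on your machine):

sudo ls /var/run/docker/netns
sudo nsenter --net=/var/run/docker/netns/1-e2ea3769008a ip -d link show type vxlan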
