[Hands-on] Understanding multi-container host networking using vxlan overlay network feature.

Faysal Mehedi
8 min readMar 1, 2022

--

This hands-on demo will provide an overview of container communication between multi-node or multi container daemon under the hood. Find the github repo here.

What is VxLAN?

VxLAN — or Virtual Extensible LAN addresses the requirements of the Layer 2 and Layer 3 data center network infrastructure in the presence of VMs in a multi-tenant environment. It runs over the existing networking infrastructure and provides a means to "stretch" a Layer 2 network. In short, VXLAN is a Layer 2 overlay scheme on a Layer 3 network. Each overlay is termed a VXLAN segment. Only VMs within the same VXLAN segment can communicate with each other. Each VXLAN segment is identified through a 24-bit segment ID, termed the "VNI". This allows up to 16 M VXLAN segments to coexist within the same administrative domain.

What is VNI?

Unlike VLAN, VxLAN does not have ID limitation. It uses a 24-bit header, which gives us about 16 million VNI’s to use. A VNI VXLAN Network Identifier (VNI) is the identifier for the LAN segment, similar to a VLAN ID. With an address space this large, an ID can be assigned to a customer, and it can remain unique across the entire network.

What is VTEP?

VxLAN traffic is encapsulated before it is sent over the network. This creates stateless tunnels across the network, from the source switch to the destination switch. The encapsulation and decapsulation are handled by a component called a VTEP (VxLAN Tunnel End Point). A VTEP has an IP address in the underlay network. It also has one or more VNI’s associated with it. When frames from one of these VNI’s arrives at the Ingress VTEP, the VTEP encapsulates it with UDP and IP headers. The encapsulated packet is sent over the IP network to the Egress VTEP. When it arrives, the VTEP removes the IP and UDP headers, and delivers the frame as normal.

What are we going to cover in this hands-on demo?

  • We will use two VM for this, will install docker for running container.
  • We have to create separate subnet and assign static IP address for simplicity.
  • Then we will create vxlan bridge using linux “ip link vxlan” feature.
  • Then bind the vxlan to the docker bridge to crate the tunnel
  • Hopefully then we see the response from another host conatiner.

Get an overview of the hands-on from the diagram below

Following hands-on demo will be based on this diagram.

Let’s start…

Step 0: For this demo, anyone can deploy two VM on any hypervisor or virtualization technology. Make sure they are on the same network thus hosts can communicate each other. I launched two ec2 instance(ubuntu) which is on same VPC from AWS to simulate this hands-on. In case of AWS, please allow all traffic in security group to avoid connectivity issues.

Step 1: Install docker client and create separate subnet using docker network utility

For Host-01

# update the repository and install docker
sudo apt update
sudo apt install -y docker.io


# create a separate docker bridge network
sudo docker network create --subnet 172.18.0.0/16 vxlan-net

c43287381077769a873105e49d34d45f5426e42adf52ef7a92394fe0192715b1

# list all networks in docker
sudo docker network ls

NETWORK ID NAME DRIVER SCOPE
982c9e8e9e40 bridge bridge local
2c2b3714bca9 host host local
223b98ebd6ef none null local
c43287381077 vxlan-net bridge local

# Check interfaces
ip a

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc fq_codel state UP group default qlen 1000
link/ether 0a:f5:a4:e1:3a:c2 brd ff:ff:ff:ff:ff:ff
inet 10.0.1.4/24 brd 10.0.1.255 scope global dynamic eth0
valid_lft 2529sec preferred_lft 2529sec
inet6 fe80::8f5:a4ff:fee1:3ac2/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:9f:18:ff:df brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
4: br-c43287381077: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:f2:77:0d:ab brd ff:ff:ff:ff:ff:ff
inet 172.18.0.1/16 brd 172.18.255.255 scope global br-c43287381077
valid_lft forever preferred_lft forever

For Host-02

sudo apt update
sudo apt install -y docker.io

sudo docker network create --subnet 172.18.0.0/16 vxlan-net


c485be328b349a64ab32f1743658e09d16d39b3fef83946bbfad662831a92a33

sudo docker network ls

NETWORK ID NAME DRIVER SCOPE
6e1b56c794c3 bridge bridge local
d1f9b80443c4 host host local
922f1ff12422 none null local
c485be328b34 vxlan-net bridge local

ip a

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc fq_codel state UP group default qlen 1000
link/ether 0a:05:59:dc:14:18 brd ff:ff:ff:ff:ff:ff
inet 10.0.1.41/24 brd 10.0.1.255 scope global dynamic eth0
valid_lft 1984sec preferred_lft 1984sec
inet6 fe80::805:59ff:fedc:1418/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:57:11:86:a0 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
4: br-c485be328b34: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:29:42:6b:94 brd ff:ff:ff:ff:ff:ff
inet 172.18.0.1/16 brd 172.18.255.255 scope global br-c485be328b34
valid_lft forever preferred_lft forever

Step 2: Run docker container on top of newly created docker bridge network and try to ping docker bridge

For Host-01

# running ubuntu container with "sleep 3000" and a static ip
sudo docker run -d --net vxlan-net --ip 172.18.0.11 ubuntu sleep 3000

a90cbd31d1a20b5e8369f211e65e6fdabf611d01353cb4dc3ff646166ce18e26

# check the container running or not
sudo docker ps

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a90cbd31d1a2 ubuntu "sleep 3000" 7 seconds ago Up 6 seconds jolly_khayyam

# check the IPAddress to make sure that the ip assigned properly
sudo docker inspect a9 | grep IPAddress
"SecondaryIPAddresses": null,
"IPAddress": "",
"IPAddress": "172.18.0.11",

# ping the docker bridge ip to see whether the traffic can pass
ping 172.18.0.1 -c 2
PING 172.18.0.1 (172.18.0.1) 56(84) bytes of data.
64 bytes from 172.18.0.1: icmp_seq=1 ttl=64 time=0.047 ms
64 bytes from 172.18.0.1: icmp_seq=2 ttl=64 time=0.044 ms

--- 172.18.0.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1010ms
rtt min/avg/max/mdev = 0.044/0.045/0.047/0.001 ms

For Host-02

sudo docker run -d --net vxlan-net --ip 172.18.0.12 ubuntu sleep 3000

7755c18d3550d1fd80d0d00d9dce85dfd8370b2d26cbc088c97bd1b91ac95be7

sudo docker ps


CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7755c18d3550 ubuntu "sleep 3000" 9 seconds ago Up 8 seconds affectionate_black

sudo docker inspect 77 | grep IPAddress


"SecondaryIPAddresses": null,
"IPAddress": "",
"IPAddress": "172.18.0.12",

ping 172.18.0.1 -c 2


PING 172.18.0.1 (172.18.0.1) 56(84) bytes of data.
64 bytes from 172.18.0.1: icmp_seq=1 ttl=64 time=0.047 ms
64 bytes from 172.18.0.1: icmp_seq=2 ttl=64 time=0.044 ms

--- 172.18.0.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1010ms
rtt min/avg/max/mdev = 0.044/0.045/0.047/0.001 ms

Step 3: Now access to the running container and try to ping another hosts running container via IP Address. Though hosts can communicate each other, conatiner communication should fail because there is no tunnel or anything to carry the traffic.

For Host-01

# enter the running container using exec 
sudo docker exec -it a9 bash
# Now we are inside running container
# update the package and install net-tools and ping tools
apt-get update
apt-get install net-tools
apt-get install iputils-ping


# Now ping the another container
ping 172.18.0.12 -c 2

PING 172.18.0.12 (172.18.0.12) 56(84) bytes of data.
--- 172.18.0.12 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 2028ms

For Host-02

sudo docker exec -it 77 bash# inside container
apt-get update
apt-get install net-tools
apt-get install iputils-ping
# Now ping the another container
ping 172.18.0.11 -c 2
PING 172.18.0.11 (172.18.0.11) 56(84) bytes of data.
--- 172.18.0.11 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 2028ms

Step 4: It’s time to create a VxLAN tunnel to establish communication between two hosts running containers. Then attch the vxlan to the docker bridge. Make sure the VNI ID is the same for both hosts.

For Host-01

# check the bridges list on the hosts
brctl show
bridge name bridge id STP enabled interfaces
br-c43287381077 8000.0242f2770dab no veth725a704
docker0 8000.02429f18ffdf no
# create a vxlan
# 'vxlan-demo' is the name of the interface, type should be vxlan
# VNI ID is 100
# dstport should be 4789 which a udp standard port for vxlan communication
# 10.0.1.41 is the ip of another host
sudo ip link add vxlan-demo type vxlan id 100 remote 10.0.1.41 dstport 4789 dev eth0
# check interface list if the vxlan interface created
ip a | grep vxlan
9: vxlan-demo: <BROADCAST,MULTICAST> mtu 8951 qdisc noop state DOWN group default qlen 1000# make the interface up
sudo ip link set vxlan-demo up
# now attach the newly created vxlan interface to
# the docker bridge we created
sudo brctl addif br-c43287381077 vxlan-demo
# check the route to ensure everything is okay. here '172.18.0.0' part is our concern part.route -nKernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.0.1.1 0.0.0.0 UG 100 0 0 eth0
10.0.1.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
10.0.1.1 0.0.0.0 255.255.255.255 UH 100 0 0 eth0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
172.18.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br-c43287381077

For Host-02

brctl showbridge name	bridge id		STP enabled	interfaces
br-c485be328b34 8000.024229426b94 no veth478bcd1
docker0 8000.0242571186a0 no
# 10.0.1.4 is the ip of another host
# make sure VNI ID is the same on both hosts, this is important
sudo ip link add vxlan-demo type vxlan id 100 remote 10.0.1.4 dstport 4789 dev eth0ip a | grep vxlan9: vxlan-demo: <BROADCAST,MULTICAST> mtu 8951 qdisc noop state DOWN group default qlen 1000sudo ip link set vxlan-demo up
sudo brctl addif br-c485be328b34 vxlan-demo
route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.0.1.1 0.0.0.0 UG 100 0 0 eth0
10.0.1.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
10.0.1.1 0.0.0.0 255.255.255.255 UH 100 0 0 eth0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
172.18.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br-c485be328b34

Step 5: Now test the connectivity. It should work now. A Vxlan Overlay Network Tunnel has been created.

For Host-01

sudo docker exec -it a9 bash# ping the other container IP
ping 172.18.0.12 -c 2
PING 172.18.0.12 (172.18.0.12) 56(84) bytes of data.
64 bytes from 172.18.0.12: icmp_seq=1 ttl=64 time=0.601 ms
64 bytes from 172.18.0.12: icmp_seq=2 ttl=64 time=0.601 ms
--- 172.18.0.12 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1018ms
rtt min/avg/max/mdev = 0.601/0.601/0.601/0.000 ms

For Host-02

sudo docker exec -it 77 bashping 172.18.0.11 -c 2PING 172.18.0.11 (172.18.0.11) 56(84) bytes of data.
64 bytes from 172.18.0.11: icmp_seq=1 ttl=64 time=0.601 ms
64 bytes from 172.18.0.11: icmp_seq=2 ttl=64 time=0.601 ms
--- 172.18.0.11 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1018ms
rtt min/avg/max/mdev = 0.601/0.601/0.601/0.000 ms

Now the hands on completed;

--

--