Exploring ROS 2 Kubernetes configurations

November 23, 2020

Kubernetes and robotics make a great match. However, as we have seen, robots running ROS 2 can be tricky to set up on Kubernetes. This blog series has explored running ROS 2 on Kubernetes, set up a simple talker and listener, and distributed that demo across three machines.

The configurations presented may not quite fit your implementation, and you may want to dig a bit deeper into network traffic when troubleshooting. This post addresses these concerns by demonstrating two general principles for setting up a ROS 2 system within Kubernetes:

Only have ROS running in one container within a pod. By default, when multiple ROS containers attempt to communicate within a single pod, only a single one actually succeeds.
Enable multicast discovery traffic. For MicroK8s, this requires adding another Container Network Interface (CNI).

Prerequisites

Before beginning, install the MicroK8s snap, add your account to the microk8s group, and enable both the DNS and Multus plugins:

sudo snap install microk8s --classic
sudo usermod -a -G microk8s $USER
sudo chown -f -R $USER ~/.kube
microk8s enable dns multus

MicroK8s brings a full Kubernetes install to your machine with a single snap install command, and the baseline ROS 2 Foxy Docker image fits neatly into this Kubernetes configuration. Our challenge is to configure these projects to work nicely together. The configurations below create ROS containers based on the official baseline ROS Foxy Docker image maintained by Open Robotics.

Be sure to delete pods and containers between each test run. The configuration files give commands needed to clean up after completing the test.

Monitoring Kubernetes network traffic

While working with these configurations, consider using tcpdump to monitor your traffic. It remains the fundamental tool for network troubleshooting, and Kubernetes is no exception. Since all container traffic happens through IP, troubleshooting often begins by simply running tcpdump on your host workstation. The following command saves all network traffic to the file microk8s.pcap:

sudo tcpdump -i any -w microk8s.pcap

This file contains host network traffic as well as traffic to and from all pods and containers on the host. As mentioned in the first post of this series, container-to-container traffic happens via localhost, so intra-pod traffic shows as both source and destination IP address 127.0.0.1.

If a particular container gives you trouble, log into the container, install tcpdump and run an interactive packet capture session to troubleshoot.

Now let’s get started with a few different configurations.

Example 1: A talker and a listener in the same pod

This configuration starts a talker in one container and a listener in a second one. Does the listener hear chatter from the talker?

apiVersion: v1
kind: Pod
metadata:
  name: tl
spec:
  containers:
  - name: t1 
    image: ros:foxy
    command: ["/bin/bash", "-c"]
    args: ["source /opt/ros/foxy/setup.bash && apt update && apt install -y curl && curl https://raw.githubusercontent.com/canonical/robotics-blog-k8s/main/publisher.py > publisher.py && /bin/python3 publisher.py t1"]
  - name: l1 
    image: ros:foxy
    command: ["/bin/bash", "-c"]
    args: ["source /opt/ros/foxy/setup.bash && apt update && apt install -y curl && curl https://raw.githubusercontent.com/canonical/robotics-blog-k8s/main/subscriber.py > subscriber.py && /bin/python3 subscriber.py l1"]

After applying this configuration, use the following command to see logs from container l1 in pod tl:

microk8s.kubectl logs --follow tl -c l1

These logs will show whether the listener receives messages from the talker. If there are no messages on the listener, use a similar command for the talker container to ensure the talker is properly sending messages.

Container l1 may or may not receive the messages sent by container t1, and results may differ each time a container starts. The two containers in the same pod share the pod’s network namespace and communicate with each other over its loopback interface, so they may attempt to use the same network ports and addresses.

Continue exploring this setup with example 2.

Example 2: Two talkers and one listener in the same pod

The second example configuration follows from the first example and adds another talker node. The listener should receive two messages every second, one from container t1 and one from container t2. Does this work?

apiVersion: v1
kind: Pod
metadata:
  name: tl
spec:
  containers:
  - name: t1 
    image: ros:foxy
    command: ["/bin/bash", "-c"]
    args: ["source /opt/ros/foxy/setup.bash && apt update && apt install -y curl && curl https://raw.githubusercontent.com/canonical/robotics-blog-k8s/main/publisher.py > publisher.py && /bin/python3 publisher.py t1"]
  - name: t2 
    image: ros:foxy
    command: ["/bin/bash", "-c"]
    args: ["source /opt/ros/foxy/setup.bash && apt update && apt install -y curl && curl https://raw.githubusercontent.com/canonical/robotics-blog-k8s/main/publisher.py > publisher.py && /bin/python3 publisher.py t2"]
  - name: l1 
    image: ros:foxy
    command: ["/bin/bash", "-c"]
    args: ["source /opt/ros/foxy/setup.bash && apt update && apt install -y curl && curl https://raw.githubusercontent.com/canonical/robotics-blog-k8s/main/subscriber.py > subscriber.py && /bin/python3 subscriber.py l1"]

Whether or not the first example succeeded, this typically does not, but again the results are unpredictable.

The results for both example 1 and this example are unpredictable because containers within the same pod are not meant to share network resources, and in particular network ports. All the containers are identical, and each one attempts to send UDP traffic with the same ports and addresses on the loopback interface. Only the first succeeds; whether the other containers can register and locate ROS objects depends on which of the three containers starts first. That order may change when the containers are restarted, which leads to the unpredictable behavior.
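
The general mechanism behind this kind of collision can be sketched in a few lines of Python. This is only an illustration of two processes in one network namespace competing for the same UDP address and port; it is not a model of what the DDS middleware does internally, and the function name is hypothetical.

```python
import socket


def second_bind_fails(address="127.0.0.1"):
    """Return True if a second UDP socket cannot bind the same address and port."""
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Let the operating system pick a free port for the first socket.
        first.bind((address, 0))
        port = first.getsockname()[1]
        try:
            # The second socket asks for exactly the same address and port.
            second.bind((address, port))
        except OSError:
            # Typically "Address already in use": only the first binder wins.
            return True
        return False
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print("second bind failed:", second_bind_fails())
```

Whichever socket binds first keeps the port, which mirrors the start-order dependence described above.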

Try inspecting each container manually. Begin by opening an interactive shell into a running container (container l1 in pod tl in this case):

microk8s.kubectl exec tl -c l1 -it -- /bin/bash

Within the container shell, source the Foxy setup file and echo the /microk8s_chatter topic:

source /opt/ros/foxy/setup.bash 
ros2 topic echo /microk8s_chatter

The command returns the output below, showing that this container receives messages from container t2:

data: 't2:tl:1: 68'
---
data: 't2:tl:1: 69'
---
…

Try this on all three containers within the pod, and notice that not all containers receive all the intended traffic. Some remedies suggest adding hostNetwork: true to the pod spec, while others enable multicast traffic on the host’s loopback interface. However, these settings still suffer from network traffic collisions on the loopback interface, so results remain unpredictable.

Kubernetes solves the general case of this problem by accessing workers through a Kubernetes service configuration. Unfortunately these services rely on network address translation which defeats ROS discovery, as discussed in this earlier post.

Example 3: Two talkers, one listener, three pods

This third example attempts to fix problems encountered in the previous example by placing each container in its own pod. Containers will communicate using each pod’s IP address rather than a loopback interface shared within a pod. Since each pod has its own network interface, the listener’s interface should receive messages from both talker nodes. Does the listener consistently receive traffic from both talkers?

apiVersion: v1
kind: Pod
metadata:
  name: t1
spec:
  containers:
  - name: t1 
    image: ros:foxy
    command: ["/bin/bash", "-c"]
    args: ["source /opt/ros/foxy/setup.bash && apt update && apt install -y curl && curl https://raw.githubusercontent.com/canonical/robotics-blog-k8s/main/publisher.py > publisher.py && /bin/python3 publisher.py t1"]
---
apiVersion: v1
kind: Pod
metadata:
  name: t2
spec:
  containers:
  - name: t2 
    image: ros:foxy
    command: ["/bin/bash", "-c"]
    args: ["source /opt/ros/foxy/setup.bash && apt update && apt install -y curl && curl https://raw.githubusercontent.com/canonical/robotics-blog-k8s/main/publisher.py > publisher.py && /bin/python3 publisher.py t2"]
---
apiVersion: v1
kind: Pod
metadata:
  name: l1
spec:
  containers:
  - name: l1 
    image: ros:foxy
    command: ["/bin/bash", "-c"]
    args: ["source /opt/ros/foxy/setup.bash && apt update && apt install -y curl && curl https://raw.githubusercontent.com/canonical/robotics-blog-k8s/main/subscriber.py > subscriber.py && /bin/python3 subscriber.py l1"]

With this configuration, the listener still usually does not see both talkers.

It’s time to take a closer look at the network traffic to understand why these pods behave as they do. Install tcpdump on a talker node and begin a trace of UDP port 7400 traffic:

microk8s.kubectl exec t1 -it -- apt install -y tcpdump
microk8s.kubectl exec t1 -it -- tcpdump -i any -n udp port 7400

This should return output similar to the following which shows the network traffic for ROS discovery attempts (the pod’s IP address is 10.1.224.76):

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
17:03:00.827365 IP 10.1.224.76.38494 > 239.255.0.1.7400: UDP, length 208
17:03:00.827484 IP 127.0.0.1.34505 > 239.255.0.1.7400: UDP, length 208
17:03:03.827554 IP 10.1.224.76.38494 > 239.255.0.1.7400: UDP, length 208
17:03:03.827674 IP 127.0.0.1.34505 > 239.255.0.1.7400: UDP, length 208
...

This RTPS discovery traffic uses UDP port 7400 (see the DDS Wire Protocol Specification, section 9.6). ROS 2 depends upon this traffic to discover and communicate with other nodes. The packet trace above shows discovery traffic on the loopback interface (127.0.0.1) as well as the pod’s network interface (10.1.224.76), all destined for a local multicast IP address. Similarly, when executing the same commands on any other container, the results show multicast traffic for that container. No container receives multicast traffic from outside its pod.
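
As a rough aid for reading captures, the default RTPS port mapping can be written out as a small calculation. The sketch below assumes the default constants from the RTPS specification (port base 7400, domain gain 250, participant gain 2, offsets 0, 10, 1 and 11) and the default ROS domain ID of 0; a vendor or a custom profile may override these, and the function names are illustrative only.

```python
# Default RTPS port-mapping constants (assumed to be unmodified).
PORT_BASE = 7400
DOMAIN_GAIN = 250
PARTICIPANT_GAIN = 2
D0, D1, D2, D3 = 0, 10, 1, 11


def discovery_multicast_port(domain_id):
    """Port used for multicast participant discovery in a domain."""
    return PORT_BASE + DOMAIN_GAIN * domain_id + D0


def discovery_unicast_port(domain_id, participant_id):
    """Port used for unicast discovery traffic to one participant."""
    return PORT_BASE + DOMAIN_GAIN * domain_id + D1 + PARTICIPANT_GAIN * participant_id


def user_multicast_port(domain_id):
    """Port used for multicast user data in a domain."""
    return PORT_BASE + DOMAIN_GAIN * domain_id + D2


def user_unicast_port(domain_id, participant_id):
    """Port used for unicast user data to one participant."""
    return PORT_BASE + DOMAIN_GAIN * domain_id + D3 + PARTICIPANT_GAIN * participant_id


if __name__ == "__main__":
    print(discovery_multicast_port(0))   # 7400, the port filtered in the capture
    print(user_unicast_port(0, 1))       # 7413
```

Under these assumptions, a capture filter of udp port 7400 only shows multicast discovery for domain 0; topic data travels on other ports, so widen the filter when looking for the messages themselves.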

Multicast traffic does not reach other pods because as of this writing it is not supported by Calico, the default networking plugin used by MicroK8s. Pods will not receive multicast traffic using Calico.
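
One way to check multicast delivery independently of ROS is a small probe script run in two pods, one listening and one sending. The sketch below is a hypothetical helper, not part of the original configurations: it reuses the multicast group seen in the captures, picks an arbitrary test port (17400) to stay clear of ROS traffic, and sends on whichever interface the pod's default route selects.

```python
import socket
import struct
import sys

GROUP = "239.255.0.1"  # multicast group seen in the captures
PORT = 17400           # hypothetical test port, chosen to avoid 7400


def membership_request(group, interface_ip="0.0.0.0"):
    """Build the ip_mreq structure that IP_ADD_MEMBERSHIP expects."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface_ip))


def listen():
    """Join the group and print every probe that arrives."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(GROUP))
    while True:
        data, sender = sock.recvfrom(1024)
        print(f"received {data!r} from {sender[0]}")


def send(message=b"multicast probe"):
    """Send one probe datagram to the group with a TTL of 1."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    sock.sendto(message, (GROUP, PORT))
    sock.close()


if __name__ == "__main__":
    if sys.argv[1:] == ["listen"]:
        listen()
    elif sys.argv[1:] == ["send"]:
        send()
```

Run it with the argument listen in one pod and send in another. If the listener prints nothing, multicast is probably not crossing the pod network, which would be consistent with the behavior described above.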

This challenge can be solved by shifting to a different Container Network Interface (CNI). Example 4 demonstrates a configuration using the Multus CNI with MicroK8s.

Example 4: Talker and listener pods connected with Multus

In order to properly handle ROS 2 traffic with MicroK8s, use the Multus container network interface (CNI) plugin manager. Example 4 again creates two talkers and one listener each in its own pod, but this time each pod is configured with an additional MacVLAN network interface. Does the listener now receive data from both talkers?

Before applying this configuration be sure to customize the ipam section to match your network. For more details on how this MacVLAN bridge is set up, see the section titled “Starting your ROS Cluster” in this earlier post.
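
When adapting the ipam values, a quick sanity check with Python's ipaddress module can catch a range or gateway that falls outside the subnet. This is a minimal sketch with a hypothetical helper name; it only checks containment and ordering, and does not verify that the addresses are actually free on your network.

```python
import ipaddress


def check_ipam_range(subnet, range_start, range_end, gateway):
    """Return a list of problems found in a host-local style ipam range."""
    network = ipaddress.ip_network(subnet)
    start = ipaddress.ip_address(range_start)
    end = ipaddress.ip_address(range_end)
    gw = ipaddress.ip_address(gateway)

    problems = []
    if start not in network:
        problems.append(f"rangeStart {start} is outside {network}")
    if end not in network:
        problems.append(f"rangeEnd {end} is outside {network}")
    if gw not in network:
        problems.append(f"gateway {gw} is outside {network}")
    if start > end:
        problems.append("rangeStart is greater than rangeEnd")
    return problems


if __name__ == "__main__":
    # Values taken from the configuration in this example.
    issues = check_ipam_range(
        "192.168.0.0/16", "192.168.1.160", "192.168.1.180", "192.168.1.1"
    )
    print(issues or "ipam range looks consistent")
```

An empty list means the values are internally consistent; each pod still needs an address in the range that no other device on the network is using.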

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: my-network
spec:
  config: '{
    "cniVersion": "0.3.0",
    "name": "my-network",
    "type": "macvlan",
    "master": "eth0",
    "mode": "bridge",
    "isDefaultgateway": true,
    "ipam": {
      "type": "host-local",
      "ranges": [
         [ {
           "subnet": "192.168.0.0/16",
           "rangeStart": "192.168.1.160",
           "rangeEnd": "192.168.1.180",
           "gateway": "192.168.1.1"
         } ]
      ]
    }
  }'
---
apiVersion: v1
kind: Pod
metadata:
  name: t1
  annotations:
    k8s.v1.cni.cncf.io/networks: my-network
spec:
  containers:
  - name: t1 
    image: ros:foxy
    command: ["/bin/bash", "-c"]
    args: ["source /opt/ros/foxy/setup.bash && apt update && apt install -y curl && curl https://raw.githubusercontent.com/canonical/robotics-blog-k8s/main/publisher.py > publisher.py && /bin/python3 publisher.py t1"]
---
apiVersion: v1
kind: Pod
metadata:
  name: t2
  annotations:
    k8s.v1.cni.cncf.io/networks: my-network
spec:
  containers:
  - name: t2
    image: ros:foxy
    command: ["/bin/bash", "-c"]
    args: ["source /opt/ros/foxy/setup.bash && apt update && apt install -y curl && curl https://raw.githubusercontent.com/canonical/robotics-blog-k8s/main/publisher.py > publisher.py && /bin/python3 publisher.py t2"]
---
apiVersion: v1
kind: Pod
metadata:
  name: l1
  annotations:
    k8s.v1.cni.cncf.io/networks: my-network
spec:
  containers:
  - name: l1 
    image: ros:foxy
    command: ["/bin/bash", "-c"]
    args: ["source /opt/ros/foxy/setup.bash && apt update && apt install -y curl && curl https://raw.githubusercontent.com/canonical/robotics-blog-k8s/main/subscriber.py > subscriber.py && /bin/python3 subscriber.py l1"]

Follow the logs on the listener (for example with microk8s.kubectl logs --follow l1) to see messages successfully being received from both talker t1 and talker t2.

Using the same techniques as in earlier examples, examine why the traffic now properly flows between containers. First use the command microk8s.kubectl exec t1 -- ifconfig to show the network interfaces for talker t1:

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1440
        inet 10.1.224.79  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::d8ca:7eff:fe4f:68e5  prefixlen 64  scopeid 0x20<link>
        ether da:ca:7e:4f:68:e5  txqueuelen 0  (Ethernet)
        RX packets 13454  bytes 18008033 (18.0 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5715  bytes 450812 (450.8 KB)
        TX errors 0  dropped 1 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 53  bytes 14972 (14.9 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 53  bytes 14972 (14.9 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

net1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.163  netmask 255.255.0.0  broadcast 0.0.0.0
        inet6 fe80::7c0b:c3ff:fe5e:18f2  prefixlen 64  scopeid 0x20<link>
        ether 7e:0b:c3:5e:18:f2  txqueuelen 0  (Ethernet)
        RX packets 488  bytes 101314 (101.3 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 371  bytes 81458 (81.4 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Each container has the eth0 interface for typical pod-to-pod communications, and the lo loopback interface for container-to-container communications within a pod. However, with the additional NetworkAttachmentDefinition, annotated pods now also receive a net1 network interface with an IP address on the same network as the MicroK8s host.

Installing and running tcpdump within a container shows this pod sending discovery traffic on all three interfaces. The trace also shows this pod (192.168.1.163) receiving multicast traffic from talker t2 (192.168.1.176) and listener l1 (192.168.1.177):

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
19:24:35.103528 IP 192.168.1.176.44714 > 239.255.0.1.7400: UDP, length 264
19:24:35.103817 IP 10.1.224.79.56542 > 239.255.0.1.7400: UDP, length 264
19:24:35.103954 IP 192.168.1.163.52253 > 239.255.0.1.7400: UDP, length 264
19:24:35.104032 IP 127.0.0.1.55317 > 239.255.0.1.7400: UDP, length 264
19:24:35.851282 IP 192.168.1.177.43468 > 239.255.0.1.7400: UDP, length 264
...

Since pod t1 receives multicast traffic from other pods, discovery succeeds and this pod can successfully communicate with others.
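
For longer captures, picking out the remote discovery senders by eye gets tedious. The sketch below parses tcpdump text output in the format shown above and reports which source addresses sent discovery packets to the multicast group. The function names are hypothetical, and the regular expression assumes the plain IPv4 output produced with the -n option.

```python
import re

# Matches lines such as:
# 19:24:35.103528 IP 192.168.1.176.44714 > 239.255.0.1.7400: UDP, length 264
PACKET = re.compile(r"IP (\d+(?:\.\d+){3})\.(\d+) > (\d+(?:\.\d+){3})\.(\d+): UDP")


def discovery_senders(capture_lines, group="239.255.0.1", port=7400):
    """Return the set of source addresses sending to the discovery group."""
    senders = set()
    for line in capture_lines:
        match = PACKET.search(line)
        if match and match.group(3) == group and int(match.group(4)) == port:
            senders.add(match.group(1))
    return senders


def remote_senders(capture_lines, local_addresses):
    """Return discovery senders that are not one of this pod's own addresses."""
    return discovery_senders(capture_lines) - set(local_addresses)


if __name__ == "__main__":
    capture = [
        "19:24:35.103528 IP 192.168.1.176.44714 > 239.255.0.1.7400: UDP, length 264",
        "19:24:35.103817 IP 10.1.224.79.56542 > 239.255.0.1.7400: UDP, length 264",
        "19:24:35.103954 IP 192.168.1.163.52253 > 239.255.0.1.7400: UDP, length 264",
        "19:24:35.104032 IP 127.0.0.1.55317 > 239.255.0.1.7400: UDP, length 264",
        "19:24:35.851282 IP 192.168.1.177.43468 > 239.255.0.1.7400: UDP, length 264",
    ]
    local = {"10.1.224.79", "192.168.1.163", "127.0.0.1"}
    print(sorted(remote_senders(capture, local)))
```

An empty result, as the Example 3 capture would give, suggests that no discovery traffic is arriving from other pods.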

Example 5: Talker and listener pods with Multus, take 2

Perhaps example 4 adds a bit more complexity than necessary. Example 5 tries one more time to put two ROS 2 containers inside the same pod. With discovery traffic flowing properly, perhaps containers will be able to discover and communicate with each other on the loopback interface. Will the listener hear both talkers when two talkers are in the same pod and the pod has a MacVLAN interface?

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: home-network
spec:
  config: '{
    "cniVersion": "0.3.0",
    "name": "home-network",
    "type": "macvlan",
    "master": "eth0",
    "mode": "bridge",
    "isDefaultgateway": true,
    "ipam": {
      "type": "host-local",
      "ranges": [
         [ {
           "subnet": "192.168.0.0/16",
           "rangeStart": "192.168.1.160",
           "rangeEnd": "192.168.1.180",
           "gateway": "192.168.1.1"
         } ]
      ]
    }
  }'
---
apiVersion: v1
kind: Pod
metadata:
  name: t
  annotations:
    k8s.v1.cni.cncf.io/networks: home-network
spec:
  containers:
  - name: t1 
    image: ros:foxy
    command: ["/bin/bash", "-c"]
    args: ["source /opt/ros/foxy/setup.bash && apt update && apt install -y curl && curl https://raw.githubusercontent.com/canonical/robotics-blog-k8s/main/publisher.py > publisher.py && /bin/python3 publisher.py t1"]
  - name: t2 
    image: ros:foxy
    command: ["/bin/bash", "-c"]
    args: ["source /opt/ros/foxy/setup.bash && apt update && apt install -y curl && curl https://raw.githubusercontent.com/canonical/robotics-blog-k8s/main/publisher.py > publisher.py && /bin/python3 publisher.py t2"]
---
apiVersion: v1
kind: Pod
metadata:
  name: l
  annotations:
    k8s.v1.cni.cncf.io/networks: home-network
spec:
  containers:
  - name: l1 
    image: ros:foxy
    command: ["/bin/bash", "-c"]
    args: ["source /opt/ros/foxy/setup.bash && apt update && apt install -y curl && curl https://raw.githubusercontent.com/canonical/robotics-blog-k8s/main/subscriber.py > subscriber.py && /bin/python3 subscriber.py l1"]

This configuration does not succeed, which is no surprise. Using the experience from earlier examples, it’s easy to understand why.

Run a packet capture for all UDP traffic on either talker t1 or talker t2. Add the -X option to tcpdump to show detailed packet data, and search through the results to find the message string being sent on the microk8s_chatter topic.

The excerpt below from the command microk8s.kubectl exec t -c t1 -- tcpdump -i any -n -X udp does indeed show a message from talker t1 to listener l1:

18:20:28.496650 IP 192.168.1.173.37722 > 192.168.1.174.7413: UDP, length 92
    0x0000:  4500 0078 4925 4000 4011 6ca4 c0a8 01ad  E..xI%@.@.l.....
    0x0010:  c0a8 01ae 935a 1cf5 0064 8521 5254 5053  .....Z...d.!RTPS
    0x0020:  0203 010f 010f 3eaf 0100 0000 0100 0000  ......>.........
    0x0030:  0e01 0c00 010f b9c2 0100 0000 0100 0000  ................
    0x0040:  0901 0800 6c9f a15f 2d24 0f7f 1505 2800  ....l.._-$....(.
    0x0050:  0000 1000 0000 1104 0000 1103 0000 0000  ................
    0x0060:  d100 0000 0001 0000 0c00 0000 7431 3a74  ............t1:t
    0x0070:  3a31 3a20 3230 3800                      :1:.208.

However, this packet trace shows no messages from talker t2. The same results come from running the capture on talker t2, even though the logs for talker t2 show that the second talker is still publishing messages.

Messages sent from talker t2 never actually make it off container t2 and onto the pod’s network interface.
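
Searching hex dumps by eye is error-prone, so a short script can help confirm which talker a captured packet came from. The sketch below decodes the hex dump shown above. It assumes an IPv4/UDP packet carrying one RTPS DATA submessage with no inline QoS and a little-endian CDR string payload, which matches this capture but not every RTPS packet; the function names are hypothetical.

```python
import re
import struct

# Hex dump copied from the capture above (offsets and ASCII column included).
HEXDUMP = """
    0x0000:  4500 0078 4925 4000 4011 6ca4 c0a8 01ad  E..xI%@.@.l.....
    0x0010:  c0a8 01ae 935a 1cf5 0064 8521 5254 5053  .....Z...d.!RTPS
    0x0020:  0203 010f 010f 3eaf 0100 0000 0100 0000  ......>.........
    0x0030:  0e01 0c00 010f b9c2 0100 0000 0100 0000  ................
    0x0040:  0901 0800 6c9f a15f 2d24 0f7f 1505 2800  ....l.._-$....(.
    0x0050:  0000 1000 0000 1104 0000 1103 0000 0000  ................
    0x0060:  d100 0000 0001 0000 0c00 0000 7431 3a74  ............t1:t
    0x0070:  3a31 3a20 3230 3800                      :1:.208.
"""

HEX_GROUP = re.compile(r"(?:[0-9a-f]{2}){1,2}")


def hexdump_to_bytes(dump):
    """Convert tcpdump -X output into raw packet bytes."""
    data = bytearray()
    for line in dump.splitlines():
        _, _, rest = line.partition(":")
        # At most eight hex groups per line; stop at the ASCII column.
        for token in rest.split()[:8]:
            if not HEX_GROUP.fullmatch(token):
                break
            data += bytes.fromhex(token)
    return bytes(data)


def extract_message(packet):
    """Return (UDP destination port, string payload) for an RTPS DATA packet."""
    ip_header_len = (packet[0] & 0x0F) * 4
    _, dst_port = struct.unpack_from("!HH", packet, ip_header_len)
    rtps = packet[ip_header_len + 8:]          # skip the 8-byte UDP header
    if rtps[:4] != b"RTPS":
        raise ValueError("not an RTPS packet")
    offset = 20                                 # RTPS header is 20 bytes
    while offset + 4 <= len(rtps):
        sub_id, flags = rtps[offset], rtps[offset + 1]
        endian = "<" if flags & 0x01 else ">"
        (length,) = struct.unpack_from(endian + "H", rtps, offset + 2)
        body = rtps[offset + 4:offset + 4 + length]
        if sub_id == 0x15:                      # DATA submessage
            (to_qos,) = struct.unpack_from(endian + "H", body, 2)
            payload = body[4 + to_qos:]         # assumes no inline QoS
            # Skip the 4-byte encapsulation header, then read a CDR string.
            (str_len,) = struct.unpack_from("<I", payload, 4)
            text = payload[8:8 + str_len].rstrip(b"\x00").decode()
            return dst_port, text
        offset += 4 + length
    return dst_port, None


if __name__ == "__main__":
    print(extract_message(hexdump_to_bytes(HEXDUMP)))
```

For the packet above this yields destination port 7413 and the string t1:t:1: 208, confirming that the message originated from talker t1.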

In conclusion

This post covered many options for troubleshooting ROS 2 running on Kubernetes, and demonstrated the importance of properly setting up network communications for the ROS 2 containers.

This Multus / MacVLAN setup is not the only solution, but can serve as a beginning for you to build upon. Many other solutions exist. Within the Kubernetes configuration, Multus can be used to provision a different type of interface such as IPvlan, or Flannel can be used to create a virtual network. Within ROS 2, the communications middleware provider can be configured to use a specific interface avoiding discovery confusion. Hopefully this exploration of both loopback and multicast traffic helps you customize your ROS 2 + MicroK8s implementations to fit your environment.

This is the fourth article in a series of posts describing ROS 2 applications on Kubernetes with MicroK8s:

Part 1: ROS 2 and Kubernetes basics
Part 2: ROS 2 on Kubernetes: a simple talker and listener setup
Part 3: Distribute ROS 2 across machines with Kubernetes
Part 4 (this article): Exploring ROS 2 Kubernetes configurations