Why consider a service mesh?
I’ve always been intrigued by the service mesh concept. It offers some compelling features, but they are often perceived as a nice-to-have rather than a must-have, especially if you’re just experimenting with kubernetes. Yet as companies adopt kubernetes at a very high pace to run their application workloads, things become challenging, probably sooner than you expect.
Considering that a kubernetes cluster shares and manages resources across what may eventually be hundreds or even thousands of workloads, one might wonder how these clusters will behave under error conditions or heavy load.
Understanding traffic flows in kubernetes is quite challenging.
First, clusters rely on a CNI (Container Network Interface) plugin. Examples include Cilium, Calico, Weave Net, Azure CNI and AWS VPC CNI.
They all use different networking techniques (encapsulation, routing, VXLAN, …) to implement the networking layer in kubernetes. You might want to tune your CNI (e.g. the TCP MTU) to improve performance and reduce latency, but believe me, this is pretty hard to do and it depends greatly on the CNI and the features in use. On top of that, kubernetes relies heavily on iptables (via kube-proxy), introducing even more latency. Published performance figures are often based on raw, optimized throughput from load generators. Real-life traffic conditions are always hard to mimic, and that is not even taking possible error conditions or failures into account.
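To make the MTU point concrete, here is a minimal sketch of listing the MTU each interface on a node actually uses. It assumes shell access to a Linux node (or a debug pod running on it); the interface names you will see vary per CNI.

```shell
# Print each network interface on the node together with its MTU.
# Overlay devices created by the CNI (e.g. a VXLAN interface) usually
# carry a lower MTU than the physical NIC, because encapsulation
# headers eat into the packet size.
for ifc in /sys/class/net/*; do
  printf '%-16s %s\n' "$(basename "$ifc")" "$(cat "$ifc"/mtu)"
done
```

Comparing the overlay MTU with the physical NIC's MTU is a quick first check before attempting any CNI-specific tuning.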
How to monitor traffic flows and latency?
Since my early days in networking, I have always wondered how to capture traffic on hubs, switches and the like, and more recently on kubernetes.
While there are techniques to make packet captures on a cluster, using tcpdump for example, they require a lot of resources on the cluster and expert analysis. This is great for troubleshooting, but not ideal for analyzing latency or understanding flows.
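For completeness, here is a sketch of one such capture technique using an ephemeral debug container, so the application container itself does not need tcpdump installed. The pod name, namespace, container name and the netshoot image are placeholders, not part of any specific setup.

```shell
# Attach an ephemeral debug container that shares the target pod's
# network namespace, then capture that pod's traffic with tcpdump.
# "my-pod", "my-namespace" and "my-container" are example names.
kubectl debug -it -n my-namespace my-pod \
  --image=nicolaka/netshoot --target=my-container -- \
  tcpdump -i eth0 -n port 8080
```

Even with this approach, someone still has to sit down and interpret the capture, which is exactly why it does not scale for day-to-day flow monitoring.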
A great newcomer is eBPF. It is available in recent Linux kernels and can help big time in getting insight into how your clusters are behaving. The Cilium CNI is built on eBPF, offers great performance and removes the need to rely heavily on iptables. Its Hubble UI component visualizes how your cluster is behaving and improves flow logging. This is especially handy when…