Cilium mutual auth … DIY
--
Introduction
The idea of this short tutorial is to see if we can get Cilium mutual auth working on a self-managed cluster.
I used a 3-node cluster on AWS based on
- Ubuntu 20.04
- Containerd 1.6.21
- Kubernetes v1.27.4
Install instructions are based on https://github.com/xxradar/k8s-calico-oss-install-containerd, but do not install any CNI at this point.
Install Cilium components
This is just a quick install, check out https://docs.cilium.io/en/v1.14/ for up-to-date install instructions.
Cilium CLI
Install the cilium cli
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
Check for version 0.15 or higher
cilium version --client
cilium-cli: v0.15.3 compiled with go1.20.4 on linux/amd64
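If you want to guard this check in a script, a minimal sketch (the version string is the one printed above; adapt it to your own output):

```shell
# Minimal guard: the examples below were run with cilium-cli >= 0.15.
VERSION="v0.15.3"                       # e.g. taken from `cilium version --client`
MINOR=$(echo "$VERSION" | cut -d. -f2)  # extract the minor version number
if [ "$MINOR" -ge 15 ]; then
  echo "cilium-cli ok"
else
  echo "please upgrade cilium-cli" >&2
fi
```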
Cilium CNI
Install the Cilium CNI
sudo snap install helm --classic
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium --version 1.14.0 \
--namespace kube-system \
--set authentication.mutual.spire.enabled=true \
--set authentication.mutual.spire.install.enabled=true \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true
cilium status --wait
Verify SPIRE components
Mutual authentication in Cilium is based on SPIFFE (Secure Production Identity Framework for Everyone); see https://spiffe.io/ for all the details.
SPIRE is a production-ready implementation of the SPIFFE APIs that performs node and workload attestation in order to securely issue SVIDs to workloads, and to verify the SVIDs of other workloads, based on a predefined set of conditions. An SVID is the document with which a workload proves its identity to a resource (e.g. a certificate).
Cilium agents request SVIDs on behalf of the pods via the spire-agents the moment the pods / Cilium endpoints are created.
Check if the spire-server and spire-agents are up and running (probably they are not on a self-managed cluster).
ubuntu@ip-10-1-2-162:~$ kubectl get all -n cilium-spire
NAME READY STATUS RESTARTS AGE
pod/spire-agent-fkck9 0/1 Init:0/1 0 37s
pod/spire-agent-ltlnc 0/1 Init:0/1 0 41s
pod/spire-server-0 0/2 Pending 0 75s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/spire-server ClusterIP 10.104.3.112 <none> 8081/TCP 75s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/spire-agent 2 2 0 2 0 <none> 75s
NAME READY AGE
statefulset.apps/spire-server 0/1 75s
The spire-server might remain in pending state, because it requires a PersistentVolume to boot correctly. In case it does, create a pv.
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: spire-pv
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  hostPath:
    path: /mnt/data # Replace this with the desired path on the host filesystem
EOF
kubectl -n kube-system rollout restart deployment/cilium-operator
kubectl -n kube-system rollout restart ds/cilium
ubuntu@ip-10-1-2-162:~$ kubectl get all -n cilium-spire
NAME READY STATUS RESTARTS AGE
pod/spire-agent-fkck9 1/1 Running 0 2m47s
pod/spire-agent-ltlnc 1/1 Running 0 2m51s
pod/spire-server-0 2/2 Running 0 3m25s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/spire-server ClusterIP 10.104.3.112 <none> 8081/TCP 3m25s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/spire-agent 1 1 1 1 1 <none> 3m25s
NAME READY AGE
statefulset.apps/spire-server 1/1 3m25s
Verify the mutual-auth settings in the Cilium configuration:
cilium config view | grep mesh-auth
Install Hubble CLI
export HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
HUBBLE_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then HUBBLE_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-linux-${HUBBLE_ARCH}.tar.gz{,.sha256sum}
sha256sum --check hubble-linux-${HUBBLE_ARCH}.tar.gz.sha256sum
sudo tar xzvfC hubble-linux-${HUBBLE_ARCH}.tar.gz /usr/local/bin
rm hubble-linux-${HUBBLE_ARCH}.tar.gz{,.sha256sum}
hubble version
cilium hubble port-forward &
hubble status
Install a demo application
Install a custom app. I typically use app-routable-demo, a configurable mesh of proxies that simulates a microservices app. https://github.com/xxradar/app_routable_demo
git clone https://github.com/xxradar/app_routable_demo.git
cd ./app_routable_demo
./setup.sh
watch kubectl get po -n app-routable-demo
The moment the pods are running, the Cilium agent will request an SVID for each (group of) pods. You can find the IDENTITY ID for a specific pod:
kubectl get cep -n app-routable-demo
NAME ENDPOINT ID IDENTITY ID INGRESS ENFORCEMENT EGRESS ENFORCEMENT VISIBILITY POLICY ENDPOINT STATE IPV4 IPV6
echoserver-1-deployment-77c7b97758-844ct 1624 9164 <status disabled> <status disabled> <status disabled> ready 10.0.1.29
echoserver-1-deployment-77c7b97758-9kz8l 139 9164 <status disabled> <status disabled> <status disabled> ready 10.0.2.3
echoserver-1-deployment-77c7b97758-xwk5z 992 9164 <status disabled> <status disabled> <status disabled> ready 10.0.2.107
echoserver-2-deployment-74658fd96d-gdkhk 668 10485 <status disabled> <status disabled> <status disabled> ready 10.0.1.22
echoserver-2-deployment-74658fd96d-jjnpj 1071 10485 <status disabled> <status disabled> <status disabled> ready 10.0.2.208
echoserver-2-deployment-74658fd96d-w96cb 40 10485 <status disabled> <status disabled> <status disabled> ready 10.0.2.176
mycurler 635 33780 <status disabled> <status disabled> <status disabled> ready 10.0.2.100
nginx-zone1-6d57c556f8-rbf9w 720 13742 <status disabled> <status disabled> <status disabled> ready 10.0.1.175
nginx-zone2-fcf79f559-2jqm7 3409 5799 <status disabled> <status disabled> <status disabled> ready 10.0.2.225
nginx-zone3-8c78d5dbd-pvc8f 202 2857 <status disabled> <status disabled> <status disabled> ready 10.0.1.163
nginx-zone4-747cd49bfc-9ft9x 1848 7678 <status disabled> <status disabled> <status disabled> ready 10.0.2.125
nginx-zone5-7987976dc8-m87h9 1418 9866 <status disabled> <status disabled> <status disabled> ready 10.0.2.232
siege-deployment-6f8567f7fc-9qhcc 729 3694 <status disabled> <status disabled> <status disabled> ready 10.0.2.187
siege-deployment-6f8567f7fc-pt48n 1849 3694 <status disabled> <status disabled> <status disabled> ready 10.0.1.139
siege-deployment-6f8567f7fc-vtgxg 393 3694 <status disabled> <status disabled> <status disabled> ready 10.0.1.50
and find the corresponding SPIFFE ID and entry (e.g. nginx-zone1-6d57c556f8-rbf9w 720 13742)
kubectl exec -n cilium-spire spire-server-0 -c spire-server -- /opt/spire/bin/spire-server entry show -selector cilium:mutual-auth
Found 15 entries
...
Entry ID : 95321d21-c6ca-40a8-b5ad-87c99c6f09bf
SPIFFE ID : spiffe://spiffe.cilium/identity/13742
Parent ID : spiffe://spiffe.cilium/cilium-operator
Revision : 0
X509-SVID TTL : default
JWT-SVID TTL : default
Selector : cilium:mutual-auth
Entry ID : 84a90f8b-5bfe-4b52-85ef-48a93edb71c2
SPIFFE ID : spiffe://spiffe.cilium/identity/15527
Parent ID : spiffe://spiffe.cilium/cilium-operator
Revision : 0
X509-SVID TTL : default
JWT-SVID TTL : default
Selector : cilium:mutual-auth
...
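The SPIFFE IDs above follow a fixed scheme: the spiffe.cilium trust domain plus the numeric Cilium identity. A small sketch deriving the expected SPIFFE ID for the nginx-zone1 pod (the kubectl line in the comment is an assumption about the CiliumEndpoint layout; the identity value is the one from the output above):

```shell
# Derive the expected SPIFFE ID for a workload from its Cilium identity.
# On a live cluster the identity can be read from the CiliumEndpoint, e.g.:
#   kubectl get cep -n app-routable-demo nginx-zone1-6d57c556f8-rbf9w \
#     -o jsonpath='{.status.identity.id}'
ID=13742                                          # nginx-zone1 identity from above
SPIFFE_ID="spiffe://spiffe.cilium/identity/${ID}"
echo "$SPIFFE_ID"
```

The printed ID should match the entry shown by the spire-server.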
Apply an L7 Cilium Network Policy w/o mutual auth
kubectl apply -f - <<EOF
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: no-mutual-auth-echo-app-routeble-demo
  namespace: app-routable-demo
spec:
  endpointSelector:
    matchLabels:
      app: nginx-zone1
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: siege
      toPorts:
        - ports:
            - port: "80"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/app1"
EOF
We can observe the traffic
hubble observe -n app-routable-demo -l app=siege -f
...
Traffic to app1 is allowed in this example; traffic to app2, app3 and app4 is dropped. If you prefer, you can change the policy path to "/app.*" to allow them all. It shows the power of this L7 policy functionality.
Apply mutual auth …
kubectl apply -f - <<EOF
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: no-mutual-auth-echo-app-routeble-demo
  namespace: app-routable-demo
spec:
  endpointSelector:
    matchLabels:
      app: nginx-zone1
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: siege
      authentication:
        mode: "required"
      toPorts:
        - ports:
            - port: "80"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/app1"
EOF
hubble observe -n app-routable-demo -l app=siege -f
Aug 4 08:01:56.733: app-routable-demo/siege-deployment-6f8567f7fc-9qhcc:38942 (ID:3694) -> kube-system/coredns-5d78c9869d-szjq5:53 (ID:7956) to-endpoint FORWARDED (UDP)
Aug 4 08:01:56.733: app-routable-demo/siege-deployment-6f8567f7fc-9qhcc:38942 (ID:3694) <- kube-system/coredns-5d78c9869d-szjq5:53 (ID:7956) to-endpoint FORWARDED (UDP)
Aug 4 08:01:56.733: app-routable-demo/siege-deployment-6f8567f7fc-9qhcc:53060 (ID:3694) -> app-routable-demo/nginx-zone1-6d57c556f8-rbf9w:80 (ID:13742) to-overlay FORWARDED (TCP Flags: SYN)
Aug 4 08:01:56.733: app-routable-demo/siege-deployment-6f8567f7fc-9qhcc:53060 (ID:3694) -> app-routable-demo/nginx-zone1-6d57c556f8-rbf9w:80 (ID:13742) policy-verdict:L3-L4 INGRESS ALLOWED (TCP Flags: SYN; Auth: SPIRE)
Aug 4 08:01:56.733: app-routable-demo/nginx-zone1-6d57c556f8-rbf9w:80 (ID:13742) <> app-routable-demo/siege-deployment-6f8567f7fc-9qhcc:53060 (ID:3694) to-overlay FORWARDED (TCP Flags: SYN, ACK)
Aug 4 08:01:56.734: app-routable-demo/siege-deployment-6f8567f7fc-9qhcc:53060 (ID:3694) <- app-routable-demo/nginx-zone1-6d57c556f8-rbf9w:80 (ID:13742) to-endpoint FORWARDED (TCP Flags: SYN, ACK)
Aug 4 08:01:56.734: app-routable-demo/siege-deployment-6f8567f7fc-9qhcc:53060 (ID:3694) -> app-routable-demo/nginx-zone1-6d57c556f8-rbf9w:80 (ID:13742) to-overlay FORWARDED (TCP Flags: ACK, PSH)
Aug 4 08:01:56.734: app-routable-demo/siege-deployment-6f8567f7fc-9qhcc:53060 (ID:3694) -> app-routable-demo/nginx-zone1-6d57c556f8-rbf9w:80 (ID:13742) to-overlay FORWARDED (TCP Flags: ACK)
Aug 4 08:01:56.734: app-routable-demo/nginx-zone1-6d57c556f8-rbf9w:80 (ID:13742) <> app-routable-demo/siege-deployment-6f8567f7fc-9qhcc:53060 (ID:3694) to-overlay FORWARDED (TCP Flags: ACK)
Aug 4 08:01:56.734: app-routable-demo/siege-deployment-6f8567f7fc-9qhcc:53060 (ID:3694) -> app-routable-demo/nginx-zone1-6d57c556f8-rbf9w:80 (ID:13742) http-request FORWARDED (HTTP/1.1 GET http://zone1/app2)
Aug 4 08:01:56.738: app-routable-demo/nginx-zone1-6d57c556f8-rbf9w:80 (ID:13742) <> app-routable-demo/siege-deployment-6f8567f7fc-9qhcc:53060 (ID:3694) to-overlay FORWARDED (TCP Flags: ACK, PSH)
Aug 4 08:01:56.738: app-routable-demo/nginx-zone1-6d57c556f8-rbf9w:80 (ID:13742) <> app-routable-demo/siege-deployment-6f8567f7fc-9qhcc:53060 (ID:3694) to-overlay FORWARDED (TCP Flags: ACK, FIN)
Aug 4 08:01:56.738: app-routable-demo/siege-deployment-6f8567f7fc-9qhcc:53060 (ID:3694) <- app-routable-demo/nginx-zone1-6d57c556f8-rbf9w:80 (ID:13742) http-response FORWARDED (HTTP/1.1 200 4ms (GET http://zone1/app2))
Aug 4 08:01:56.739: app-routable-demo/siege-deployment-6f8567f7fc-9qhcc:53060 (ID:3694) <- app-routable-demo/nginx-zone1-6d57c556f8-rbf9w:80 (ID:13742) to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Aug 4 08:01:56.739: app-routable-demo/siege-deployment-6f8567f7fc-9qhcc:53060 (ID:3694) <- app-routable-demo/nginx-zone1-6d57c556f8-rbf9w:80 (ID:13742) to-endpoint FORWARDED (TCP Flags: ACK, FIN)
Note the line:
Aug 4 08:01:56.733: app-routable-demo/siege-deployment-6f8567f7fc-9qhcc:53060 (ID:3694) -> app-routable-demo/nginx-zone1-6d57c556f8-rbf9w:80 (ID:13742) policy-verdict:L3-L4 INGRESS ALLOWED (TCP Flags: SYN; Auth: SPIRE)
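To pick these authenticated verdicts out of a busy flow stream, a simple grep on the verdict lines works. The sketch below runs it on a captured sample line; on a live cluster, pipe the `hubble observe` output into the same filter:

```shell
# Extract the authentication type from a hubble policy-verdict line.
# Live version (assumed usage): hubble observe -n app-routable-demo -f | grep 'policy-verdict'
LINE='... policy-verdict:L3-L4 INGRESS ALLOWED (TCP Flags: SYN; Auth: SPIRE)'
echo "$LINE" | grep -o 'Auth: [A-Z]*'
```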
Conclusion
This was a very quick run-through. I hope the tutorial sheds some additional light on this very complex topic. In some shape or form, this will become a very important concept in future networking.