# Design for Second Iteration of Cluster/Homelab

## Context

The current cluster was set up just to run CI builds as a trial. I'm now sold that k8s is a good approach and would like to move more of my services to it. This document will track my design for cluster v2.

## Investigation

### Host OS

Debian:

- on laptop
- already on most of my systems
- stable
- not officially tested by k3s
- will be using apt at work

Stream:

- tried with k3s and had to disable systemd...
- on the second try it seemed to work, even with the error I saw before
- Cockpit is nice when managing servers
- want to like RHEL
- more stable than Fedora
- RPMs are easier to work with
- using on the VM host

Fedora:

- want to like RHEL
- tested with k3s
- latest podman and friends
- really fast for something stable...
- Cockpit is nice
- Fedora minimal can't be installed from Cockpit without hitting tab a lot

Decision: Fedora Server

### K3S Distro

RKE2:

- no Debian support
- 4 GB minimum RAM
- 2 CPUs
- cilium and nginx not default

k3s:

- k3d is a thing
- documentation online is good
- 512 MB of RAM
- 1 CPU
- easy installation

Decision: k3s

### How many clusters?

Decision: Exactly two (one for "need to work" services, one for CI and messing around). The mess with Longhorn scared me... it wouldn't be that big a deal if it only affected CI, but it also affected Kanboard and git.

### Files

Decision: Host local. Files are not something I want to have to think about. The Longhorn mess scared me. NFS not working with postgres is annoying.

### How many nodes per cluster?

The current cluster has lots of small VMs, with VMs added with more CPUs/RAM as the requirements grew. I'd rather limit myself to fewer, more powerful VMs, and let the VM OS manage CPU and memory. More nodes would be useful if they were on different base hardware. Realistically I'm never going to pay for more than the Ingress VM...

Decision: 1 big VM per cluster. Both VMs hosted on current hardware. If we add hardware, we can add an additional node at that time.

### Networking

Status quo is flannel with VXLAN, plus Traefik, Klipper LB, and CoreDNS.

#### DNS

CoreDNS is great.

#### Load Balancer

Klipper works fine now. MetalLB is the other option; it is more complicated and doesn't seem to give much, particularly with a single-node cluster.

Decision: Klipper

#### Ingress

Traefik:

- status quo
- works fine
- outside of k8s I don't like it

ingress-nginx (the kubernetes community controller):

- Google
- used by a lot of people
- nothing sexy or risky
- auth exposed in annotations

nginx-ingress (NGINX upstream):

- extra features like stream support that I'm using on Lightsail now
- full blown virtual server support
- maybe too complicated?
- exposes the same features as I have on Lightsail through annotations, which could be a thing to get Keycloak to work
- auth in annotations is behind a paywall, but available through a virtual server

Decision: ingress-nginx (the community controller). Use the LB for stuff I would use the virtual server for.

#### CNI

flannel VXLAN:

- status quo
- works fine

cilium:

- label based network policies
- leaning toward this plus multus, though I doubt I'll ever write a policy
- I want the ability to write a policy...
- if set up with a different pod CIDR, can do multi-cluster later
- cluster name and cluster ID at install time
- can do transparent encryption (not worth it...)

cilium multi-cluster networking:

- not worth the complexity
- will manage connections with ingress/egress methods

flannel wg:

- encrypts traffic and sets up an overlay if I want to interact with cloud machines
- can do the same with a manual wireguard network...

istio:

- I dislike sidecar containers
- the traffic I'm interested in is mainly not L7
- blessed by the Air Force

multus:

- tried on Fedora and didn't get very far, I think because of something with k3s

Decision: cilium. I want network policies and Hubble observability. It's a risk, but this is supposed to be a learning experience. (Example policy below.)
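To make the "I want the ability to write a policy" point concrete, here's a minimal sketch of the kind of label-based policy Cilium allows. The namespace, pod label, and intent are hypothetical; nothing like this is actually deployed.

```
# Hypothetical example: only let pods in the ingress-nginx namespace reach Kanboard.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: kanboard-from-ingress-only
  namespace: kanboard
spec:
  endpointSelector:
    matchLabels:
      app: kanboard                # assumed pod label
  ingress:
    - fromEndpoints:
        - matchLabels:
            # special label Cilium exposes for the source namespace
            k8s:io.kubernetes.pod.namespace: ingress-nginx
```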
## What goes on each cluster/VM?

Lightsail:

1. Wireguard
2. Apt/RPM repos
3. Main NGINX Proxy

Infra Cluster:

- On Host:
  1. CoreDNS
  2. Wireguard
- On Cluster:
  1. Keycloak
  2. Kanboard
  3. OneDev
  4. Harbor

Main Cluster:

- On Host:
  1. Wireguard
- On Cluster:
  1. Tekton
  2. MQTT Broker
  3. Squid
  4. j7s-os-deployment

## Deployments

Manually kubectl apply:

- easy to reason about
- running apply is fun
- using Flux has a chicken and egg problem if git is also deployed from Flux

Flux:

- more GitOps-y
- the chicken and egg problem is conquerable, in a maybe confusing way

Decision:

1. Infra:
   1. kubectl apply/helm everything.
   2. Drop the Keycloak image manually into k3s, either using the CRI or by placing it in the magic directory after the k3s install.
   3. Use helm with values for OneDev.
   4. Get rid of the Kanboard custom image. Use kubectl apply.
2. Test:
   1. Mostly kubectl apply for Tekton.
   2. Use Flux (sketch below) for:
      1. MQTT
      2. j7s-os-deploy
      3. squid
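A minimal sketch of what the Flux side could look like for one of those components. The repository URL, names, and path are placeholders and the API versions depend on which Flux release actually gets installed; this is not pulled from the real manifests.

```
# Hypothetical Flux source + reconciliation for the MQTT broker.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: j7s-manifests                  # placeholder name
  namespace: flux-system
spec:
  interval: 5m
  url: https://example.net/jimmy/j7s-manifests.git   # placeholder URL
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: mqtt
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: j7s-manifests
  path: ./mqtt                         # placeholder path in the repo
  prune: true
```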
## VM Resources

Lightsail:

- leave alone

Infra Cluster:

- 4 GiB RAM total
- 2 CPUs
- 120 GiB hard drive

Main Cluster:

- 4 GiB RAM total
- 2 CPUs
- 120 GiB hard drive

## Secrets

Options: Mozilla SOPS, Bitnami Sealed Secrets. Both work with Flux. Sealed Secrets seems more integrated with k8s when not using Flux.

Decision: Bitnami Sealed Secrets

## Experiments

### k3s with cilium and nginx on CentOS Stream 9

```
systemctl disable firewalld --now
export INSTALL_K3S_EXEC="server --disable traefik --flannel-backend=none --disable-network-policy --selinux"
curl -sfL https://get.k3s.io | sh -s -
```

I see an error about SELinux policies conflicting, but I'm not sure if it matters?

Install cilium following the instructions here: https://docs.cilium.io/en/v1.12/gettingstarted/k3s/

Install nginx with:

```
helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  --set controller.ingressClassResource.default=true
```

### k3s with nginx on Fedora Server

```
sudo systemctl disable firewalld --now
export INSTALL_K3S_EXEC="server --disable traefik --selinux"
curl -sfL https://get.k3s.io | sh -s -
sudo chown jimmy:jimmy /etc/rancher/k3s/k3s.yaml
sudo dnf install helm
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
helm upgrade --install ingress-nginx ingress-nginx --repo https://kubernetes.github.io/ingress-nginx --namespace ingress-nginx --create-namespace
```

Import simple-ros2. On the laptop:

```
podman save -o simple-ros2.tar simple-ros2:latest
scp simple-ros2.tar 192.168.1.106:~/.
```

On the server:

```
sudo ctr images import ./simple-ros2.tar
# wait forever....
```

Test yaml:

```
---
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: simple-ros2
      image: localhost/simple-ros2:latest
      imagePullPolicy: Never
      args: [ros2, launch, j7s-simple, j7s_publisher_launch.py]
```

### VM Host set up

I **think** I ran something like this when I set up the VM host. I don't remember exactly, and I didn't document it... This should be carefully looked at before running.

```
nmcli connection add ifname br0 type bridge con-name br0 connection.zone trusted
nmcli connection add type bridge-slave ifname enp4s0 master br0
nmcli connection modify br0 bridge.stp no
nmcli connection modify enp4s0 autoconnect no
nmcli connection down enp4s0
nmcli connection up id br0
```

### Kubeseal Use

```
apiVersion: v1
kind: Secret
metadata:
  name: test-secret
  namespace: my-namespace
type: Opaque
data:
  username: dmFsdWUtMQ0K
  password: dmFsdWUtMg0KDQo=
stringData:
  hostname: myapp.mydomain.com
```

```
cat secret.yaml | kubeseal --format yaml > sealedsecret.yaml
```

# Actual Install Notes

## To Do List

Infra Cluster: [x]

- On Host:
  1. CoreDNS [x]
  2. Wireguard [x]
- On Cluster:
  1. Keycloak [x]
  2. Kanboard [x]
  3. Gitea [x]
  4. Harbor [x]

Main Cluster:

- On Host:
  1. Wireguard [x]
- On Cluster:
  1. Tekton
     - Base install [ ]
     - Add namespace
     - Push images
     - Update tasks
     - Update jobs
  2. Flux
     1. MQTT Broker
     2. Squid
     3. j7s-os-deployment

[x] Give accounts on Harbor to clusters.
[ ] Push images to Harbor.
[x] Hubble.

## Regularly Scheduled Programming

Fedora Server 37, keep defaults.

Infra:

On the VM:

```
sudo hostnamectl set-hostname infra-cluster
sudo systemctl disable firewalld --now
sudo su
export INSTALL_K3S_EXEC="server --disable traefik --flannel-backend=none --disable-network-policy --cluster-cidr 10.44.0.0/16 --service-cidr 10.45.0.0/16 --cluster-dns 10.45.0.10 --selinux"
curl -sfL https://get.k3s.io | sh -s -
exit
sudo cp /etc/rancher/k3s/k3s.yaml ~/infra.yaml
sudo chown jimmy:jimmy ~/infra.yaml
exit
```

On the laptop:

```
scp jimmy@192.168.1.112:~/infra.yaml /home/jimmy/.kube/.
export KUBECONFIG=~/.kube/infra.yaml
vim "$KUBECONFIG"   # fix the server IP
```

Install the cilium CLI. On the laptop:

```
cilium install
```

wait...

```
helm upgrade --debug --install ingress-nginx ingress-nginx --repo https://kubernetes.github.io/ingress-nginx --namespace ingress-nginx --create-namespace
```

Main:

On the VM:

```
sudo hostnamectl set-hostname j7s-cluster
sudo systemctl disable firewalld --now
sudo su
export INSTALL_K3S_EXEC="server --disable traefik --flannel-backend=none --disable-network-policy --cluster-cidr 10.46.0.0/16 --service-cidr 10.47.0.0/16 --cluster-dns 10.47.0.10 --selinux --resolv-conf /run/systemd/resolve/resolv.conf"
curl -sfL https://get.k3s.io | sh -s -
exit
sudo cp /etc/rancher/k3s/k3s.yaml ~/j7s-cluster.yaml
sudo chown jimmy:jimmy ~/j7s-cluster.yaml
exit
```

On the laptop:

```
scp jimmy@192.168.1.103:~/j7s-cluster.yaml /home/jimmy/.kube/.
export KUBECONFIG=~/.kube/j7s-cluster.yaml
vim "$KUBECONFIG"   # fix the server IP
```

On the laptop:

```
cilium install
```

wait...

```
helm upgrade --debug --install ingress-nginx ingress-nginx --repo https://kubernetes.github.io/ingress-nginx --namespace ingress-nginx --create-namespace
```

Install Sealed Secrets:

Main:

```
export KUBECONFIG=~/.kube/j7s-cluster.yaml
wget https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.19.5/controller.yaml
kubectl apply -f controller.yaml
```

Infra:

```
export KUBECONFIG=~/.kube/infra.yaml
kubectl apply -f controller.yaml
rm controller.yaml
```

Install kubeseal.

Merge kube config files:

1. Manually edit each config file and rename all the `default` entries to something unique for that file. (I use `k3s` for the original cluster, `j7s` for the new main cluster, and `infra` for the new infra cluster.)
2. Do some magic:

```
cp ~/.kube/config ~/.kube/config.back
export KUBECONFIG=~/.kube/config:~/.kube/infra.yaml:~/.kube/j7s-cluster.yaml
kubectl config view --flatten > ~/.kube/new-config
mv ~/.kube/new-config ~/.kube/config
export KUBECONFIG=~/.kube/config
chmod 600 ~/.kube/config
```

Use kubeseal to encrypt the secrets for Harbor. Install Harbor:

```
cd infra-cluster/harbor
kubectl apply -f namespace
kubectl apply -f secrets
cd helm
./install.bash
```

Build the coredns RPM following the instructions in the coredns folder. scp it to infra:

```
scp redhat/RPMS/x86_64/coredns-1.8.4-1.fc37.x86_64.rpm jimmy@192.168.1.112:~/.
ssh jimmy@192.168.1.112
sudo dnf install ./coredns-1.8.4-1.fc37.x86_64.rpm
exit
```

Copy over the Corefile from the coredns folder:

```
scp Corefile jimmy@192.168.1.112:~/.
ssh jimmy@192.168.1.112
sudo cp Corefile /etc/coredns/Corefile
sudo systemctl start coredns
sudo systemctl enable coredns

sudo dnf install policycoreutils-devel rpm-build
sepolicy generate --application /bin/coredns
./coredns.sh

# Until it works....
sudo su
ausearch -c '(coredns)' --raw | audit2allow -M my-coredns
semodule -i my-coredns.pp

# Also:
sudo setsebool -P domain_can_mmap_files 1

# Turn off the stub resolver:
sudo vim /etc/systemd/resolved.conf
# set DNSStubListener=no
```

Wound up turning off SELinux...

```
sudo vi /etc/selinux/config
# SELINUX=permissive
sudo grubby --update-kernel ALL --args selinux=0
```

Wound up reverting back.

Add:

```
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
```

under `[Service]` in

```
sudo vim /usr/lib/systemd/system/coredns.service
```
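The actual Corefile lives in the repo's coredns folder and isn't reproduced in these notes. As a rough illustration only, a host-level CoreDNS for this setup would presumably serve an internal zone and forward everything else, along these lines; the zone contents, addresses, and upstream resolvers below are assumptions, not the real config.

```
# Hypothetical Corefile sketch -- the real one is in the coredns folder.
internal.jpace121.net {
    hosts {
        192.168.1.112 harbor.internal.jpace121.net   # assumed mapping
        fallthrough
    }
    log
}

. {
    forward . 1.1.1.1 9.9.9.9   # assumed upstreams
    cache
}
```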
Wireguard:

```
sudo dnf install wireguard-tools
wg genkey | tee wg.key | wg pubkey > wg.pub
```

Create `wg0.conf`:

```
[Interface]
Address = 10.100.100.?/24
PrivateKey =

[Peer]
PublicKey = zgcRWY3MAwKGokyRs9dR4E5smoeFy1Hh4MfDcDM3iSc=
AllowedIPs = 10.100.100.0/24
Endpoint = vpn.jpace121.net:51902
PersistentKeepAlive = 25
```

Add to the server config:

```
# Infra k3s node
[Peer]
PublicKey = <>
AllowedIPs = 10.100.100.7/32
```

Add to systemd:

```
sudo systemctl enable wg-quick@wg0.service
sudo systemctl daemon-reload
sudo systemctl start wg-quick@wg0
```

Tried using NetworkManager below, moved to wg-quick for consistency:

```
nmcli con import type wireguard file /etc/wireguard/wg0.conf
```

Better:

```
sudo cp wg0.conf /etc/wireguard/wg0.conf
sudo chown root:root /etc/wireguard/wg0.conf
wg-quick up wg0
```

Harbor Login:

```
scp harbor_tls.crt jimmy@10.100.100.7:.
ssh jimmy@10.100.100.7
sudo cp harbor_tls.crt /etc/rancher/k3s/.
```

`/etc/rancher/k3s/registries.yaml`:

```
configs:
  "harbor.internal.jpace121.net":
    auth:
      username: robot$k8s+infra-cluster
      password:
    tls:
      ca_file: /etc/rancher/k3s/harbor_tls.crt
```

Kanboard:

Get the PV name:

```
kubectl describe pvc kanboard-pvc --context k3s
```

Use the PV name to locate the directory:

```
kubectl describe pv pvc-89a4265c-b39c-4628-9e6b-df091fae4fd8 --context k3s
```

Can tell it's on `k3s-node1` at `/var/lib/rancher/k3s/storage/pvc-89a4265c-b39c-4628-9e6b-df091fae4fd8_default_kanboard-pvc`.

```
ssh jimmy@192.168.1.135
sudo su
cd /var/lib/rancher/k3s/storage/pvc-89a4265c-b39c-4628-9e6b-df091fae4fd8_default_kanboard-pvc
tar cvpzf /home/jimmy/kanboard-pvc.tar.gz .
exit
cd ~
sudo chown jimmy:jimmy kanboard-pvc.tar.gz
exit
scp jimmy@192.168.1.135:~/kanboard-pvc.tar.gz /tmp/kanboard-pvc.tar.gz
```

Apply the PVC. Want: `volumeBindingMode: Immediate`

```
kubectl apply -f manifests --context infra
kubectl describe pvc kanboard-pvc --context infra --namespace kanboard
kubectl describe pv pvc-fe710c38-52ce-495b-bb8d-bea48222a21b --namespace kanboard
```

```
scp /tmp/kanboard-pvc.tar.gz jimmy@192.168.1.112:.
ssh jimmy@192.168.1.112
sudo su
chown root:root ./kanboard-pvc.tar.gz
cd /var/lib/rancher/k3s/storage/pvc-fe710c38-52ce-495b-bb8d-bea48222a21b_kanboard_kanboard-pvc
rm -rf *
tar xpvzf /home/jimmy/kanboard-pvc.tar.gz
exit
exit
kubectl apply -f manifests/
```

Make the secret:

```
cat kanboard-cookie.yaml | kubeseal --format yaml > kanboard-cookie-sealed.yaml
```

Where should I proxy to?

```
kubectl -n ingress-nginx get svc
ingress-nginx-controller   LoadBalancer   10.45.94.103   192.168.1.112   80:31566/TCP,443:32594/TCP   23d
```

> 10.100.100.7:31566
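For reference, a rough sketch of what the corresponding entry on the Lightsail-side NGINX proxy could look like: plain HTTP over the wireguard link to the ingress-nginx NodePort noted above. The hostname and the TLS handling are assumptions (the real Lightsail config isn't captured in these notes), not a copy of the actual config.

```
# Hypothetical server block on the Lightsail proxy.
server {
    listen 80;
    server_name kanboard.jpace121.net;           # assumed hostname

    location / {
        # ingress-nginx HTTP NodePort on the infra node, reached over wireguard
        proxy_pass http://10.100.100.7:31566;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```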
### Tekton

```
kubectl apply --filename https://storage.googleapis.com/tekton-releases/pipeline/latest/release.yaml
kubectl apply --filename https://storage.googleapis.com/tekton-releases/triggers/latest/release.yaml
kubectl apply --filename https://storage.googleapis.com/tekton-releases/triggers/latest/interceptors.yaml
kubectl apply --filename https://storage.googleapis.com/tekton-releases/dashboard/latest/release.yaml
```

### Keycloak

```
kubectl describe pv pvc-4bcbb023-e686-4082-855f-d062ff418c74 --namespace keycloak
```

`/var/lib/rancher/k3s/storage/pvc-4bcbb023-e686-4082-855f-d062ff418c74_keycloak_keycloak-db-pvc`

```
scp /tmp/db-backup.tar.gz jimmy@192.168.1.112:.
```

```
sudo su
chown root:root ./db-backup.tar.gz
cd /var/lib/rancher/k3s/storage/pvc-4bcbb023-e686-4082-855f-d062ff418c74_keycloak_keycloak-db-pvc
rm -rf *
tar xpvzf /home/jimmy/db-backup.tar.gz
chown -R systemd-oom:systemd-oom *
```
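Not part of the original notes: after restoring the data directory, a quick sanity check along these lines (using the merged kubeconfig contexts from above; the exact workload name in the keycloak namespace is an assumption) would confirm the pods come back up against the restored database.

```
# Hypothetical post-restore check.
kubectl --context infra -n keycloak get pvc,pods
kubectl --context infra -n keycloak logs deploy/keycloak --tail=50   # assumed Deployment name
kubectl --context infra get pods -A                                  # make sure nothing else is stuck
```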