# Design for Second Iteration of Cluster/Homelab

## Context

The current cluster was set up just to run CI builds as a trial.

I'm now sold that k8s is a good approach and would like to move more of my services to it.

This document tracks my design for cluster v2.

## Investigation

### Host OS

Debian:
- on laptop
- already on most of my systems
- stable
- not officially tested by k3s
- will be using apt at work

Stream:
- Tried with k3s and had to disable systemd...
- On the second try it seemed to work, even with the error I saw before.
- Cockpit is nice when managing servers.
- Want to like RHEL.
- More stable than Fedora.
- RPMs are easier to work with.
- Using it on the VM host.

Fedora:
- Want to like RHEL.
- Tested with k3s.
- Latest podman and friends.
- Really fast for something stable...
- Cockpit is nice.
- Fedora minimal can't be installed from Cockpit without hitting tab a lot.

Decision: Fedora Server

### K3S Distro

RKE2:
- no Debian support
- 4 GB minimum RAM
- 2 CPUs minimum
- cilium and nginx are not the defaults

k3s:
- k3d is a thing
- documentation online is good
- 512 MB of RAM
- 1 CPU
- easy installation

Decision: k3s

### How many clusters?

Decision: Exactly two (one for "need to work" services, one for CI and messing around).
The mess with Longhorn scared me... it wouldn't be that big a deal if it only affected
CI, but it also affected Kanboard and git.

### Files

Decision: Host local.

Files are not something I want to have to think about.
The Longhorn mess scared me.
NFS not working with Postgres is annoying.

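Since k3s ships with the local-path provisioner, "host local" in practice just means a PVC backed by a directory on the node. A minimal sketch of what a claim looks like, assuming the default `local-path` storage class (the name mirrors the Kanboard claim used later; the size is a placeholder):

```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kanboard-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 1Gi  # placeholder size
```
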
### How many nodes per cluster?

The current cluster has lots of small VMs, with VMs added with more
CPUs/RAM as the requirements grew.

I'd rather limit myself to fewer, more powerful VMs, and let the VM OS manage
CPU and memory.

More nodes would be useful if they were on different base hardware.
Realistically I'm never going to pay for more than the Ingress VM...

Decision:
1 big VM per cluster.
Both VMs hosted on current hardware.
If we add hardware, we can add an additional node at that time.

### Networking

The status quo is flannel with vxlan, plus Traefik, Klipper LB, and CoreDNS.

#### DNS

CoreDNS is great.

#### Load Balancer

Klipper works fine now.
MetalLB is the other option; it is more complicated and doesn't
seem to add much, particularly with a single-node cluster.

Decision: Klipper

#### Ingress

Traefik:
- Status quo.
- Works fine.
- Outside of k8s I don't like it.

ingress-nginx (the Kubernetes community controller):
- Google
- Used by a lot of people.
- Nothing sexy or risky.
- auth exposed in annotations

nginx-ingress (NGINX upstream):
- extra features like stream support that I'm using
  on Lightsail now.
- full-blown virtual server support.
- maybe too complicated?
- exposes the same features as I have on Lightsail through
  annotations, which could be a thing to get Keycloak to
  work.
- auth in annotations is behind a paywall, but available through
  a virtual server.

Decision: ingress-nginx (the two names are easy to mix up; this is the community controller installed from kubernetes.github.io/ingress-nginx below).
Use the LB for stuff I would use the virtual server for; see the sketch below.

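For example, instead of an NGINX stream block, a raw TCP service like the MQTT broker can be exposed straight through Klipper with a LoadBalancer Service. A minimal sketch with hypothetical names (`app: mosquitto` is a guess at the broker's label):

```
apiVersion: v1
kind: Service
metadata:
  name: mqtt
  namespace: mqtt
spec:
  type: LoadBalancer
  selector:
    app: mosquitto  # hypothetical label
  ports:
    - name: mqtt
      port: 1883
      targetPort: 1883
```
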
#### CNI

flannel vxlan:
- status quo
- works fine

cilium:
- label-based network policies
- leaning toward this plus multus, though I doubt
  I'll ever write a policy
- I want the ability to write a policy...
- if I set up different pod CIDRs, I can do multi-cluster later
- cluster name and cluster ID are set at install time
- can do transparent encryption (not worth it...)

cilium multi-cluster networking:
- not worth the complexity
- will manage connections with ingress/egress methods.

flannel wg:
- encrypts traffic and sets up an overlay if I want to interact with
  cloud machines
- can do the same with a manual wireguard network...

istio:
- I dislike sidecar containers.
- The traffic I'm interested in is mainly not L7.
- blessed by the Air Force

multus:
- tried it on Fedora and didn't get very far, I think
  because of something with k3s.

Decision: cilium.
I want network policies and Hubble observability.
It is a risk, but this is supposed to be a learning experience.

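The kind of label-based policy I want to be able to write, as a minimal sketch (labels are hypothetical; this would allow only the ingress controller's pods to reach Kanboard):

```
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: kanboard-from-ingress-only
  namespace: kanboard
spec:
  endpointSelector:
    matchLabels:
      app: kanboard  # hypothetical label
  ingress:
    - fromEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: ingress-nginx
```
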
## What goes on each cluster/VM?

Lightsail:
1. Wireguard
2. Apt/RPM repos
3. Main NGINX Proxy

Infra Cluster:
- On Host:
  1. CoreDNS
  2. Wireguard
- On Cluster:
  1. Keycloak
  2. Kanboard
  3. OneDev
  4. Harbor

Main Cluster:
- On Host:
  1. Wireguard
- On Cluster:
  1. Tekton
  2. MQTT Broker
  3. Squid
  4. j7s-os-deployment

## Deployments

Manually kubectl apply:
- Easy to reason about.
- Running apply is fun.
- Using Flux has a chicken-and-egg problem if git is also
  deployed from Flux.

Flux:
- More GitOps-y.
- The chicken-and-egg problem is conquerable, in a maybe
  confusing way.

Decision:
1. Infra:
   1. kubectl apply/helm everything.
   2. Drop the Keycloak image into k3s manually, either using the CRI or by
      placing it in the magic place after the k3s install.
   3. Use helm with values for OneDev.
   4. Get rid of the Kanboard custom image.
      Use kubectl apply.
2. Test:
   1. Mostly kubectl apply for Tekton.
   2. Use Flux (sketched below) for:
      1. MQTT
      2. j7s-os-deploy
      3. squid

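A sketch of what the Flux side could look like: one GitRepository plus a Kustomization per app. The repo URL and path are hypothetical placeholders:

```
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: j7s-deploy
  namespace: flux-system
spec:
  interval: 5m
  url: https://git.example.net/jimmy/j7s-deploy.git  # hypothetical repo
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: mqtt
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: j7s-deploy
  path: ./mqtt  # hypothetical path
  prune: true
```
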
## VM Resources

Lightsail:
- Leave alone.

Infra Cluster:
- 4 GiB RAM total
- 2 CPUs
- 120 GiB hard drive

Main Cluster:
- 4 GiB RAM total
- 2 CPUs
- 120 GiB hard drive

## Secrets

Options:
- Mozilla SOPS
- Bitnami Sealed Secrets

Both work with Flux.
Sealed Secrets seems more integrated with k8s when not using
Flux.

Decision: Bitnami Sealed Secrets

## Experiments

### k3s with cilium and nginx on CentOS Stream 9

```
systemctl disable firewalld --now
export INSTALL_K3S_EXEC="server --disable traefik --flannel-backend=none --disable-network-policy --selinux"
curl -sfL https://get.k3s.io | sh -s -
```

I see an error about SELinux policies conflicting, but I'm not sure if it matters.

Install cilium following the instructions here:
https://docs.cilium.io/en/v1.12/gettingstarted/k3s/
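
For reference, the CLI route from those instructions boils down to roughly this, assuming the cilium CLI is already installed:

```
cilium install
cilium status --wait
```
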
Install nginx with:

```
helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  --set controller.ingressClassResource.default=true
```

### k3s with nginx on Fedora Server

```
sudo systemctl disable firewalld --now
export INSTALL_K3S_EXEC="server --disable traefik --selinux"
curl -sfL https://get.k3s.io | sh -s -
sudo chown jimmy:jimmy /etc/rancher/k3s/k3s.yaml
sudo dnf install helm
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
helm upgrade --install ingress-nginx ingress-nginx --repo https://kubernetes.github.io/ingress-nginx --namespace ingress-nginx --create-namespace
```

Import simple-ros2.

Laptop:
```
podman save -o simple-ros2.tar simple-ros2:latest
scp simple-ros2.tar 192.168.1.106:~/.
```

On server:
```
sudo ctr images import ./simple-ros2.tar
# wait forever....
```

Test yaml:
```
---
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: simple-ros2
      image: localhost/simple-ros2:latest
      imagePullPolicy: Never
      args: [ros2, launch, j7s-simple, j7s_publisher_launch.py]
```

### VM Host set up

I **think** I ran something like this when I set up the VM host.
I don't remember exactly, and I didn't document it...

This should be carefully looked at before running.

```
nmcli connection add ifname br0 type bridge con-name br0 connection.zone trusted
nmcli connection add type bridge-slave ifname enp4s0 master br0
nmcli connection modify br0 bridge.stp no
nmcli connection modify enp4s0 autoconnect no
nmcli connection down enp4s0
nmcli connection up id br0
```

### Kubeseal Use

```
apiVersion: v1
kind: Secret
metadata:
  name: test-secret
  namespace: my-namespace
type: Opaque
data:
  username: dmFsdWUtMQ0K
  password: dmFsdWUtMg0KDQo=
stringData:
  hostname: myapp.mydomain.com
```

```
cat secret.yaml | kubeseal --format yaml > sealedsecret.yaml
```

# Actual Install Notes

## To Do List

Infra Cluster: [x]
- On Host:
  1. CoreDNS [x]
  2. Wireguard [x]
- On Cluster:
  1. Keycloak [x]
  2. Kanboard [x]
  3. Gitea [x]
  4. Harbor [x]

Main Cluster:
- On Host:
  1. Wireguard [x]
- On Cluster:
  1. Tekton
     - Base install [ ]
     - Add namespace [ ]
     - Push images [ ]
     - Update tasks [ ]
     - Update jobs [ ]
  2. Flux
     1. MQTT Broker
     2. Squid
     3. j7s-os-deployment

[x] Give accounts on Harbor to clusters.
[ ] Push images to Harbor.
[x] Hubble (sketch below).

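Hubble was enabled with the cilium CLI; roughly this sketch (the `--ui` flag assumes the web UI is wanted):

```
cilium hubble enable --ui
cilium status
```
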
## Regularly Scheduled Programming

Fedora Server 37, keep the defaults.

Infra:

On VM:
```
sudo hostnamectl set-hostname infra-cluster
sudo systemctl disable firewalld --now
sudo su
export INSTALL_K3S_EXEC="server --disable traefik --flannel-backend=none --disable-network-policy --cluster-cidr 10.44.0.0/16 --service-cidr 10.45.0.0/16 --cluster-dns 10.45.0.10 --selinux"
curl -sfL https://get.k3s.io | sh -s -
exit
sudo cp /etc/rancher/k3s/k3s.yaml ~/infra.yaml
sudo chown jimmy:jimmy ~/infra.yaml
exit
```

on laptop:
```
scp jimmy@192.168.1.112:~/infra.yaml /home/jimmy/.kube/.
export KUBECONFIG=~/.kube/infra.yaml
# edit the file and fix the server IP:
vim $KUBECONFIG
```

Install the cilium CLI.

On laptop:
```
cilium install
```

wait...

```
helm upgrade --debug --install ingress-nginx ingress-nginx --repo https://kubernetes.github.io/ingress-nginx --namespace ingress-nginx --create-namespace
```

Main:

On VM:
```
sudo hostnamectl set-hostname j7s-cluster
sudo systemctl disable firewalld --now
sudo su
export INSTALL_K3S_EXEC="server --disable traefik --flannel-backend=none --disable-network-policy --cluster-cidr 10.46.0.0/16 --service-cidr 10.47.0.0/16 --cluster-dns 10.47.0.10 --selinux"
curl -sfL https://get.k3s.io | sh -s -
exit
sudo cp /etc/rancher/k3s/k3s.yaml ~/j7s-cluster.yaml
sudo chown jimmy:jimmy ~/j7s-cluster.yaml
exit
```

on laptop:
```
scp jimmy@192.168.1.103:~/j7s-cluster.yaml /home/jimmy/.kube/.
export KUBECONFIG=~/.kube/j7s-cluster.yaml
# edit the file and fix the server IP:
vim $KUBECONFIG
```

On laptop:
```
cilium install
```

wait...

```
helm upgrade --debug --install ingress-nginx ingress-nginx --repo https://kubernetes.github.io/ingress-nginx --namespace ingress-nginx --create-namespace
```

Install Sealed Secrets:

Main:
```
export KUBECONFIG=~/.kube/j7s-cluster.yaml
wget https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.19.5/controller.yaml
kubectl apply -f controller.yaml
```

Infra:
```
export KUBECONFIG=~/.kube/infra.yaml
kubectl apply -f controller.yaml
rm controller.yaml
```

Install kubeseal.

Merge kube config files:

1. Manually modify each config file and change all of the defaults
   to something unique for that file.
   (I have k3s for the original cluster, j7s for the new main cluster, and infra
   for the new infra cluster.)
2. Do some magic:

```
cp config config.back.<date>
export KUBECONFIG=~/.kube/config:~/.kube/infra.yaml:~/.kube/j7s-cluster.yaml
kubectl config view --flatten > new-config
mv new-config config
export KUBECONFIG=~/.kube/config
chmod 600 ~/.kube/config
```

Use kubeseal to encrypt secrets for Harbor.

Install Harbor:
```
cd infra-cluster/harbor
kubectl apply -f namespace
kubectl apply -f secrets
cd helm
./install.bash
```

Build the coredns RPM following the instructions in the coredns folder.

scp it to infra:
```
scp redhat/RPMS/x86_64/coredns-1.8.4-1.fc37.x86_64.rpm jimmy@192.168.1.112:~/.
ssh jimmy@192.168.1.112
sudo dnf install ./coredns-1.8.4-1.fc37.x86_64.rpm
exit
```

Copy over the Corefile from the coredns folder:
```
scp Corefile jimmy@192.168.1.112:~/.
ssh jimmy@192.168.1.112
sudo cp Corefile /etc/coredns/Corefile
sudo systemctl start coredns
sudo systemctl enable coredns
```

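I never pasted the Corefile into these notes. A hypothetical sketch of its shape, assuming the host answers for the internal zone and forwards everything else (the hosts entry and upstream are placeholders):

```
internal.jpace121.net {
    hosts {
        10.100.100.7 harbor.internal.jpace121.net
        fallthrough
    }
}
. {
    forward . 1.1.1.1
    cache
}
```
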
Then wrangle SELinux, on the same box:
```
sudo dnf install policycoreutils-devel rpm-build
sepolicy generate --application /bin/coredns
./coredns.sh
# Until it works....
sudo su
ausearch -c '(coredns)' --raw | audit2allow -M my-coredns
semodule -i my-coredns.pp
# Also:
sudo setsebool -P domain_can_mmap_files 1
# Turn off the systemd-resolved stub resolver:
sudo vim /etc/systemd/resolved.conf
DNSStubListener=no
```

Wound up turning off SELinux...
```
sudo vi /etc/selinux/config
# SELINUX=permissive
sudo grubby --update-kernel ALL --args selinux=0
```

Wound up reverting back.

Add:
```
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
```
under `[Service]` in
```
sudo vim /usr/lib/systemd/system/coredns.service
```

Wireguard:

```
sudo dnf install wireguard-tools
wg genkey | tee wg.key | wg pubkey > wg.pub
vim wg0.conf
<<<
[Interface]
Address = 10.100.100.?/24
PrivateKey = <Contents from file.>

[Peer]
PublicKey = zgcRWY3MAwKGokyRs9dR4E5smoeFy1Hh4MfDcDM3iSc=
AllowedIPs = 10.100.100.0/24
Endpoint = vpn.jpace121.net:51902
PersistentKeepalive = 25
<<<
```

Add to the server:
```
# Infra k3s node
[Peer]
PublicKey = <>
AllowedIPs = 10.100.100.7/32
```

Add to systemd:
```
sudo systemctl enable wg-quick@wg0.service
sudo systemctl daemon-reload
sudo systemctl start wg-quick@wg0
```

Tried using NetworkManager below; moved to wg-quick for consistency.
```
nmcli con import type wireguard file /etc/wireguard/wg0.conf
```

Better:
```
sudo cp wg0.conf /etc/wireguard/wg0.conf
sudo chown root:root /etc/wireguard/wg0.conf
wg-quick up wg0
```

Harbor Login:

```
scp harbor_tls.crt jimmy@10.100.100.7:.
ssh jimmy@10.100.100.7
sudo cp harbor_tls.crt /etc/rancher/k3s/.
```

`/etc/rancher/k3s/registries.yaml`:
```
configs:
  "harbor.internal.jpace121.net":
    auth:
      username: robot$k8s+infra-cluster
      password: <from harbor>
    tls:
      ca_file: /etc/rancher/k3s/harbor_tls.crt
```

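A quick way to check that the registry auth took, assuming an image has already been pushed to a `k8s` project (the image path is hypothetical):

```
sudo k3s crictl pull harbor.internal.jpace121.net/k8s/simple-ros2:latest
```
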
Kanboard:

Get the PV name:
```
kubectl describe pvc kanboard-pvc --context k3s
```

Use the PV name to locate the directory:
```
kubectl describe pv pvc-89a4265c-b39c-4628-9e6b-df091fae4fd8 --context k3s
```

Can tell it's on `k3s-node1` at `/var/lib/rancher/k3s/storage/pvc-89a4265c-b39c-4628-9e6b-df091fae4fd8_default_kanboard-pvc`.

```
ssh jimmy@192.168.1.135
sudo su
cd /var/lib/rancher/k3s/storage/pvc-89a4265c-b39c-4628-9e6b-df091fae4fd8_default_kanboard-pvc
tar cvpzf /home/jimmy/kanboard-pvc.tar.gz .
exit
cd ~
sudo chown jimmy:jimmy kanboard-pvc.tar.gz
exit
scp jimmy@192.168.1.135:~/kanboard-pvc.tar.gz /tmp/kanboard-pvc.tar.gz
```

Apply the PVC.
Want: `volumeBindingMode: Immediate`
```
kubectl apply -f manifests --context infra
<wait til pvc exists>
<delete everything but the pvc>
kubectl describe pvc kanboard-pvc --context infra --namespace kanboard
kubectl describe pv pvc-fe710c38-52ce-495b-bb8d-bea48222a21b
```

```
scp /tmp/kanboard-pvc.tar.gz jimmy@192.168.1.112:.
ssh jimmy@192.168.1.112
sudo su
chown root:root ./kanboard-pvc.tar.gz
cd /var/lib/rancher/k3s/storage/pvc-fe710c38-52ce-495b-bb8d-bea48222a21b_kanboard_kanboard-pvc
rm -rf *
tar xpvzf /home/jimmy/kanboard-pvc.tar.gz
exit
exit
kubectl apply -f manifests/
```

Make the cookie secret:
```
cat kanboard-cookie.yaml | kubeseal --format yaml > kanboard-cookie-sealed.yaml
```

Where should I proxy to?
```
kubectl -n ingress-nginx get svc
ingress-nginx-controller LoadBalancer 10.45.94.103 192.168.1.112 80:31566/TCP,443:32594/TCP 23d
```
> 10.100.100.7:31566

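On the Lightsail NGINX proxy, the upstream is then that NodePort over WireGuard. A hypothetical server block, not the real config (hostname and TLS details are placeholders):

```
server {
    listen 443 ssl;
    server_name kanboard.jpace121.net;  # placeholder hostname
    # ssl_certificate directives omitted
    location / {
        proxy_pass http://10.100.100.7:31566;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $remote_addr;
    }
}
```
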
### Tekton

```
kubectl apply --filename https://storage.googleapis.com/tekton-releases/pipeline/latest/release.yaml
kubectl apply --filename https://storage.googleapis.com/tekton-releases/triggers/latest/release.yaml
kubectl apply --filename https://storage.googleapis.com/tekton-releases/triggers/latest/interceptors.yaml
kubectl apply --filename https://storage.googleapis.com/tekton-releases/dashboard/latest/release.yaml
```

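The "update tasks" item from the to-do list means re-applying Task definitions. A minimal hypothetical Task, just to show the shape (the name is a placeholder):

```
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: hello
spec:
  steps:
    - name: say-hello
      image: alpine
      script: |
        echo "hello from tekton"
```
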
### Keycloak

Find the DB PV and its directory:
```
kubectl describe pv pvc-4bcbb023-e686-4082-855f-d062ff418c74
```

`/var/lib/rancher/k3s/storage/pvc-4bcbb023-e686-4082-855f-d062ff418c74_keycloak_keycloak-db-pvc`

Copy the backup over and restore it:
```
scp /tmp/db-backup.tar.gz jimmy@192.168.1.112:.
ssh jimmy@192.168.1.112
sudo su
chown root:root ./db-backup.tar.gz
cd /var/lib/rancher/k3s/storage/pvc-4bcbb023-e686-4082-855f-d062ff418c74_keycloak_keycloak-db-pvc
rm -rf *
tar xpvzf /home/jimmy/db-backup.tar.gz
chown -R systemd-oom:systemd-oom *
```