# Design for Second Iteration of Cluster/Homelab

## Context

The current cluster was set up just to run CI builds as a trial.

I'm now sold that k8s is a good approach and would like to move more of my
services to it.

This document will track my design for cluster v2.

## Investigation

### Host OS

Debian:
- on laptop
- already on most of my systems
- stable
- not officially tested by k3s
- Will be using apt at work

CentOS Stream:
- Tried with k3s and had to disable systemd...
- On the second try it seemed to work even with the error I saw before.
- Cockpit is nice when managing servers.
- Want to like RHEL
- More stable than Fedora
- RPMs are easier to work with
- Using it on the VM host

Fedora:
- Want to like RHEL
- Tested with k3s
- Latest podman and friends
- Really fast for something stable...
- Cockpit is nice
- Fedora minimal can't be installed via Cockpit without hitting tab a lot.

Decision: Fedora Server

### k3s Distro

RKE2:
- no Debian support
- 4 GB RAM minimum
- 2 CPUs
- cilium and nginx are not the defaults

k3s:
- k3d is a thing
- documentation online is good
- 512 MB of RAM
- 1 CPU
- easy installation

Decision: k3s

### How many clusters?

Decision: Exactly two (one for "need to work" services, one for CI and messing around).
The mess with Longhorn scared me... it wouldn't be that big a deal if it only affected
CI, but it also affected Kanboard and git.

### Files

Decision: Host local.

Files are not something I want to have to think about.
The Longhorn mess scared me.
NFS not working with Postgres is annoying.

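Host local on k3s should just mean leaning on the bundled local-path
provisioner (StorageClass `local-path`, backed by a directory on the node).
A minimal sketch of a claim against it; the claim name and size are made up:

```
kubectl apply -f - <<'EOF'
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kanboard-data              # hypothetical claim name
spec:
  storageClassName: local-path     # k3s's bundled local-path provisioner
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 5Gi
EOF
```
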
### How many nodes per cluster?

The current cluster has lots of small VMs, with new VMs added (with more
CPUs/RAM) as the requirements grew.

I'd rather limit myself to fewer, more powerful VMs, and let the VM OS manage
CPU and memory.

More nodes would be useful if they were on different base hardware.
Realistically I'm never going to pay for more than the Ingress VM...

Decision:
1 big VM per cluster.
Both VMs hosted on the current hardware.
If we add hardware, we can add an additional node at that time.

### Networking

Status quo is flannel with VXLAN, plus Traefik, Klipper LB, and CoreDNS.

#### DNS

CoreDNS is great.

#### Load Balancer

Klipper works fine now.
MetalLB is the other option; it is more complicated and doesn't
seem to give much, particularly with a single-node cluster.

Decision: Klipper

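With Klipper (k3s's bundled service load balancer), a plain `LoadBalancer`
Service is all it takes to publish a TCP port on the node IP. A minimal
sketch, using the MQTT broker from later in this doc as the example; the
`mosquitto` label and service name are assumptions:

```
kubectl apply -f - <<'EOF'
---
apiVersion: v1
kind: Service
metadata:
  name: mqtt                   # hypothetical service name
spec:
  type: LoadBalancer           # Klipper (svclb) exposes this on the node IP
  selector:
    app: mosquitto             # hypothetical label on the broker pods
  ports:
    - name: mqtt
      port: 1883
      targetPort: 1883
EOF
```
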
#### Ingress

Traefik:
- Status quo.
- Works fine.
- Outside of k8s I don't like it.

ingress-nginx (the Kubernetes community controller):
- The one Google keeps pointing me at.
- Used by a lot of people.
- Nothing sexy or risky.
- auth exposed in annotations

nginx-ingress (NGINX Inc.'s controller):
- nginx upstream.
- extra features like stream support that I'm using on Lightsail now.
- full-blown VirtualServer support.
- maybe too complicated?
- exposes the same features I have on Lightsail through annotations, which
  could be a way to get Keycloak to work.
- auth in annotations is behind a paywall, but available through a VirtualServer.

Decision: ingress-nginx
Use a LoadBalancer Service for the stuff I would have used a VirtualServer for.

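For reference on the "auth exposed in annotations" point, a minimal sketch of
basic auth on an Ingress with ingress-nginx. The hostname, service, user, and
secret names are all made up; `htpasswd` comes from httpd-tools on Fedora:

```
# Build an htpasswd file and wrap it in the secret ingress-nginx expects.
htpasswd -c auth jimmy                                   # hypothetical user
kubectl create secret generic basic-auth --from-file=auth

kubectl apply -f - <<'EOF'
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kanboard                                         # hypothetical name
  annotations:
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
    nginx.ingress.kubernetes.io/auth-realm: "Authentication Required"
spec:
  ingressClassName: nginx
  rules:
    - host: kanboard.example.lan                         # hypothetical host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kanboard
                port:
                  number: 80
EOF
```
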
#### CNI

flannel vxlan:
- status quo
- works fine

cilium:
- label-based network policies
- leaning toward this plus multus, though I doubt I'll ever write a policy
- I want the ability to write a policy...
- if set up with a different pod CIDR, can do multi-cluster later
- cluster name and cluster id are set at install time
- can do transparent encryption (not worth it...)

cilium multi-cluster networking:
- not worth the complexity
- will manage connections with ingress/egress methods instead

flannel wireguard backend:
- encrypts traffic and sets up an overlay if I want to interact with
  cloud machines
- can do the same with a manual wireguard network...

istio:
- I dislike sidecar containers
- the traffic I'm interested in is mainly not L7
- blessed by the Air Force

Decision: flannel vxlan
Not worth the extra complexity of cilium.

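Worth noting: even on flannel, k3s bundles a network policy controller (unless
it's started with `--disable-network-policy`), so plain label-based
NetworkPolicy objects should still be enforced if I ever do write one. A
minimal sketch with hypothetical labels, allowing only Kanboard pods to reach
Postgres on 5432:

```
kubectl apply -f - <<'EOF'
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgres-ingress               # hypothetical policy name
spec:
  podSelector:
    matchLabels:
      app: postgres                    # hypothetical label on the DB pods
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: kanboard            # hypothetical label on the client pods
      ports:
        - protocol: TCP
          port: 5432
EOF
```
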
## What goes on each cluster/VM?

Lightsail:
1. Wireguard
2. Apt/RPM repos
3. Main NGINX proxy

Infra Cluster:
- On host:
  1. CoreDNS
  2. Wireguard
- On cluster:
  1. Keycloak
  2. Kanboard
  3. OneDev
  4. Harbor

Main Cluster:
- On host:
  1. Wireguard
- On cluster:
  1. Tekton
  2. MQTT broker
  3. Squid
  4. j7s-os-deployment

## Deployments

Manually kubectl apply:
- Easy to reason about
- running apply is fun
- using Flux has a chicken-and-egg problem if git is also deployed from Flux

Flux:
- More GitOps-y
- the chicken-and-egg problem is conquerable, in a maybe confusing way

Decision:
1. Infra:
   1. kubectl apply/helm everything.
   2. Drop the Keycloak image into k3s manually, either using the CRI or by
      placing it in the magic place after the k3s install (sketch below).
   3. Use helm with values for OneDev.
   4. Get rid of the Kanboard custom image. Use kubectl apply.
2. Test:
   1. Mostly kubectl apply for Tekton.
   2. Use Flux for:
      1. MQTT
      2. j7s-os-deploy
      3. squid

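A minimal sketch of the two image-drop options, assuming a `keycloak.tar`
produced with `podman save` (the filename is hypothetical). The "magic place"
is, as far as I know, k3s's agent images directory, which it auto-imports at
startup; the other route goes through k3s's bundled containerd:

```
# Option 1: drop the tarball in the images directory and restart k3s,
#           which imports anything it finds there at startup.
sudo cp keycloak.tar /var/lib/rancher/k3s/agent/images/
sudo systemctl restart k3s

# Option 2: import directly through k3s's bundled containerd, the same way
#           simple-ros2 is imported in the experiments below.
sudo k3s ctr images import ./keycloak.tar
```
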
## VM Resources

Lightsail:
- Leave alone

Infra Cluster:
- 4 GiB RAM total
- 3 CPUs
- 200 GiB hard drive

Main Cluster:
- 4 GiB RAM total
- 3 CPUs
- 200 GiB hard drive

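A sketch of carving out one of these VMs on the VM host with virt-install,
sized per the list above. The VM name, ISO path, bridge (br0 from the VM host
setup at the end of this doc), and os-variant are assumptions to check before
running:

```
# Hypothetical names/paths; memory, CPUs, and disk match the list above.
sudo virt-install \
  --name infra-cluster \
  --memory 4096 \
  --vcpus 3 \
  --disk size=200 \
  --network bridge=br0 \
  --os-variant fedora36 \
  --cdrom /var/lib/libvirt/images/Fedora-Server-dvd.iso
```
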
## Experiments

### k3s with cilium and nginx on CentOS Stream 9

```
systemctl disable firewalld --now
export INSTALL_K3S_EXEC="server --disable traefik --flannel-backend=none --disable-network-policy --selinux"
curl -sfL https://get.k3s.io | sh -s -
```

I see an error about SELinux policies conflicting, but I'm not sure if it matters.

Install cilium following the instructions here:
https://docs.cilium.io/en/v1.12/gettingstarted/k3s/

Install nginx with:

```
helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace
```

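From memory, the linked cilium instructions amount to grabbing the cilium CLI
from its GitHub releases and letting it install cilium into the cluster; check
the page above for the current commands. A sketch:

```
# Fetch the cilium CLI and install cilium into the k3s cluster.
curl -L --fail --remote-name-all \
  https://github.com/cilium/cilium-cli/releases/latest/download/cilium-linux-amd64.tar.gz
sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
cilium install
cilium status --wait
```
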
### k3s with nginx on Fedora Server

```
sudo systemctl disable firewalld --now
export INSTALL_K3S_EXEC="server --disable traefik --selinux"
curl -sfL https://get.k3s.io | sh -s -
sudo chown jimmy:jimmy /etc/rancher/k3s/k3s.yaml
sudo dnf install helm
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace
```

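A few generic sanity checks worth running after the install (not specific to
this setup), just to confirm the node is Ready and the controller came up:

```
kubectl get nodes -o wide
kubectl get pods -n kube-system
kubectl get pods -n ingress-nginx
kubectl get svc -n ingress-nginx ingress-nginx-controller
```
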
Import simple-ros2.

Laptop:

```
podman save -o simple-ros2.tar simple-ros2:latest
scp simple-ros2.tar 192.168.1.106:~/.
```

On the server:

```
sudo ctr images import ./simple-ros2.tar
# wait forever....
```

Test yaml:

```
---
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: simple-ros2
      image: localhost/simple-ros2:latest
      imagePullPolicy: Never
      args: [ros2, launch, j7s-simple, j7s_publisher_launch.py]
```

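To actually run the test, apply the manifest and tail the pod, assuming the
yaml above is saved as `test-pod.yaml` (the filename is my choice):

```
kubectl apply -f test-pod.yaml
kubectl get pod test-pod
kubectl logs -f test-pod
# clean up afterwards
kubectl delete pod test-pod
```
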
### VM Host setup

I **think** I ran something like this when I set up the VM host.
I don't remember exactly, and I didn't document it...

This should be looked at carefully before running.

```
nmcli connection add ifname br0 type bridge con-name br0 connection.zone trusted
nmcli connection add type bridge-slave ifname enp4s0 master br0
nmcli connection modify br0 bridge.stp no
nmcli connection modify enp4s0 autoconnect no
nmcli connection down enp4s0
nmcli connection up id br0
```

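Some read-only checks to see whether the bridge ended up in that state before
trusting the commands above (not part of the original setup):

```
nmcli connection show
nmcli device status
ip addr show br0
bridge link show
```
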