Design for Second Iteration of Cluster/Homelab

Context

Current cluster was set up just to run CI builds as a trial.

I'm now sold that k8s is a good approach and would like to move more of my services to it.

This document will track my design for cluster v2.

Investigation

Host OS

Debian:

on laptop
already on most of systems
stable
not officially tested by k3s
Will be using apt at work

Stream:

Tried with k3s and had to disable systemd...
- On the second try it seemed to work, even with the error I saw before.
Cockpit is nice when managing servers.
Want to like RHEL
More stable than Fedora
RPMs are easier to work with
Using on VM host

Fedora:

Want to like RHEL
Tested with k3s
Latest podman and friends
Really fast for something stable...
Cockpit is nice
Fedora minimal can't be installed on cockpit.

Decision: Stream

Put var/rancher on a separate partition.
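
A minimal sketch for checking whether that directory really ended up on its own partition. It assumes the directory in question is k3s's default data location, /var/lib/rancher, which is my reading of the shorthand above; the function names are made up for illustration.

```shell
#!/bin/sh
# Print the mount point of the filesystem that holds the given path.
# Note: this simple parse assumes the mount point contains no spaces.
mount_point_of() {
    # df -P prints one header line and then one record; field 6 is the mount point.
    df -P "$1" | awk 'NR==2 {print $6}'
}

# Succeed only if the path is itself a mount point (i.e. it has its own filesystem).
is_own_mount() {
    [ "$(mount_point_of "$1")" = "$1" ]
}
```

Usage, with the assumed path: `is_own_mount /var/lib/rancher && echo "separate partition" || echo "shares a filesystem"`.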

K8s Distro

RKE2:

no Debian support
4GB Minimum
2 CPU
cilium and nginx not default

k3s:

k3d is a thing
documentation online is good
512 MB of RAM
1 CPU
easy installation

Decision: k3s
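
The resource comparison above can be turned into a quick preflight check. This is only a sketch: the thresholds are the ones listed in these notes (1 CPU / 512 MB for k3s, 2 CPU / 4 GB for RKE2), and the helper names are hypothetical.

```shell
#!/bin/sh
# CPU count and total memory in MiB, read from the running Linux host.
host_cpus() { nproc; }
host_mem_mib() { awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo; }

# meets_minimum CPUS MEM_MIB MIN_CPUS MIN_MEM_MIB
# Succeeds when both the CPU count and the memory reach the given minimums.
meets_minimum() {
    [ "$1" -ge "$3" ] && [ "$2" -ge "$4" ]
}

# Minimums as listed in the comparison above.
fits_k3s()  { meets_minimum "$1" "$2" 1 512; }
fits_rke2() { meets_minimum "$1" "$2" 2 4096; }
```

Usage: `fits_rke2 "$(host_cpus)" "$(host_mem_mib)" || echo "too small for RKE2"`.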

How many clusters?

Decision: Exactly two (one for "need to work" services, one for CI and messing around). The mess with longhorn scared me... it wouldn't have been that big a deal if it had only affected CI, but it also affected Kanboard and git.

Files

Decision: Host local.

Files are not something I want to have to think about. The longhorn mess scared me. NFS not working with postgres is annoying.
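
One way "host local" could look in practice, assuming it maps onto the local-path storage class that k3s bundles (the notes don't say which mechanism is meant, so treat that as an assumption). The helper name, claim name and size are placeholders.

```shell
#!/bin/sh
# render_pvc NAME SIZE
# Print a PersistentVolumeClaim that asks for node-local storage.
# Assumption: the cluster has k3s's bundled "local-path" storage class.
render_pvc() {
    cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: $2
EOF
}
```

Usage: `render_pvc postgres-data 10Gi | kubectl apply -f -`. With a single node per cluster, the volume being pinned to one host is not much of a restriction.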

How many nodes per cluster?

The current cluster has lots of small VMs, with VMs added with more CPUs/RAM as the requirements grew.

I'd rather limit myself to fewer, more powerful VMs, and let the VM OS manage CPU and memory.

More nodes would be useful if they were on different base hardware. Realistically I'm never going to pay for more than the Ingress VM...

Decision: 1 big VM per cluster. Both VMs hosted on current hardware. If we add hardware, can add an additional node at that time.

Networking

Status quo is flannel with vxlan with Traefik and Klipper LB and CoreDNS.

DNS

CoreDNS is great.

Load Balancer

Klipper works fine now. MetalLB is the other option, but it is more complicated and doesn't seem to offer much, particularly with a single-node cluster.

Decision: Klipper

Ingress

Traefik:

Status Quo.
Works fine.
Outside of k8s I don't like it.

ingress-nginx:

Kubernetes community project.
Used by a lot of people.
Nothing sexy or risky.
auth exposed in annotations

nginx-ingress:

nginx upstream.
extra features like stream support that I'm using on lightsail now.
full blown virtual server support.
maybe too complicated?
exposes same features as I have on lightsail through annotations, which could be a way to get Keycloak to work.
auth in annotations is behind paywall, but available through a virtual server

Decision: ingress-nginx. Use LB for stuff I would use the virtual server for.
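
A rough sketch of the "use LB instead of a virtual server" idea: expose a plain TCP service straight through a LoadBalancer Service and let Klipper bind the port on the node. The helper name and the `app: NAME` selector label are assumptions for illustration.

```shell
#!/bin/sh
# render_lb_service NAME PORT
# Print a LoadBalancer Service that forwards a raw TCP port to pods labelled app=NAME.
render_lb_service() {
    cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: $1
spec:
  type: LoadBalancer
  selector:
    app: $1
  ports:
    - name: tcp
      protocol: TCP
      port: $2
      targetPort: $2
EOF
}
```

Usage, e.g. for the MQTT broker on its standard port: `render_lb_service mqtt 1883 | kubectl apply -f -`.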

CNI

flannel vxlan

status quo
works fine

cilium

label based network policies
leaning toward this plus multus though I doubt I'll ever write a policy
I want the ability to write a policy...
if set up different pod cidr can do multi-cluster later
- cluster name and cluster id at install time
can do transparent encryption (not worth it...)
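
Keeping the pod CIDRs distinct from day one is cheap whichever CNI wins. A sketch, assuming the k3s `--cluster-cidr` server flag and treating 10.44.0.0/16 as an arbitrary non-default range (10.42.0.0/16 is the k3s default, as far as I know):

```shell
#!/bin/sh
# k3s_exec_for POD_CIDR
# Build a value for INSTALL_K3S_EXEC that pins the pod CIDR of one cluster.
k3s_exec_for() {
    printf 'server --cluster-cidr=%s' "$1"
}

# One non-overlapping pod range per cluster (the values are illustrative).
INFRA_EXEC=$(k3s_exec_for 10.42.0.0/16)
MAIN_EXEC=$(k3s_exec_for 10.44.0.0/16)
```

Usage on the second cluster's VM: `curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="$MAIN_EXEC" sh -s -`.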

cilium multi-cluster networking:

not worth the complexity
will manage connections with ingress/egress methods.

flannel wg

encrypt traffic and set up overlay if I want to interact with cloud machines
can do the same with a manual wireguard network...

istio

I dislike side car containers
Traffic I'm interested in is mainly not L7.
blessed by Air Force

Decision: flannel vxlan. Cilium is not worth the extra complexity.

What goes on each cluster/VM?

Lightsail:

Wireguard
Apt/RPM repos
Main NGINX Proxy

Infra Cluster:

On Host:
1. CoreDNS
2. Wireguard
On Cluster:
1. Keycloak
2. Kanboard
3. OneDev
4. Harbor

Main Cluster:

On Host:
1. Wireguard
On Cluster:
1. Tekton
2. MQTT Broker
3. Squid
4. j7s-os-deployment

Deployments

Manually kubectl apply:

Easy to reason about
running apply is fun
using flux has chicken and egg problem if git is also deployed from flux

Flux:

More GitOps-y
chicken and egg problem is conquerable, in a maybe confusing way

Decision:

Infra:
kubectl apply/helm everything
Drop the Keycloak image into k3s manually, either using cri or by placing it in the magic place after k3s install.
Use helm with values for onedev.
Get rid of Kanboard custom image. Use kubectl apply.
Test:
1. Mostly kubectl apply for tekton.
2. Use flux for:
  1. MQTT
  2. j7s-os-deploy
  3. squid
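
For the manual Keycloak image drop, a sketch of both routes. It assumes the "magic place" is k3s's image auto-import directory, /var/lib/rancher/k3s/agent/images, and that the bundled `k3s ctr` is the cri-side option; both are my understanding of the k3s air-gap docs rather than something tested here. The function names and the tarball name are hypothetical.

```shell
#!/bin/sh
# Assumed k3s auto-import directory; override IMAGES_DIR if the data dir was moved.
IMAGES_DIR="${IMAGES_DIR:-/var/lib/rancher/k3s/agent/images}"

# Option 1: copy an image tarball (e.g. from `podman save -o keycloak.tar IMAGE`)
# into the directory k3s scans for images when it starts.
stage_image_tarball() {
    tarball="$1"
    mkdir -p "$IMAGES_DIR" && cp "$tarball" "$IMAGES_DIR/"
}

# Option 2: import the tarball into the running k3s containerd directly.
import_image_tarball() {
    k3s ctr images import "$1"
}
```

Usage (as root on the node): `stage_image_tarball keycloak.tar` before starting k3s, or `import_image_tarball keycloak.tar` once it is running. Either way the pod spec would presumably need `imagePullPolicy: IfNotPresent` or `Never` so the kubelet doesn't try to pull.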

VM Resources

Lightsail:

Leave alone

Infra Cluster:

On Host:
1. CoreDNS
2. Wireguard
On Cluster:
1. Keycloak
2. Kanboard
3. OneDev
4. Harbor

Main Cluster:

On Host:
1. Wireguard
On Cluster:
1. Tekton
2. MQTT Broker
3. Squid
4. j7s-os-deployment

Stuff to experiment with

[ ] Manually placing keycloak image in k3s through k3s thing and/or through cri.

[ ] Keycloak SSL passthrough.

[ ] Fedora 37 Server install with k3s.

Experiments

k3s with cilium and nginx on CentOS Stream 9

systemctl disable firewalld --now
export INSTALL_K3S_EXEC="server --disable traefik --flannel-backend=none --disable-network-policy --selinux"
curl -sfL https://get.k3s.io | sh -s -

I see an error about SELinux policies conflicting, but I'm not sure whether it matters.

Install cilium following instructions here: https://docs.cilium.io/en/v1.12/gettingstarted/k3s/

Install nginx with:

helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace
