First pass of cluster v2 design.

James Pace 2023-02-19 09:18:32 -05:00
parent 40750e12eb
commit 8f1d8ef784
1 changed file with 277 additions and 0 deletions

# Design for Second Iteration of Cluster/Homelab
## Context
The current cluster was set up just to run CI builds as a
trial.
I'm now sold that k8s is a good approach and would like
to move more of my services to it.
This document will track my design for cluster v2.
## Investigation
### Host OS
Debian:
- on laptop
- already on most of systems
- stable
- not officially tested by k3s
- Will be using apt at work
Stream:
- Tried with k3s and had to disable systemd...
- On the second try it seemed to work even with the error I saw before.
- Cockpit is nice when managing servers.
- Want to like RHEL
- More stable than Fedora
- RPMs are easier to work with
- Using on VM host
Fedora:
- Want to like RHEL
- Tested with k3s
- Latest podman and friends
- Really fast for something stable...
- Cockpit is nice
- Fedora minimal can't be installed on Cockpit.
Decision: Stream
Put var/rancher on a separate partition.
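Rough sketch of what that partitioning could look like, assuming an LVM-based Stream install and that the target is the k3s data directory under /var/lib/rancher (the VG name and size are placeholders, not decided yet):
```
# hypothetical: dedicated LV for the k3s data directory; adjust VG name and size
lvcreate -L 100G -n rancher cs_vg
mkfs.xfs /dev/cs_vg/rancher
mkdir -p /var/lib/rancher
echo '/dev/cs_vg/rancher /var/lib/rancher xfs defaults 0 0' >> /etc/fstab
mount /var/lib/rancher
```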
### K3S Distro
RKE2:
- no Debian support
- 4GB Minimum
- 2 CPU
- cilium and nginx not default
k3s:
- k3d is a thing
- documentation online is good
- 512 MB of RAM
- 1 CPU
- easy installation
Decision: k3s
### How many clusters?
Decision: Exactly two (one for "need to work" services, one for CI and messing around).
The mess with Longhorn scared me... it wouldn't be that big a deal if it only affected
CI, but it also affects Kanboard and git.
### Files
Decision: Host local.
Files are not something I want to have to think about.
The Longhorn mess scared me.
NFS not working with postgres is annoying.
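For reference, a minimal sketch of what host-local files look like in practice, assuming k3s's bundled local-path provisioner (the claim name and size are just illustrative):
```
# hypothetical PVC backed by the local-path storage class k3s ships by default;
# the data lands on the node's filesystem under /var/lib/rancher/k3s/storage
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kanboard-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-path
  resources:
    requests:
      storage: 5Gi
EOF
```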
### How many nodes per cluster?
The current cluster has lots of small VMs, with VMs added with more
CPUs/RAM as the requirements grew.
I'd rather limit myself to fewer, more powerful VMs, and let the VM OS manage
CPU and memory.
More nodes would be useful if they were on different base hardware.
Realistically I'm never going to pay for more than the Ingress VM...
Decision:
1 big VM per cluster.
Both VMs hosted on current hardware.
If we add hardware, can add an additional node at that time.
### Networking
Status quo is flannel with VXLAN, plus Traefik, Klipper LB, and CoreDNS.
#### DNS
CoreDNS is great.
#### Load Balancer
Klipper works fine now.
MetalLB is the other option; it is more complicated and doesn't
seem to add much, particularly with a single-node cluster.
Decision: Klipper
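As a reminder of what Klipper gives me here, a minimal sketch of a LoadBalancer Service that the bundled ServiceLB would satisfy on the node's own IP (the service name and port are illustrative):
```
# hypothetical TCP service; k3s's ServiceLB (Klipper) exposes it on the node IP
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: mqtt-broker
spec:
  type: LoadBalancer
  selector:
    app: mqtt-broker
  ports:
    - port: 1883
      targetPort: 1883
EOF
```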
#### Ingress
Traefik:
- Status Quo.
- Works fine.
- Outside of k8s I don't like it.
nginx-ingress:
- Google
- Used by a lot of people.
- Nothing sexy or risky.
- auth exposed in annotations
ingress-nginx:
- nginx upstream.
- extra features like stream support that I'm using
on Lightsail now.
- full blown virtual server support.
- maybe too complicated?
- exposes the same features as I have on Lightsail through
annotations, which could be a way to get Keycloak to
work.
- auth in annotations is behind a paywall, but available through
a virtual server
Decision: nginx-ingress
Use the LB for stuff I would otherwise use the virtual server for.
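A minimal sketch of the auth-in-annotations approach, assuming community ingress-nginx style annotation names (the host, service, and secret names are placeholders):
```
# hypothetical Ingress with basic auth handled by annotations;
# basic-auth is an htpasswd-style Secret created separately
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kanboard
  annotations:
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
spec:
  ingressClassName: nginx
  rules:
    - host: kanboard.example.internal
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kanboard
                port:
                  number: 80
EOF
```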
#### CNI
flannel vxlan
- status quo
- works fine
cilium
- label-based network policies
- leaning toward this plus Multus, though I doubt
I'll ever write a policy
- I want the ability to write a policy...
- if set up with a different pod CIDR, can do multi-cluster later
- cluster name and cluster ID are set at install time
- can do transparent encryption (not worth it...)
cilium multi-cluster networking:
- not worth the complexity
- will manage connections with ingress/egress methods.
flannel wg
- encrypt traffic and set up an overlay if I want to interact with
cloud machines
- can do the same with a manual WireGuard network...
istio
- I dislike sidecar containers
- The traffic I'm interested in is mainly not L7.
- blessed by Air Force
Decision: flannel vxlan
Not worth the extra complexity of cilium.
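One thing worth noting: even without cilium, k3s bundles a network policy controller, so plain label-based NetworkPolicies should still work on top of flannel. A minimal sketch (labels and port are illustrative):
```
# hypothetical policy: only kanboard pods may reach postgres on 5432
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-from-app-only
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: kanboard
      ports:
        - protocol: TCP
          port: 5432
EOF
```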
## What goes on each cluster/VM?
Lightsail:
1. Wireguard
2. Apt/RPM repos
3. Main NGINX Proxy
Infra Cluster:
- On Host:
1. CoreDNS
2. Wireguard
- On Cluster:
1. Keycloak
2. Kanboard
3. OneDev
4. Harbor
Main Cluster:
- On Host:
1. Wireguard
- On Cluster:
1. Tekton
2. MQTT Broker
3. Squid
4. j7s-os-deployment
## Deployments
Manually kubectl apply:
- Easy to reason about
- running apply is fun
- using Flux has a chicken-and-egg problem if git is also
deployed from Flux
Flux:
- More GitOps-y
- the chicken-and-egg problem is conquerable, in a maybe
confusing way
Decision:
1. Infra:
1. kubectl apply/helm everything
2. Drop the Keycloak image into k3s manually, either using the CRI or by
placing it in the magic place after the k3s install (see the sketch at the
end of this section).
3. Use helm with values for OneDev.
4. Get rid of the Kanboard custom image.
Use kubectl apply.
2. Test:
1. Mostly kubectl apply for Tekton.
2. Use Flux for:
1. MQTT
2. j7s-os-deploy
3. squid
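The manual image drop mentioned in the Infra decision could go one of two ways; a rough sketch, assuming the usual k3s air-gap image directory and the embedded containerd (the tarball name is illustrative):
```
# option A: place a saved image tarball where k3s imports images on startup
mkdir -p /var/lib/rancher/k3s/agent/images/
cp keycloak.tar /var/lib/rancher/k3s/agent/images/

# option B: import it directly through the embedded containerd
k3s ctr images import keycloak.tar
```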
## VM Resources
Lightsail:
- Leave alone
Infra Cluster:
- On Host:
1. CoreDNS
2. Wireguard
- On Cluster:
1. Keycloak
2. Kanboard
3. OneDev
4. Harbor
Main Cluster:
- On Host:
1. Wireguard
- On Cluster:
1. Tekton
2. MQTT Broker
3. Squid
4. j7s-os-deployment
## Stuff to experiment with
- [ ] Manually placing the Keycloak image in k3s through the k3s mechanism
and/or through the CRI.
- [ ] Keycloak SSL passthrough.
- [ ] Fedora 37 server install with k3s.
## Experiments
### k3s with cilium and nginx on CentOS Stream 9
```
# turn off firewalld, then install k3s without Traefik, flannel, or the built-in
# network policy controller (cilium and nginx are installed below instead)
systemctl disable firewalld --now
export INSTALL_K3S_EXEC="server --disable traefik --flannel-backend=none --disable-network-policy --selinux"
curl -sfL https://get.k3s.io | sh -s -
```
I see an error about SELinux policies conflicting, but I'm not sure if it matters?
Install cilium following the instructions here:
https://docs.cilium.io/en/v1.12/gettingstarted/k3s/
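For my own notes, roughly what that install looks like via helm (the version pin and the lack of extra values are my assumption; the linked page is authoritative):
```
# hypothetical helm-based cilium install; check the k3s guide for required values
helm repo add cilium https://helm.cilium.io/
helm repo update
helm install cilium cilium/cilium --version 1.12.5 --namespace kube-system
```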
Install nginx with:
```
helm upgrade --install ingress-nginx ingress-nginx \
--repo https://kubernetes.github.io/ingress-nginx \
--namespace ingress-nginx --create-namespace
```
### k3s with nginx on fedora server