03//lab

The rack under the desk,
run like a cloud provider.

Three Talos Kubernetes clusters today: core, dev, and prod. A fourth is coming once the second site is live. Etcd has three votes everywhere. DNS rides a VIP. The edge splits into internal and public gateways. The whole thing is declared in git, delivered by ArgoCD, watched by a self-hosted LGTM stack. This site is served from it.

The secondary AZ is offline. Hardware is in the middle of moving between sites, so prod is running on one AZ until it's back. Everything below shows the current state next to the multi-AZ plan.

  • 3 Talos clusters today — core · dev · prod (4th planned for second AZ)
  • 9 Kubernetes nodes today — 12 once the second AZ is up
  • 3 bare-metal EliteDesks running prod (+3 more planned for the second AZ)
  • 3 Technitium DNS instances behind a VIP
  • 13 Ansible-managed Linux hosts
  • 6 VLANs — mgmt · trusted · DMZ · IoT · guest · clients

01/How it's designed

Architecture directions.

Four things the lab is designed around

Four design principles run through the lab. The rest of the page shows how each one is wired up.

Hardware redundancy

Every Talos cluster runs a three-node etcd quorum. Three Technitium DNS instances sit behind a keepalived VIP with AXFR replication. Storage replication and LGTM HA aren’t there yet — the HA diagram below shows where each one stands.

High availability

Services self-heal through Kubernetes. The edge is split: internal and public traffic land on separate Envoy Gateways, each with its own IPs and policies. Per-cluster Cloudflare tunnels run two replicas. Private PKI and observability are still single-instance, and both are queued for HA work.
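
The split shows up directly in the Gateway API objects. A minimal sketch, assuming Envoy Gateway's default "eg" GatewayClass; the namespace, addresses, and certificate secret names are illustrative:

    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: eg-internal
      namespace: edge                     # assumed namespace
    spec:
      gatewayClassName: eg                # assumed GatewayClass name
      addresses:
        - type: IPAddress
          value: 10.0.30.10               # internal MetalLB IP (illustrative)
      listeners:
        - name: https
          protocol: HTTPS
          port: 443
          tls:
            mode: Terminate
            certificateRefs:
              - name: internal-wildcard-tls   # assumed cert-manager secret
    ---
    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: eg-public
      namespace: edge
    spec:
      gatewayClassName: eg
      addresses:
        - type: IPAddress
          value: 10.0.30.11               # public-path IP, reached via the tunnel
      listeners:
        - name: https
          protocol: HTTPS
          port: 443
          tls:
            mode: Terminate
            certificateRefs:
              - name: public-wildcard-tls     # assumed cert-manager secret

Two Gateway objects mean two sets of listeners, IPs, and route-attachment policies; public traffic never shares a data path with internal traffic.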

GitOps all the way down

Every change is a commit. ArgoCD reconciles workloads, Talos holds cluster state, Ansible holds host state. Rollbacks are a git revert and a webhook. Production promotions go through an auto-generated PR that a human still has to merge.
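
Roughly what one reconciled app looks like as an ArgoCD Application; the repo URL, path, and namespaces are placeholders, not the real layout:

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: kian-coffee
      namespace: argocd
    spec:
      project: prod
      source:
        repoURL: https://github.com/example/homelab   # assumed GitOps repo
        targetRevision: main
        path: apps/kian-coffee/overlays/prod          # assumed layout
      destination:
        server: https://kubernetes.default.svc
        namespace: kian-coffee
      syncPolicy:
        automated:
          prune: true      # deletions in git delete in the cluster
          selfHeal: true   # drift snaps back, so a git revert is a rollback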

Multi-AZ by design

A second AZ is wired up over a UniFi site-to-site VPN. The hardware is mid-move between sites right now, so production runs on one AZ until it lands. DNS zones, cluster naming, and storage already assume a second site, so bringing it back is an addition rather than a rewrite.

02/Diagram 01

Five layers, one focal plane.

Infrastructure → workloads

Five layers between bare metal and a running pod. Every one of them is boring, which is the point. Adding a new app on top barely touches the stack below.

Homelab platform stack: five abstraction layers from bare-metal infrastructure up to application workloads. Platform services is highlighted as the focal layer — the cloud-native plane that ties everything together.

  • L1 Infrastructure (physical · hypervisor): 3× Proxmox · 3× bare-metal EliteDesks + 2× RPi edge · 2× NAS
  • L2 Operating system (immutable · minimal): Talos Linux — API-driven, no SSH
  • L3 Kubernetes (clusters · MetalLB L2): core · dev · prod — 9 control/worker nodes
  • L4 Platform services (the cloud-native plane): ArgoCD · Envoy GW · cert-manager · CNPG · external-dns · Dragonfly · LGTM observability
  • L5 Workloads (apps · services · game servers): kian.coffee · techgarden.gg · Hausparty · Pelican

Source of truth: declarative inventory + cluster configs

03/Diagram 02

How a commit becomes a pod.

GitOps end-to-end

Every app ships the same way. Push to main, CI builds and publishes an image, a dispatch event tells the homelab repo to bump the tag, ArgoCD reconciles, Talos rolls. Prod promotions go through an auto-generated PR that a human still has to merge.

GitOps deployment flow: a code push triggers GitHub Actions, which builds and publishes a container image to ghcr.io and dispatches an image-tag update to the homelab repo. ArgoCD notices the commit and syncs the Talos prod cluster, which pulls the new image.

  Code push (main · any app repo) → GitHub Actions (deployment.yml) → ghcr.io (image registry) → image-tag dispatch (homelab repo) → ArgoCD (watches · reconciles) → Talos prod (3× bare-metal · rolling)

Source of truth: app CI + homelab GitOps repo
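
A sketch of the app-side deployment.yml under stated assumptions: the repository-dispatch action, the HOMELAB_PAT secret, and the example/homelab repo name are stand-ins, not the exact pipeline:

    name: deploy
    on:
      push:
        branches: [main]
    jobs:
      build:
        runs-on: ubuntu-latest
        permissions:
          contents: read
          packages: write
        steps:
          - uses: actions/checkout@v4
          - uses: docker/login-action@v3
            with:
              registry: ghcr.io
              username: ${{ github.actor }}
              password: ${{ secrets.GITHUB_TOKEN }}
          - uses: docker/build-push-action@v6
            with:
              push: true
              tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
          # Dispatch tells the homelab repo which image tag to bump
          - uses: peter-evans/repository-dispatch@v3
            with:
              token: ${{ secrets.HOMELAB_PAT }}       # assumed PAT secret
              repository: example/homelab             # assumed GitOps repo
              event-type: image-tag-update
              client-payload: '{"image": "ghcr.io/${{ github.repository }}", "tag": "${{ github.sha }}"}'
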
04/Diagram 03

What runs where, physically.

Three tiers · one production line today

Three tiers of physical compute. Production runs straight on bare-metal EliteDesks, so there's no hypervisor in the critical path. The Proxmox hosts carry the core and dev clusters as VMs. Edge services like DNS and load balancing, plus storage, sit on Raspberry Pis and TrueNAS boxes. The second AZ's bare-metal tier will mirror this one once the move finishes.

Hardware topology: three tiers of physical hosts — bare-metal prod (focal), Proxmox hypervisors, and edge plus storage. Each tier is independently sized for its role; bare-metal prod carries the production Talos control plane.

  • T1 Bare-metal prod (talos-prod · control plane): ed-n1, ed-n2, ed-n3 (HP EliteDesk)
  • T2 Hypervisors (talos-core + talos-dev VMs): hx90 (Minisforum HX90) · bd-n1, bd-n2 (Minisforum BD795i)
  • T3 Edge + storage (DNS primary · edge LB · NAS ×2): rpi-n1 (RPi 5 · DNS) · rpi-n2 (RPi 5 · edge LB) · cm-nas, jb-nas (TrueNAS)

Hardware: redundancy by tier, not by host
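
For flavor, the kind of Talos machine-config patch a bare-metal prod node gets; the disk, hostname, and VIP are illustrative:

    machine:
      install:
        disk: /dev/nvme0n1        # assumed install target
      network:
        hostname: ed-n1
        interfaces:
          - interface: eth0
            dhcp: true
            vip:
              ip: 10.0.30.5       # shared control-plane VIP (assumed)

No SSH and no package manager: talosctl apply-config is the only way in, which is also why these nodes sit outside the Ansible roster further down.
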
05/Diagram 04

Six VLANs, one firewall.

Segmentation by trust tier

Web traffic enters only through a Cloudflare Tunnel that dials out from inside the network. No HTTP service has an inbound port forward. Six VLANs segment the network by trust tier. The firewall denies cross-tier traffic by default and allows only what's needed (Trusted Clients → Servers, plus a few IoT exceptions).

Network topology and VLAN posture: six VLANs segmented by trust tier. Public traffic arrives only via an outbound-only Cloudflare Tunnel — no inbound port forwards. Servers (Trusted) is the focal tier; Trusted Clients can reach it, Untrusted Servers and IoT cannot.

  Internet (web: tunnel only) → Cloudflare Tunnel (cloudflared · 2 replicas per cluster · outbound only) → edge firewall (deny-by-default between tiers)

  • Management: hypervisors · switches · admin only (isolated)
  • Trusted Clients: personal devices · laptops · phones (allow → Servers)
  • Servers (Trusted): talos core · dev · prod · NAS · observability (focal)
  • Servers (Untrusted): game servers · direct port-forward path that bypasses K8s (deny → Servers)
  • Guest: visitors · internet-only (deny → LAN)
  • IoT: smart-home devices · explicit exceptions only (deny, with exceptions)

Policy: posture, not addresses
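
The tunnel half of that posture, sketched as a cloudflared config file; the tunnel name, credentials path, and in-cluster Service address are assumptions:

    tunnel: homelab-prod                            # assumed tunnel name
    credentials-file: /etc/cloudflared/creds.json   # assumed path
    ingress:
      - hostname: kian.coffee
        service: https://eg-public.edge.svc.cluster.local:443   # assumed Service for the public gateway
        originRequest:
          originServerName: kian.coffee
      # No catch-all forwarding: anything unmatched is refused
      - service: http_status:404

cloudflared dials out to Cloudflare and holds the connection open, so the firewall never needs an inbound allow rule for web traffic.
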
06/Diagram 05

What's replicated, what isn't (yet).

Redundant today vs. single-instance today

Reliability is never finished. The left column is what's already redundant. The right column is what still runs as a single instance, with the next step lined up for each one.

HA redundancy map: the left column lists services that are already redundant — DNS, Kubernetes control planes, Cloudflare tunnels, the dual-gateway edge. The right column is the work ahead — observability, storage, private PKI, and GitOps control plane — each with a mitigation path in flight.

Redundant today:
  • Technitium DNS ×3: keepalived VIP + AXFR replication
  • talos-core control plane ×3: etcd quorum
  • talos-dev control plane ×3: etcd quorum
  • talos-prod ×3: etcd quorum on bare metal
  • cloudflared ×2 per cluster: outbound tunnels
  • Split-gateway edge: eg-internal + eg-public
  • cert-manager + step-issuer: K8s-managed, self-healing

Single point of failure, mitigation in flight:
  • LGTM observability ×1 → migrate to Kubernetes
  • Primary NAS ×1 → async replication to second NAS
  • step-ca (private PKI) ×1 → HA pair planned
  • ArgoCD control plane ×1 → run across multiple clusters

Roadmap: no SPoF without a mitigation queued up. Multi-AZ replication is the long-term backstop (secondary AZ · site-to-site VPN · cross-AZ DNS).

07/The roster

The hosts, dynamically generated from the homelab's inventory.

13 Ansible-managed hosts

Bare-metal Talos nodes are configured through talosctl, so they show up in the cluster topology above but not in this roster. The roster below is generated from the homelab's inventory file, so it reflects whatever is actually deployed.
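
A minimal sketch of the inventory shape the roster is rendered from; the group and host names mirror the roster below, while the file layout and the dns_role variable are assumptions:

    all:
      children:
        arr:
          hosts:
            arr-vm:
        dns:
          hosts:
            rpi-n1: { dns_role: primary }
            dns-n2: { dns_role: secondary }
            dns-n3: { dns_role: secondary }
        lgtm:
          hosts:
            lgtm-vm:
        nas:
          hosts:
            cm-nas:
            jb-nas:
        proxmox:
          hosts:
            hx90:
            bd-n1:
            bd-n2:
        raspbian:
          hosts:
            rpi-n1:
            rpi-n2:
        wings:
          hosts:
            wings-n1:
            wings-n2: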

Media VM

1× arr

Media library automation on a dedicated VM.

  • arr-vm

DNS node

3× dns

Technitium DNS — 3 instances behind a keepalived VIP. rpi-n1 is the primary; dns-n2 + dns-n3 are secondaries.

  • dns-n2
  • dns-n3
  • rpi-n1

Observability VM

1× lgtm

Self-hosted Loki + Grafana + Tempo + Mimir stack.

  • lgtm-vm

NAS

2× nas

ZFS storage with Cloud Sync + RSync for 3-2-1 backup.

  • cm-nas
  • jb-nas

Proxmox host

3× proxmox

Hypervisor hosts running most VM-backed clusters.

  • bd-n1
  • bd-n2
  • hx90

Raspberry Pi

2× raspbian

Low-power utility nodes — rpi-n1 runs the DNS primary; rpi-n2 is the edge load balancer.

  • rpi-n1
  • rpi-n2

Game server node

2× wings

Pelican game-server control plane on the untrusted VLAN.

  • wings-n1
  • wings-n2

08/Named tools

The stack, in words.

Full named-service inventory

Infrastructure

Physical hosts, hypervisor, networking

HP EliteDesk (×3, bare-metal prod) · Minisforum HX90 / BD795i (×3, Proxmox) · Raspberry Pi 5 (×2) · TrueNAS (×2) · VLAN segmentation

Operating system + Kubernetes

Immutable OS, 3 Talos clusters, MetalLB L2

Talos Linux · Kubernetes · MetalLB · Envoy Gateway · Gateway API
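
MetalLB's L2 mode is what pins cluster services (the gateways included) to real LAN IPs. A sketch with an illustrative pool:

    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      name: edge-pool
      namespace: metallb-system
    spec:
      addresses:
        - 10.0.30.10-10.0.30.20   # assumed range on the servers VLAN
    ---
    apiVersion: metallb.io/v1beta1
    kind: L2Advertisement
    metadata:
      name: edge-l2
      namespace: metallb-system
    spec:
      ipAddressPools:
        - edge-pool               # answer ARP for this pool on the local L2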

Platform services

The cloud-native plane that ties it all together

ArgoCD · Argo Workflows · cert-manager · external-dns · Technitium DNS · CloudNativePG · Dragonfly · External Secrets Operator · Bitwarden Secrets Manager · Reloader
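
One way those pieces compose: a cert-manager Certificate signed by the private PKI through step-issuer. The issuer name, namespace, and internal zone are assumptions:

    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: argocd-internal
      namespace: argocd
    spec:
      secretName: argocd-internal-tls
      dnsNames:
        - argocd.lab.example         # assumed internal zone
      issuerRef:
        group: certmanager.step.sm   # step-issuer's API group
        kind: StepClusterIssuer
        name: step-ca                # assumed issuer name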

Observability

LGTM stack, self-hosted

Grafana · Loki · Tempo · Mimir · Grafana Alloy · Prometheus Operator

Delivery pipeline

GitOps end-to-end, PRs promote dev → prod

GitHub Actions · ghcr.io · auto-promoted image tags · Renovate

Apps running here

Workloads shipped from personal repos

kian.coffee · techgarden.gg · Hausparty · Pelican + Wings (game servers)