The rack under the desk, run like a cloud provider.
Three Talos Kubernetes clusters today: core, dev, and prod. A fourth is coming once the second site is live. Etcd has three votes everywhere. DNS rides a VIP. The edge splits into internal and public gateways. The whole thing is declared in git, delivered by ArgoCD, watched by a self-hosted LGTM stack. This site is served from it.
The secondary AZ is offline. Hardware is in the middle of moving between sites, so prod is running on one AZ until it's back. Everything below shows the current state next to the multi-AZ plan.
3 Talos clusters today — core · dev · prod (4th planned for second AZ)
9 Kubernetes nodes today — 12 once the second AZ is up
3 bare-metal EliteDesks running prod (+3 more planned for the second AZ)
Four design principles run through the lab. The rest of the page shows how each one is wired up.
Hardware redundancy
Every Talos cluster runs a three-node etcd quorum. Three Technitium DNS instances sit behind a keepalived VIP with AXFR replication. Storage replication and LGTM HA aren’t there yet — the HA diagram below shows where each one stands.
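For a sense of how the DNS failover is wired, here is a minimal Ansible-style vars sketch for the keepalived VIP. The host names match the roster further down; the VIP, interface, router ID, and priorities are placeholders, not the lab's real values.

```yaml
# Hypothetical group_vars for the dns group, feeding a keepalived.conf template.
dns_vip: 192.168.10.53            # the single address clients resolve against
keepalived_interface: eth0
keepalived_virtual_router_id: 53
keepalived_nodes:
  - host: rpi-n1                  # primary: highest priority, holds the VIP by default
    state: MASTER
    priority: 150
  - host: dns-n2                  # secondaries take over via VRRP if the primary drops
    state: BACKUP
    priority: 100
  - host: dns-n3
    state: BACKUP
    priority: 90
```

The VIP only decides which instance answers queries; zone data stays in sync separately through AXFR transfers from the primary.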
High availability
Services self-heal through Kubernetes. The edge is split: internal and public traffic land on separate Envoy Gateways, each with its own IPs and policies. Per-cluster Cloudflare tunnels run two replicas. Private PKI and observability are still single-instance, and both are queued for HA work.
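The gateway split maps naturally onto two Gateway API resources, roughly like the sketch below. Names, namespaces, addresses, and the class name are placeholders; TLS termination and the real route policies are omitted.

```yaml
# Two Gateways, one per trust boundary; Envoy Gateway reconciles each into
# its own proxy fleet with its own IP. Everything here is illustrative.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: internal
  namespace: gateways
spec:
  gatewayClassName: envoy-gateway        # assumed controller class name
  addresses:
    - type: IPAddress
      value: 192.168.20.10               # internal-only VIP from MetalLB
  listeners:
    - name: http
      protocol: HTTP
      port: 80                           # the real listeners terminate TLS; cert refs omitted
      allowedRoutes:
        namespaces:
          from: All                      # any namespace may attach internal routes
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: public
  namespace: gateways
spec:
  gatewayClassName: envoy-gateway
  addresses:
    - type: IPAddress
      value: 192.168.30.10               # separate VIP, separate policies
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: Selector                 # only namespaces explicitly marked public may attach
          selector:
            matchLabels:
              exposure: public
```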
GitOps all the way down
Every change is a commit. ArgoCD reconciles workloads, Talos holds cluster state, Ansible holds host state. Rollbacks are a git revert and a webhook. Production promotions go through an auto-generated PR that a human still has to merge.
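In Argo CD terms, each workload is an Application kept in automated sync, along the lines of this sketch (repo URL, paths, and names are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/homelab.git   # the GitOps repo
    targetRevision: main
    path: apps/example-app/overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: example-app
  syncPolicy:
    automated:
      prune: true       # delete resources removed from git
      selfHeal: true    # revert live drift back to the declared state
    syncOptions:
      - CreateNamespace=true
```

With prune and selfHeal on, a rollback really is just a git revert; the webhook only shortens how long Argo CD waits before noticing the new commit.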
Multi-AZ by design
A second AZ is wired up over a UniFi site-to-site VPN. The hardware is mid-move between sites right now, so production is on one AZ until it lands. DNS zones, cluster naming, and storage all already assume a second site, so bringing it back online is an addition rather than a rewrite.
02/Diagram 01
Five layers, one focal plane.
Infrastructure → workloads
Five layers between bare metal and a running pod. Every one of them is boring, which is the point. Adding a new app on top barely touches the stack below.
03/Diagram 02
How a commit becomes a pod.
GitOps end-to-end
Every app ships the same way. Push to main, CI builds and publishes an image, a dispatch event tells the homelab repo to bump the tag, ArgoCD reconciles, Talos rolls. Prod promotions go through an auto-generated PR that a human still has to merge.
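The app-side half of that pipeline, sketched as a GitHub Actions workflow. Repo names, the registry, and the event type are assumptions; the homelab repo's listener that bumps the tag and opens the prod PR isn't shown.

```yaml
name: build-and-notify
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write                     # push to GHCR with the workflow token
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          push: true
          tags: ghcr.io/example/app:${{ github.sha }}
      # Tell the homelab repo a new tag exists; a workflow there bumps the
      # image tag in the manifests (and opens the PR for prod).
      - uses: peter-evans/repository-dispatch@v3
        with:
          token: ${{ secrets.HOMELAB_DISPATCH_TOKEN }}   # placeholder secret name
          repository: example/homelab
          event-type: image-updated
          client-payload: '{"image": "ghcr.io/example/app", "tag": "${{ github.sha }}"}'
```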
04/Diagram 03
What runs where, physically.
Three tiers · one production line today
Three tiers of physical compute. Production runs straight on bare-metal EliteDesks, so there's no hypervisor in the critical path. The Proxmox hosts carry the core and dev clusters as VMs. Edge services like DNS and load balancing, plus storage, sit on Raspberry Pis and TrueNAS boxes. The second AZ's bare-metal tier will mirror this one once the move finishes.
05/Diagram 04
Six VLANs, one firewall.
Segmentation by trust tier
Web traffic only enters through a Cloudflare Tunnel that dials out to Cloudflare; no HTTP service has an inbound port forward. Six VLANs segment by trust tier. The firewall denies traffic between tiers by default and only allows what's needed (Trusted Clients → Servers, plus a few IoT exceptions).
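The outbound-only entry point boils down to a cloudflared configuration like this sketch; the tunnel ID, hostname, and backend service are placeholders.

```yaml
# cloudflared config for one cluster's tunnel; the connector dials out to
# Cloudflare, so nothing listens on a forwarded port at the firewall.
tunnel: 00000000-0000-0000-0000-000000000000
credentials-file: /etc/cloudflared/creds/credentials.json
ingress:
  - hostname: app.example.com
    service: http://envoy-public.gateways.svc.cluster.local:80   # public hostnames land on the public gateway
  - service: http_status:404                                     # anything unmatched gets a 404
```

Run as a two-replica Deployment per cluster, matching the tunnel note above.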
06/Diagram 05
What's replicated, what isn't (yet).
Redundant today vs. single-instance today
Reliability is never finished. The left column is what's already redundant. The right column is what still runs as a single instance, with the next step lined up for each one.
07/The roster
The hosts, dynamically generated from the homelab's inventory.
13 Ansible-managed hosts
Bare-metal Talos nodes are configured through talosctl, so they show up in the cluster topology above but not in this roster. The roster below is generated from the homelab's inventory file, so it reflects whatever is actually deployed.
Media VM
1× arr
Media library automation on a dedicated VM.
arr-vm
DNS node
3× dns
Technitium DNS — 3 instances behind a keepalived VIP. rpi-n1 is the primary; dns-n2 + dns-n3 are secondaries.
dns-n2
dns-n3
rpi-n1
Observability VM
1× lgtm
Self-hosted Loki + Grafana + Tempo + Mimir stack.
lgtm-vm
NAS
2× nas
ZFS storage with Cloud Sync + rsync for 3-2-1 backups.
cm-nas
jb-nas
Proxmox host
3× proxmox
Hypervisor hosts carrying the core and dev clusters as VMs.
bd-n1
bd-n2
hx90
Raspberry Pi
2× raspbian
Low-power utility nodes — rpi-n1 runs the DNS primary; rpi-n2 is the edge load balancer.
rpi-n1
rpi-n2
Game server node
2× wings
Pelican Wings game-server daemons on the untrusted VLAN.
wings-n1
wings-n2
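A sketch of what that inventory file might look like, reconstructed from the roster above. Group and host names come from the roster; the file layout and path are assumptions.

```yaml
# inventory/hosts.yml (hypothetical path)
all:
  children:
    arr:
      hosts:
        arr-vm:
    dns:
      hosts:
        rpi-n1:          # DNS primary; also appears in the raspbian group
        dns-n2:
        dns-n3:
    lgtm:
      hosts:
        lgtm-vm:
    nas:
      hosts:
        cm-nas:
        jb-nas:
    proxmox:
      hosts:
        bd-n1:
        bd-n2:
        hx90:
    raspbian:
      hosts:
        rpi-n1:
        rpi-n2:
    wings:
      hosts:
        wings-n1:
        wings-n2:
```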
08/Named tools
The stack, in words.
Full named-service inventory
Infrastructure
Physical hosts, hypervisor, networking
HP EliteDesk (×3, bare-metal prod) · Minisforum HX90 / BD795i (×3, Proxmox) · Raspberry Pi 5 (×2) · TrueNAS (×2) · VLAN segmentation
Operating system + Kubernetes
Immutable OS, 3 Talos clusters, MetalLB L2
Talos Linux · Kubernetes · MetalLB · Envoy Gateway · Gateway API
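The MetalLB L2 piece from that list, sketched as its two CRDs; the pool name and address range are placeholders.

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: cluster-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.20.10-192.168.20.49    # VIPs handed out on this cluster's VLAN
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: cluster-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - cluster-pool                   # answer ARP for these VIPs from whichever node holds the service
```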