Running Talos Linux on a Raspberry Pi Cluster — And Loving Every Minute of It

I run Kubernetes at work — a lot of it. OpenShift clusters, RHCOS nodes, the whole Red Hat ecosystem. So when it came to my home lab, I wanted something that felt closer to that experience than just bolting K3s on top of Raspberry Pi OS and calling it a day. That’s how I ended up running Talos Linux on a cluster of Raspberry Pi Compute Modules — and honestly, it’s been one of the more rewarding rabbit holes I’ve gone down.

The Hardware Journey

I started with a Turing Pi 2 board — it was a great concept, but I quickly hit a wall. I wanted to boot from NVMe drives directly, and the Turing Pi 2 just didn’t support that the way I needed for CM4 and CM5 modules. So after some research I found the DeskPi Super6C — a mini-ITX board that takes up to six Compute Modules, with a native M.2 NVMe slot for each node. That was exactly what I was after.

I’m running two Super6C boards right now with a total of eight nodes:

  • 5 × Raspberry Pi CM5 (16 GB RAM each) — these are the workhorses, with a mix of 256 GB and 1 TB NVMe drives
  • 3 × Raspberry Pi CM4 (8 GB RAM each) — running 256 GB and 1 TB NVMe as well
  • 3 nodes serve as control plane (1 CM5 + 2 CM4) and 5 as workers (4 CM5 + 1 CM4)

Every node boots directly from NVMe — no SD cards in sight. If you’ve run Raspberry Pis on SD cards for any length of time, you know exactly why that matters. The reliability difference is night and day.

Why Talos?

I was running K3s before this, and there’s nothing wrong with it — K3s is a fantastic way to get into Kubernetes. It’s simple to install, lightweight, and it just works. When I was first learning Kubernetes, it was exactly what I needed.

But as I started managing more OpenShift clusters at work and got deeper into the ecosystem, I think I outgrew it. Or maybe I just wanted more. At work I’m dealing with Red Hat CoreOS (RHCOS) — an immutable, API-driven operating system designed from the ground up for running Kubernetes. Once you get used to that model — where the OS is declaratively managed, locked down, and purpose-built — going back to a traditional Linux distro with K3s on top feels… loose.

Talos Linux is the closest thing to that RHCOS/CoreOS experience you can get outside of OpenShift. There’s no SSH. No shell. No package manager. You manage the entire OS through an API, and the machine configuration is a single YAML document. It felt like a natural fit — and honestly, it was also a challenge I wanted to take on. I geek out on this stuff.
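To give a sense of what “the machine configuration is a single YAML document” means in practice, here’s a heavily trimmed sketch of a Talos machine config. The hostname, IP, and endpoint values are placeholders, not my actual cluster settings:

```yaml
version: v1alpha1
machine:
  type: controlplane            # or "worker"
  network:
    hostname: cm5-cp-1          # placeholder hostname
  install:
    disk: /dev/nvme0n1          # install target — the node's NVMe drive
cluster:
  controlPlane:
    endpoint: https://192.168.1.10:6443   # placeholder cluster endpoint
```

That one document (plus the generated secrets) is the whole OS configuration — you push it to the node over the Talos API with `talosctl apply-config`, and there’s nothing else to log into.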

The CM5 Image Challenge

Here’s where it got interesting. Talos doesn’t fully support the Raspberry Pi CM5 yet — they’re close, but official support through the Image Factory isn’t there as of this writing. The CM4 nodes? Easy — you just go to the Image Factory, define a schematic with the extensions you need, and download the image. Done.

For the CM5 modules, I had to build my own Talos image from scratch. The problem? The Talos image builder only has basic support for the Raspberry Pi platform — it doesn’t handle booting from NVMe, and CM5 support is completely missing. I needed something that would boot directly from NVMe and work reliably with the CM5’s hardware.

I started by trying Talos 1.12.4, building a few different images with both the Talos kernel patches and the upstream Raspberry Pi kernel (6.18.y, which wasn’t stable yet). No matter what combination I tried, the network would drop off periodically and leave the node completely unresponsive, requiring a hard power-off. Not good.

After wrestling with those issues, I took a step back and built from Talos 1.11.6 instead. That proved to be the turning point. I created my own pipeline using Talos’s official metal image build process, but with custom patches at various layers to get CM5 support and NVMe boot working properly.

The end result is talos-builder — a GitHub repo that produces custom Talos metal images with CM5 kernel patches and system extensions (like the iSCSI tools for Longhorn) baked right in. I also decided to build matching images for both CM4 and CM5 nodes, so they’re all running the same Talos version and configuration. Consistency matters when you’re managing a heterogeneous cluster.

Once the upstream Talos project and Image Factory fully support the RPi5/CM5 and NVMe boot, I’ll gladly retire that repo. But for now, it’s what keeps my CM5 nodes running.

From Flash to Running — Automated Provisioning

One of the things I’m most proud of is the provisioning workflow. Once I flash a Talos image onto a node’s NVMe drive and it boots into maintenance mode, the Ansible playbooks take it from there. They generate the Talos machine configs — with per-node customizations like hostnames, network settings, and role assignments — apply them, bootstrap the cluster, and then deploy every workload on top. Going from a freshly flashed node to a fully functioning cluster member is almost entirely hands-off.

What’s Running on It

The whole cluster is managed through Ansible — I have a set of numbered playbooks that deploy everything from scratch. Here’s what’s currently running:

  • Traefik — Ingress controller and reverse proxy, handling all the routing
  • Kubernetes Gateway API — Next-gen ingress, because I like playing with the new stuff
  • Cloudflared — Cloudflare Tunnel for secure external access with zero exposed ports
  • GitHub Actions Runner Controller — Self-hosted ARM64 runners for CI/CD right on the cluster
  • Longhorn — Distributed block storage across all the NVMe drives
  • Metrics Server — For resource monitoring and HPA
  • Prometheus + Grafana (kube-prometheus-stack) — Full monitoring with dashboards
  • Headlamp — A clean Kubernetes dashboard UI
  • Kubernetes MCP Server — Lets me manage the cluster through AI tools (yes, really)
  • kube-vip — LoadBalancer provider for bare-metal services
  • WordPress — This very site, www.swheetlife.com, running in a container on the cluster

Yeah — the blog you’re reading right now is hosted on this Raspberry Pi cluster. It’s running behind Cloudflare with all connectivity tunneled through Cloudflared, so there are no ports exposed on my home network. I think that’s kind of cool.
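The zero-exposed-ports setup comes down to a small cloudflared config: the tunnel makes an outbound connection to Cloudflare, and ingress rules route hostnames to in-cluster services. A minimal sketch (the tunnel ID and the Traefik service address are placeholders for my actual values):

```yaml
tunnel: 00000000-0000-0000-0000-000000000000   # placeholder tunnel ID
credentials-file: /etc/cloudflared/creds/credentials.json
ingress:
  # Route the blog's hostname to the in-cluster Traefik service
  - hostname: www.swheetlife.com
    service: http://traefik.traefik.svc.cluster.local:80
  # Catch-all: anything else gets a 404
  - service: http_status:404
```

Since the connection is initiated from inside the cluster, my router never needs a single port forward.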

The Ansible Approach

Everything in this cluster is deployed and managed through Ansible playbooks. I went this route because I wanted the entire stack to be reproducible — if a node dies or I want to rebuild from scratch, I can run the playbooks and have everything back up without hunting through shell history or bookmarked docs.

Secrets are managed through Infisical, which acts as my secret store and vault. API keys, tokens, certificates — nothing is hardcoded in the repo. The Ansible playbooks pull secrets from Infisical at deploy time, so the automation stays clean and the sensitive stuff stays out of version control.

The initial Talos config generation, cluster bootstrap, and every workload deployment is codified. It’s the same GitOps-flavored thinking I use at work, just scaled down to a home lab.

What’s Next

I’ll be writing separate posts diving into each of the roles and workloads — how Longhorn works on Talos, the Cloudflare tunnel setup, running self-hosted GitHub Actions runners on ARM64, the whole monitoring stack, and more. There’s a lot to unpack, and each one deserves its own write-up.

For now, I’m just really happy with where this cluster is at. It’s a genuine production-grade Kubernetes environment running on a stack of tiny compute modules with NVMe storage, and it handles everything I throw at it. If you’re running K3s on Pis and thinking about something more — Talos might be your next move.