Release Notes v1.0.4 (January 2026)

My homelab

A few years back a friend and I got access to a fairly new server that we decided to load Linux on and use to host a few services for ourselves. We started out with a few simple services like a forum and Matrix, but it quickly escalated into a cluster of 4 servers.

The goal of the homelab

For the most part the entire exercise has been about learning and trying out different technologies. I’ve been quite interested in containerization for a decade now but have not had many chances to use it professionally, so I decided quite early that this homelab would be built around Kubernetes.

For me there were a few criteria that I needed to adhere to in order to stay interested in maintaining the servers.

  1. It had to provide value to me and my closest friends.
  2. It had to help us untangle ourselves from centralized services.
  3. It had to be at least somewhat manageable for friends with less Linux experience.
  4. It had to serve as a way for me to experiment with networking and networking gear without staying purely theoretical.

The first iteration

The first iteration of our homelab was a single Dell PowerEdge R510 rack server, which served us well in terms of providing value for ourselves. We started out running Rocky Linux on the server and installing k3s, but this locked us out of running any kind of virtualization without doing the entire job manually. We quickly pivoted away from this approach and installed Proxmox on the server to allow for multiple VMs on the machine, so we could more easily spin up test boxes for ourselves and for friends to muck about with different Linux distributions.

Adding more hardware

After some time, we got a 4-year-old Lenovo rack server for free through some family connections. This machine was almost brand new by our standards and still under warranty, so needless to say it was a big step up from our R510. We installed Proxmox on this too and set out to build a Kubernetes cluster across both machines, running two control-plane nodes on the new machine and one on the other.

The second iteration

We ran the first cluster for about two years before I got restless and needed to redo everything. We started by purchasing another server; well, technically we purchased 4. For a measly 200€ we were able to snag another rack server, but this one was a 4-blade Supermicro chassis. This allowed us to add a 3rd node to our cluster, which is essentially what we did. At this point we decommissioned the R510, at least for a while.

The new cluster now consisted of 2 of my four Supermicro blades as well as the Lenovo. I had also purchased some new UniFi networking gear, namely a Dream Machine Pro and a few of the 5-port switches, for cheap off some local classifieds, and I set out to finally get working on networking in the cluster. We bought a few more hard drives and set up a new cluster. This time we had 3 nodes and could cluster Proxmox.

Storage issues

We had been using Kubernetes up to this point, but we felt we couldn’t really utilize a lot of the fun scheduling features of Kubernetes. We had been using the k3s built-in local-path storage provisioner, which essentially just mounts host directories into the containers. This is inherently not highly available, so services that required storage in the cluster were locked to a single node.
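For reference, a claim against that provisioner looks roughly like this. A minimal sketch: the claim name is made up, but "local-path" is the storage class k3s ships with.

```yaml
# Minimal sketch of a PVC using k3s's bundled local-path provisioner.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: forum-data           # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce          # local-path volumes are node-local, single-writer
  storageClassName: local-path
  resources:
    requests:
      storage: 10Gi
```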

This time around we had 3 nodes, which opened a few options to us. It was our belief at the time that using Ceph would allow us to lose a single node and still be operational. This did not end up being the case, but we installed the Rook operator into our cluster and decided to allocate most of the storage on all nodes to the Ceph cluster.
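For context, a Rook setup like ours boils down to a replicated block pool plus a storage class pointing at the RBD CSI driver. Something along these lines, assuming 3-way replication across 3 nodes; names are illustrative and the CSI secret parameters are omitted for brevity.

```yaml
# Sketch of a Rook-managed replicated pool and a storage class using it.
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host        # spread replicas across nodes, not just disks
  replicated:
    size: 3                  # one copy per node
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  csi.storage.k8s.io/fstype: ext4
  # CSI provisioner/node secret parameters omitted for brevity
reclaimPolicy: Delete
```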

This solution worked out well for us until we later decided to move the Ceph cluster out of Kubernetes and into Proxmox, which both required us to set the cluster up from scratch and decreased performance.

Ceph as a VM disk storage medium (the start of our 3rd iteration)

Ceph is a really cool technology, but it is quite finicky to get working, especially in smaller clusters. I have heard that uniform storage hardware and high-speed networking alleviate a lot of the issues we encountered.

We had been storing VM disks on their respective nodes so far, and we wanted to be able to lose a node without losing services. We looked into it and found that we should, in theory, be able to live-migrate VMs quite quickly between nodes in case one node went down. To pursue this, we decided to move our Ceph cluster out of the VMs and into Proxmox, letting Proxmox leverage Ceph for VM hard drives.

Ceph did not work out for us

We found Proxmox-managed Ceph to be kind of a pain. Services would die randomly, we got abysmal speeds, and storage would hang quite a lot. Iteration 2 of our cluster had the Kubernetes VMs on local drives and only shared application data over the network. This probably masked some of the network/hardware issues we experienced with the Proxmox-managed Ceph cluster, because now ALL of our files in ALL VMs were backed by slow network storage.

Time for better networking

So, I went back to the classifieds and found some new networking gear. We ended up buying a 40Gbit QSFP+ Mellanox SX6036 switch and accompanying NICs for all of our machines. All of our machines now have a 1Gbit internet-facing network, as well as a 40Gbit “private network” for inter-cluster communication.

This did not alleviate our Ceph issues as we had theorized, so at this point we decided to pivot away from Ceph as a backend for our data and go back to locally stored VM disks with application data on network storage.

The R510 was recommissioned as a dedicated NFS file server. We bought some new drives, and the NFS storage is now up to 12 terabytes; I assume we will need another set this year. We needed this server to store the application data from the Kubernetes cluster, which until then had been backed by the same Ceph cluster as our VM storage.
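A statically provisioned NFS volume is one way to expose that storage to the cluster. A minimal sketch, with a made-up server address and export path rather than our real ones:

```yaml
# Sketch of a static NFS PersistentVolume pointing at the R510.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: media-nfs
spec:
  capacity:
    storage: 2Ti
  accessModes:
    - ReadWriteMany          # NFS allows many pods/nodes to mount the same share
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 192.168.50.10    # hypothetical address for the R510
    path: /exports/media     # hypothetical export
```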

We shut down all services and spawned jobs for each of our PVCs to migrate the data over NFS to the R510, then promptly reformatted all drives and set up a new Proxmox cluster. One of the upsides of our Kubernetes workflow was that ALL of our secrets, configurations and deployments were already a bunch of YAML files, so restoring the entire cluster would be quick and dirty.
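Each of those jobs was conceptually just a pod mounting the old PVC and an NFS export side by side and copying everything across. A sketch of that idea; the image, names, server address and paths are placeholders, not our actual manifests.

```yaml
# Sketch of a one-shot migration job: mount the PVC and an NFS export, copy everything.
apiVersion: batch/v1
kind: Job
metadata:
  name: migrate-forum-data
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: copy
          image: alpine:3.19
          command: ["sh", "-c", "apk add --no-cache rsync && rsync -a /src/ /dst/"]
          volumeMounts:
            - { name: src, mountPath: /src }
            - { name: dst, mountPath: /dst }
      volumes:
        - name: src
          persistentVolumeClaim:
            claimName: forum-data            # the PVC being evacuated
        - name: dst
          nfs:
            server: 192.168.50.10            # hypothetical R510 address
            path: /exports/migration/forum-data
```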

Segregating resources

Another change we made at this point was to segregate our network a bit better by using VLANs to separate traffic. To this day, DHCP and routing on the 1Gbit network are managed by my UniFi equipment.

We have 3 VLANs that are associated with the cluster.

  1. Management VLAN
  2. Host VLAN
  3. VM VLAN

We decided to set up IPMI/iDRAC on all of our machines this time around, so we didn’t have to spend as much time in my basement when dealing with the servers.

The IBM machine is entirely manageable from a web UI; the Supermicro server requires outdated web tech, so some features are not present in the web interface but are available using ipmitool. This lets us connect to a Linux shell over Telnet or SSH, depending on the machine, so even without networking on the box itself we can control it. All traffic into the management VLAN from the other VLANs is dropped.

The host VLAN is self-explanatory: it holds all of our physical machines. They all serve their control planes only on their management VLAN interface, so VMs should in theory not be able to reach any Proxmox control planes, nor SSH into the NFS box.

The 3rd VLAN is just for the VMs as well as MetalLB, which lets us assign virtual IP addresses to services in the Kubernetes cluster. This simplifies routing a lot. VMs are assigned IPs in the 192.168.50.0/25 subnet, while MetalLB IPs are assigned by Kubernetes in the 192.168.50.128/25 subnet.
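The MetalLB side of that split is essentially an address pool covering the upper half of the subnet plus an L2 advertisement. Roughly like this, assuming the newer CRD-based MetalLB configuration; pool names are arbitrary.

```yaml
# Sketch of a MetalLB address pool for the upper half of the VM VLAN subnet.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: vm-vlan-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.50.128/25      # LoadBalancer IPs handed out by Kubernetes
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: vm-vlan-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - vm-vlan-pool
```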

Longhorn

We had already purchased drives for all of our application runners (the Lenovo and Supermicro servers), so we decided to skip Ceph the third time around and instead go for Longhorn. Even though the Longhorn service does not use the 40Gbit network yet, it is still vastly faster than Ceph was on our small cluster. Now most of our less important storage is handled by Longhorn, while bigger files and “important stuff” are stored on the NFS server. It also allows us to schedule everything on every node.
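A Longhorn storage class with one replica per application runner is roughly what this looks like. A sketch, with an arbitrary class name:

```yaml
# Sketch of a Longhorn storage class with 3-way replication across the runners.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replicated
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"          # one copy on each application runner
  staleReplicaTimeout: "2880"    # minutes before a dead replica is rebuilt elsewhere
```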

This has been a great success so far. It has worked flawlessly, though it is a bit slow to rebuild replicas after a node comes back up from being offline. This is because our Kubernetes network still only uses the 1Gbit network for communication between pods in the cluster. At some point we need to add multi-NIC networking for pods, at least for the Longhorn pods, so we can leverage the higher-speed NICs for data replication across the Longhorn storage pool.
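The rough idea, as far as I understand it, would be a Multus NetworkAttachmentDefinition on the 40Gbit interface that Longhorn’s storage-network setting can point at. An untested sketch; the interface name and subnet are pure assumptions about our hosts, and it presumes Multus and the whereabouts IPAM plugin are installed.

```yaml
# Untested sketch of a secondary network attachment on the 40Gbit NIC for Longhorn.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: longhorn-storage
  namespace: longhorn-system
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "ens1f0",
      "mode": "bridge",
      "ipam": {
        "type": "whereabouts",
        "range": "10.40.0.0/24"
      }
    }
```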

The current setup

This is pretty much our current setup: 4 machines, one being a storage server and 3 being application runners running Kubernetes and Longhorn. This setup works quite well. We have a few more changes planned, which I will outline later.