In this blog post, I would like to write about my homelab migration to k3s. If you are not familiar with k3s, I highly recommend the excellent documentation at https://docs.k3s.io.

k3s is very easy to install initially and is very well suited for my homelab, allowing me to continue working with Kubernetes, learn new things, and deepen my existing knowledge.

This is the architecture diagram that k3s shows on its own website:

[Image: how-it-works-k3s-revised – the k3s architecture diagram from the official website]

I will start with a node at home called fuji1 – named simply because it is a Fujitsu Futro S920. These can be purchased very cheaply second-hand and are easy to upgrade for a home lab. In future blog posts, I will describe the hardware upgrades I have made and the software adjustments I have implemented.

System specifications:

  • Operating system: Ubuntu 24.04.2 LTS Noble Numbat
  • Hardware: AMD GX-222GC SoC (2 cores @ 2.2 GHz), 16 GB RAM
  • Storage: 100 GB total capacity

Phase 1 Cleaning up the old Docker setup

Since I had previously been running various Docker containers on the system, I wanted to clean it up completely and run them in Kubernetes later.

# Stop all running containers
sudo docker stop $(sudo docker ps -aq)

# Stop the Docker and containerd services
sudo systemctl stop docker containerd

# Completely remove Docker packages, including configuration files
sudo apt-get purge -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Delete remaining Docker directories
sudo rm -rf /var/lib/docker /var/lib/containerd/ /etc/docker ~/.docker
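
As a quick sanity check (not strictly necessary, just how I like to verify a cleanup), you can confirm that nothing Docker-related is left behind:

# The binary should be gone and the service unit should no longer exist
command -v docker || echo "docker binary removed"
systemctl status docker --no-pager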

Phase 2 The first installation attempt and troubleshooting

I started the k3s installation with a clean system. The strategy was to use a robust configuration from the outset, designed for future expandability and high availability (using embedded etcd) and for enterprise-grade components. That’s why I disabled servicelb and traefik: they do not meet my requirements for an enterprise-style setup, and I wanted to replace them with something else afterwards.

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --cluster-init --disable traefik --disable servicelb --write-kubeconfig-mode=644 --protect-kernel-defaults" sh -

After the installation came the first disappointment: the attempt to check the cluster state was not successful.

kubectl get nodes
The connection to the server localhost:8080 was refused - did you specify the right host or port?

Although k3s.service was running, the Kubernetes API server was not accessible. The logs showed various errors and timeouts, indicating a fundamental problem with the startup of the k3s server. The cause was a chicken-and-egg problem. The --protect-kernel-defaults parameter ensures that k3s only starts if certain kernel parameters are set. However, these had not yet been configured, which prevented the start.
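
If you run into the same situation, these are the commands I would use to diagnose it (a small debugging sketch; the exact log messages will of course differ from system to system):

# Check whether the k3s service itself is running
systemctl status k3s --no-pager

# Follow the k3s logs to see why the API server will not come up
sudo journalctl -u k3s -f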

Phase 3 The solution – a clean restart in the correct order

The solution was to completely restart the process in the correct order.

Step 1

Complete uninstallation of k3s

First, the faulty k3s installation was completely removed using the official uninstall script.

sudo /usr/local/bin/k3s-uninstall.sh

Step 2

System hardening (kernel parameters)

Before reinstalling, the kernel parameters recommended by the CIS Hardening Guide were set. These increase the stability and security of the system by defining the behavior in case of memory shortage (OOM killer) and system errors (kernel panic).

Digression

What? Why? CI… Who?

Here’s an explanation

Where does it come from?

The Center for Internet Security (CIS) is a globally recognized, non-profit organization. Its main task is to develop and promote best practices and configuration guidelines for securing IT systems and data. These guidelines are known as “CIS Benchmarks.”

What is the CIS Kubernetes Benchmark?

This is a comprehensive document developed by a global community of cybersecurity experts. It contains a detailed list of configuration settings to secure (“harden”) a Kubernetes cluster. This benchmark is the de facto industry standard for Kubernetes security. Organizations such as the US NSA, Microsoft (for Azure AKS), and Google (for GKE) use and recommend these benchmarks.

https://media.defense.gov/2022/Aug/29/2003066362/-1/-1/0/CTR_KUBERNETES_HARDENING_GUIDANCE_1.2_20220829.PDF

What is the k3s Hardening Guide?

The official k3s Hardening Guide is basically the “translation” of the general CIS Kubernetes benchmark specifically for k3s. It describes exactly which steps and configurations are necessary to make a k3s installation compliant with the CIS recommendations.

https://docs.k3s.io/security/hardening-guide

In summary:

CIS: The global authority that defines security rules.

CIS Kubernetes Benchmark: The “rule book” for hardening Kubernetes.

k3s Hardening Guide: The “instructions” on how to implement the rules from the book for k3s.

Now, back to the installation:

Create configuration file for kernel parameters

sudo tee /etc/sysctl.d/90-k3s.conf <<EOF
vm.panic_on_oom=0
vm.overcommit_memory=1
kernel.panic=10
kernel.panic_on_oops=1
EOF

# Apply new parameters immediately without restarting
sudo sysctl -p /etc/sysctl.d/90-k3s.conf
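
To confirm that the values actually took effect before reinstalling (just a read-back check, not part of the hardening guide itself):

# Read the values back to verify they are active
sysctl vm.panic_on_oom vm.overcommit_memory kernel.panic kernel.panic_on_oops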

These parameters are not a k3s-specific invention, but rather basic Linux settings that control the behavior of the server in extreme situations. They are crucial for the stability of any server running critical services.

  • vm.overcommit_memory=1: Allows the kernel to “optimistically” approve memory requests from applications such as Kubernetes. This is important because containers often preemptively request more memory than they immediately need. Without this setting, applications could fail.

  • vm.panic_on_oom=0: Ensures that an “out of memory” (OOM) event does not cause the entire system to crash. Instead, only the one process that is overloading the memory (e.g., a faulty container) is terminated by the OOM killer. This maintains the stability of the overall system.

  • kernel.panic=10: Instructs the server to automatically restart after 10 seconds in the event of a fatal, irrecoverable system error (“kernel panic”) instead of simply “freezing.” This drastically minimizes downtime.

  • kernel.panic_on_oops=1: Forces a safe restart even in the event of minor kernel errors (“oops”) to prevent instability or data corruption. This follows the “fail fast” principle: better a quick, clean restart than unnoticed, unstable continued operation.

I may explain the entries in more detail in another blog post so that they can be better understood in depth, including by myself.

Phase 4 The successful reinstallation

With the correctly prepared kernel parameters, k3s was now reinstalled.

Final installation command

curl -sfL https://get.k3s.io | sudo sh -s - server --cluster-init --disable traefik --disable servicelb --write-kubeconfig-mode=644 --protect-kernel-defaults

The meaning of the parameters in the installation command

  • server Installs k3s as a server node (also known as the control plane).
  • --cluster-init Initializes a new cluster with embedded etcd as data storage, which lays the foundation for future high availability (HA); a join sketch follows after this list.
  • --disable traefik Disables the default ingress controller of k3s in order to install NGINX later, which is more powerful and more popular in enterprise environments.
  • --disable servicelb Disables the built-in ServiceLB load balancer in order to use MetalLB later for professional IP management in the home network.
  • --write-kubeconfig-mode=644 Creates the kubeconfig file with read permissions for all users, allowing the use of kubectl without sudo.
  • --protect-kernel-defaults A crucial security feature that aborts the startup if the kernel parameters do not comply with the CIS hardening recommendations.
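
As an outlook on the HA part enabled by --cluster-init: additional server nodes can later join the embedded etcd cluster. A rough sketch of what that would look like, assuming a hypothetical second node (for example fuji2) that can reach fuji1 on port 6443 and already has the kernel parameters from Phase 3 set:

# On fuji1: read the join token for additional nodes
sudo cat /var/lib/rancher/k3s/server/node-token

# On the new node (e.g. fuji2): join the existing cluster as another server
curl -sfL https://get.k3s.io | K3S_TOKEN=<token-from-fuji1> sh -s - server \
  --server https://<ip-of-fuji1>:6443 \
  --disable traefik --disable servicelb \
  --write-kubeconfig-mode=644 --protect-kernel-defaults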

Phase 5 Verification and triumph

After installation, the next step was to configure the kubectl client.

# Create .kube directory
mkdir -p ~/.kube

# Copy kubeconfig file from k3s to user directory
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config

# Adjust ownership rights
sudo chown $(id -u):$(id -g) ~/.kube/config
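
Alternatively, because the kubeconfig was written world-readable with --write-kubeconfig-mode=644, you could also skip the copy and simply point kubectl at the original file for the current shell session:

# Point kubectl directly at the k3s kubeconfig
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml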

Now for the moment of truth: checking the cluster status:

kubectl get nodes
NAME    STATUS   ROLES                       AGE   VERSION
fuji1   Ready    control-plane,etcd,master   30s   v1.33.4+k3s1

Perfect! The node was Ready and all roles (control-plane, etcd, master) were active. The mission was successfully completed.

A detailed check confirmed that the cluster was in perfect condition:

  • kubectl get pods -n kube-system showed all system pods (CoreDNS, Local-Path-Provisioner, Metrics-Server) in Running status.
    kubectl get pods -n kube-system
    NAME                                      READY   STATUS    RESTARTS     AGE
    coredns-64fd4b4794-jhrlg                  1/1     Running   1 (9h ago)   2d20h
    local-path-provisioner-774c6665dc-bd78c   1/1     Running   1 (9h ago)   2d20h
    metrics-server-7bfffcd44-84p7c            1/1     Running   1 (9h ago)   2d20h
  • kubectl get componentstatuses confirmed that controller-manager, scheduler, and etcd-0 were Healthy.
    $ kubectl get componentstatuses
    Warning: v1 ComponentStatus is deprecated in v1.19+
    NAME                 STATUS    MESSAGE   ERROR
    scheduler            Healthy   ok
    controller-manager   Healthy   ok
    etcd-0               Healthy   ok
  • kubectl describe node fuji1 provided a wealth of information confirming the healthy state of the node, including the activated security features and the impressively low resource consumption (two quick checks for this follow after the list).
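
For the resource consumption in particular, these two commands give a quick picture (output omitted here, since the numbers will vary from setup to setup):

# Show requested vs. allocatable resources on the node
kubectl describe node fuji1 | grep -A 8 "Allocated resources"

# Live CPU/memory usage, served by the metrics-server that is already running
kubectl top node fuji1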

By Alex
