
Running a GPU-Accelerated K3s Cluster at Home

Building a homelab with Proxmox, K3s, and GPU passthrough for self-hosted AI.

October 5, 2024
15 min read
Kubernetes · Homelab · AI

The Kosmos Project

I've been building a homelab infrastructure called Kosmos. The goal: run a production-grade Kubernetes cluster at home with GPU acceleration for AI workloads.

Hardware Setup

Node 1 (pve-node1)

  • Intel i9-13900K
  • 64GB RAM
  • NVIDIA RTX 4090
  • ZFS storage

Node 2 (pve-node2)

  • Intel i7-7700K
  • 32GB RAM
  • NVIDIA GTX 1070
  • LVM storage

Software Stack

Proxmox (Layer 2)

  • Virtualization layer
  • GPU passthrough via IOMMU
  • Cluster for high availability
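
The clustering bullet above comes down to Proxmox's `pvecm` tool. A sketch, with the cluster name and the peer IP as illustrative placeholders:

```shell
# On the first node: create the cluster (name is illustrative)
pvecm create kosmos

# On the second node: join, pointing at the first node's IP (placeholder)
pvecm add 192.168.1.10

# Verify quorum and membership
pvecm status
```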

K3s (Layer 4)

  • Lightweight Kubernetes
  • Installed via Pulumi
  • Server + agent architecture
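
K3s's server/agent split boils down to two invocations of the upstream installer (a sketch; the post drives this through Pulumi, and the server IP and token below are placeholders):

```shell
# On the server (control-plane) node:
curl -sfL https://get.k3s.io | sh -

# The join token lives on the server:
cat /var/lib/rancher/k3s/server/node-token

# On each agent node, join using the server URL and token:
curl -sfL https://get.k3s.io | K3S_URL=https://192.168.1.10:6443 K3S_TOKEN=<node-token> sh -
```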

Workloads (Layer 5)

  • Media: Plex, *arr stack, qBittorrent
  • AI: Ollama, Open WebUI
  • Infrastructure: Traefik, Authentik, Uptime Kuma

GPU Passthrough

The key to running AI workloads is GPU passthrough:

  • Enable IOMMU in BIOS
  • Configure GRUB for IOMMU
  • Pass GPU PCI ID to VM
  • Install NVIDIA drivers in VM
  • Use nvidia-device-plugin in K8s
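
The first three steps look roughly like this on the Proxmox host (a sketch: Intel syntax is shown, AMD boards use `amd_iommu=on`, and the VM ID and PCI address are illustrative):

```shell
# /etc/default/grub on the host: enable IOMMU (Intel; AMD uses amd_iommu=on)
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

# Regenerate the bootloader config, then reboot
update-grub

# After reboot, find the GPU's PCI address
lspci -nn | grep -i nvidia

# Attach the device to the VM (VM ID 100 and address 01:00.0 are illustrative)
qm set 100 -hostpci0 01:00.0,pcie=1
```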
```yaml
# GPU workload example
spec:
  containers:
  - name: ollama
    resources:
      limits:
        nvidia.com/gpu: 1
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-GeForce-RTX-4090
```
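
The device-plugin step itself can be done with NVIDIA's official Helm chart (a sketch; the release name and the node name in the check are illustrative):

```shell
# Install the NVIDIA device plugin from its official chart repo
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm install nvidia-device-plugin nvdp/nvidia-device-plugin --namespace kube-system

# The GPU should now show up as an allocatable resource on the node
kubectl get node pve-node1 -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
```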

Infrastructure as Code

Everything is managed with:

  • Pulumi (TypeScript) for K8s resources
  • Ansible for Proxmox configuration
  • GitHub Actions for CI/CD
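
As a sketch of what the Pulumi side might look like, here is an Ollama Deployment pinned to the GPU node via the same `nvidia.com/gpu.product` label used earlier (image tag, labels, and resource names are illustrative, not the post's actual code):

```typescript
import * as k8s from "@pulumi/kubernetes";

// Sketch: an Ollama Deployment requesting one GPU and pinned to the RTX 4090 node.
const ollama = new k8s.apps.v1.Deployment("ollama", {
    spec: {
        replicas: 1,
        selector: { matchLabels: { app: "ollama" } },
        template: {
            metadata: { labels: { app: "ollama" } },
            spec: {
                containers: [{
                    name: "ollama",
                    image: "ollama/ollama:latest",
                    resources: { limits: { "nvidia.com/gpu": "1" } },
                }],
                // Schedule onto the GPU node using the product label
                // advertised by NVIDIA's GPU feature discovery.
                nodeSelector: { "nvidia.com/gpu.product": "NVIDIA-GeForce-RTX-4090" },
            },
        },
    },
});
```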

Results

I now have:

  • Self-hosted LLM inference (Ollama)
  • Hardware-accelerated media transcoding
  • SSO for all services (Authentik)
  • Monitoring and alerting (Uptime Kuma)

The project demonstrates how Infrastructure as Code can manage complex homelab setups.