# Substack Article — HomeLab Architecture v1
Generated: 2026-02-19 00:05 EST
Source: Technical Snippet
---
From Cron Spaghetti to Container Orchestra
On the moment a homelab becomes a platform
---
At 11:30 PM on a Tuesday, I drew an ASCII diagram in a Discord channel and realized I wasn't running a collection of scripts anymore. I was running infrastructure.
Mac1 (Control Plane) ←Tailscale→ Mac3 (Infra Node)        ←Tailscale→ Mac4 (Compute Node)
Clawdbot GW                      K3s + Prefect + Postgres             Ollama + Agent Workers
kubectl/k9s                      Graph Kernel, RAG++                  Adobe Pipeline

Three Mac machines. A Tailscale mesh network connecting them. Kubernetes orchestrating containers. Prefect managing workflows. Prometheus watching everything. And a fleet of AI agents humming along 24/7, doing work while I sleep.
When did this happen?
---
The Accidental Infrastructure Story
It started, as these things always do, with "just one script."
A cron job to check emails. Another to poll a database. One more to trigger an agent. Before I knew it, I had 46 cron jobs scattered across three machines, connected by nothing but hope and SSH.
The system worked. Mostly. But understanding it was archaeology. Which job depends on which? What happens when one fails? Is that process supposed to be running or did it crash three days ago?
I was running production workloads with hobby infrastructure.
The fix wasn't complicated. It just required admitting what I was actually building.
---
What Changed
Kubernetes (K3s) for Container Orchestration
Every service that used to be a "just run this script in the background" job is now a proper container. It restarts automatically. It scales if needed. Its logs land in one place. The difference between "I think this is running" and "I know exactly what's running" is a dashboard.
Prefect for Workflow Management
Those 46 cron jobs are now DAGs with proper dependencies. Job A finishes before Job B starts. Failed jobs retry with backoff. I get notifications when things break — not when I notice something's been broken for a week.
Same jobs. Actually manageable now.
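To make "proper dependencies" and "retry with backoff" concrete, here is a minimal standard-library sketch of the two ideas. It is not Prefect's API and not my actual flows: `run_dag`, `run_with_retry`, and the task names in the usage note are hypothetical, chosen only to show dependency ordering plus exponential backoff.

```python
import time
from graphlib import TopologicalSorter


def run_with_retry(task, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call task(); on failure wait base_delay * 2**attempt, then try again."""
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the failure so something can alert
            sleep(base_delay * 2 ** attempt)


def run_dag(tasks, depends_on, sleep=time.sleep):
    """Run every task only after the tasks it depends on have finished.

    tasks:      {name: zero-argument callable}          (hypothetical shape)
    depends_on: {name: set of names that must run first}
    """
    graph = {name: depends_on.get(name, set()) for name in tasks}
    results = {}
    # static_order() yields each node after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        results[name] = run_with_retry(tasks[name], sleep=sleep)
    return results
```

Calling `run_dag({"poll_db": poll_db, "trigger_agent": trigger_agent}, {"trigger_agent": {"poll_db"}})` would run `poll_db` first and only then `trigger_agent`, retrying either one up to three times with 1s, 2s, and 4s waits. A workflow engine adds the parts this sketch leaves out: scheduling, persistence of run state, and notifications.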
Tailscale for Networking
Three machines on different networks, all accessible to each other like they're in the same room. No port forwarding. No dynamic DNS. No exposing services to the public internet. Just a mesh VPN that makes distributed systems feel local.
Prometheus + Grafana for Monitoring
CPU usage, memory, pod health, custom metrics from my agents. A single dashboard showing system health instead of SSH-ing into three machines and running `htop`.
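For the "custom metrics from my agents" part, a rough sketch of what an agent could expose for Prometheus to scrape. It renders gauges in the Prometheus text exposition format using only the standard library. The function name, the metric name, and the `agent` label are assumptions for illustration, and label values are not escaped here.

```python
def render_metrics(metrics):
    """Render gauges in the Prometheus text exposition format.

    metrics: {metric_name: (help_text, {agent_name: value})}  (hypothetical shape)
    """
    lines = []
    for name, (help_text, samples) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        for agent, value in sorted(samples.items()):
            # One sample per agent, distinguished by an "agent" label.
            lines.append(f'{name}{{agent="{agent}"}} {float(value)}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical metric: how many jobs each agent has waiting.
    print(render_metrics({
        "agent_queue_depth": ("Jobs waiting per agent", {"inbox": 3, "research": 0}),
    }))
```

Serving that string over HTTP on a `/metrics` path is what makes it scrapeable. In practice a Prometheus client library would usually handle both the rendering and the endpoint.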
---
Why This Matters for AI Agents
Here's the thing: you can run a single AI agent with a shell script. It'll work fine.
But I'm not running a single agent. I'm running a fleet. Autonomous agents that wake up on schedules, do work, spawn sub-agents, coordinate with each other, and need to fail gracefully when things go wrong.
That requires actual infrastructure:
Persistent Storage: Agents need databases. Context windows are limited, so state has to live somewhere durable. Postgres containers on Mac3, replicated and backed up.
Inference Endpoints: Agents need to call language models. Ollama on Mac4 handles local inference. Cloud APIs for the heavy lifting. Load balancing across both.
Coordination Systems: When eighteen agents spawn simultaneously, something has to manage capacity. Who runs where? Which account has tokens left? What happens when one crashes?
Observability: When an agent fails at 3 AM, I need to know immediately. When a job has been stuck for an hour, I need alerting. When system resources are exhausted, I need visibility.
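The coordination question ("who runs where, which account has tokens left") can be sketched as a small capacity-aware dispatcher. This is an illustration under stated assumptions, not the scheduler actually running here: the `Worker` fields, the slot counts, and the token budgets are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    slots: int        # hypothetical: how many agents this worker runs at once
    tokens_left: int  # hypothetical: remaining token budget on its account
    running: list = field(default_factory=list)

    def can_take(self, token_cost):
        return len(self.running) < self.slots and self.tokens_left >= token_cost


def dispatch(agents, workers):
    """Place each (agent_name, token_cost) on the least-loaded worker that fits.

    Returns (placements, queued): agents that got a slot, and agents that
    have to wait because no worker has capacity right now.
    """
    placements, queued = {}, []
    for agent, cost in agents:
        candidates = [w for w in workers if w.can_take(cost)]
        if not candidates:
            queued.append(agent)  # nothing fits: wait instead of overcommitting
            continue
        target = min(candidates, key=lambda w: len(w.running) / w.slots)
        target.running.append(agent)
        target.tokens_left -= cost
        placements[agent] = target.name
    return placements, queued
```

With one single-slot worker and one two-slot worker that is low on tokens, a burst of agents fills the free slots and queues the rest instead of starting everything at once. Handling a crash would mean removing the agent from `running` so its slot frees up, which this sketch leaves out.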
These are the same primitives every production system has always needed. I'm just running them on three Macs instead of a cloud account that costs $2,000/month.
---
The Pattern
Scripts → Problems → Systems → Platform
It happens to every project that grows. You start with a quick hack. The hack becomes load-bearing. The load-bearing hack starts causing problems. You fix the problems with more sophisticated tools. Eventually you wake up and realize you've built a platform.
The question isn't whether this will happen. It's whether you'll notice when it does.
I noticed tonight. The ASCII diagram was the moment of clarity. "Oh. This is a distributed system. I should probably treat it like one."
---
The Interview Line
Here's the line I wrote for myself tonight:
"I architect and maintain a multi-node Kubernetes cluster on Apple Silicon, orchestrating 46+ Prefect DAG workflows, containerized microservices, and autonomous AI agent dispatch with capacity-aware scheduling."
That's not aspirational. That's what's running in my apartment right now.
The gap between "I have some scripts" and "I run production infrastructure" isn't about scale or money. It's about discipline. Containerizing instead of running bare metal. Managing workflows instead of managing cron. Monitoring instead of hoping.
The tools exist. They're free. They run on consumer hardware. The only question is whether you're ready to treat your work like it matters.
---
159 of 225 tasks complete. Seven project plans finished. Two dreams bloomed in the Dream Garden. And now, finally, HomeLab Architecture v1 locked.
Not bad for a Tuesday.
🦞
---
What's your homelab journey? Are you still in the "scripts everywhere" phase or have you made the jump to proper infrastructure? Reply and tell me — I'm genuinely curious.
Source Anchor
content-pipeline/substack/2026-02-19-homelab-architecture.md