Your AI Agent Needs a Job Description — and Kubernetes Is HR
There’s a version of the AI agent story that goes like this: you write a clever prompt, chain a few tool calls together, demo it to stakeholders, everyone applauds, and then it quietly dies in a Jupyter notebook somewhere.
That version is extremely common right now.
The reason isn’t that AI agents don’t work. It’s that most teams treat them like prototypes when they’re actually runtime systems — things that need to scale, crash gracefully, stay within permission boundaries, and leave an audit trail. In other words: things that need Kubernetes.
This isn’t a hot take. It’s basic software engineering applied to a new class of workload. And the industry is finally catching up.
The Real Problem: Agents Are Services, Not Scripts
Here’s a thought experiment. An AI agent that automates incident response might query your monitoring stack, correlate logs, open a Jira ticket, page an on-call engineer, and — if configured to do so — roll back a deployment. That’s not a chatbot. That’s a stateful, multi-step, consequential workflow running against production systems.
Now ask yourself: if that agent gets 10x the requests during a Monday morning incident spike, what happens? If it crashes mid-task, does it retry safely or corrupt state? If it starts accessing systems it shouldn’t, who stops it?
These are not AI questions. They’re distributed systems questions. And Balaji Palanisamy, a Cloud Platform Engineer and SRE who works at the intersection of platform and AI engineering, put it cleanly in a recent post that’s been circulating in the platform engineering community:
“AI agents are not just ‘smart prompts.’ They are runtime systems.”
Once you accept that framing, the next question answers itself: where do we run runtime systems at scale? Kubernetes.
What Kubernetes Already Solves (That Your Agent Needs)
Kubernetes has spent a decade becoming very good at exactly the problems AI agents surface in production. Here’s the direct mapping:
Scalability: Agents don’t get steady, predictable traffic. Some run occasionally; others burst. Kubernetes’ Horizontal Pod Autoscaler (HPA) was built for exactly this pattern — scale when demand spikes, scale down when it doesn’t.
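As a sketch of what that looks like in practice — the Deployment name, namespace, replica bounds, and CPU threshold below are all illustrative, not prescriptive — an HPA for a hypothetical incident-response agent might be:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: incident-agent
  namespace: agents-incident
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: incident-agent        # hypothetical agent Deployment
  minReplicas: 1                # scale to near-zero when quiet
  maxReplicas: 20               # cap the blast radius of a spike
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # add pods when average CPU exceeds 70%
```

In production you’d likely scale on a custom metric (queue depth, in-flight tool calls) rather than CPU, but the mechanism is identical.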
Isolation: If you’re running multiple agent types — one that handles customer support, one that writes code, one that manages deployments — you probably don’t want them sharing network access, secrets, or resource pools. Kubernetes namespaces give each agent type its own sandbox, with RBAC controlling what it can touch.
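A minimal version of that sandbox — namespace names are illustrative — pairs a dedicated namespace with a default-deny NetworkPolicy, so the agent can reach nothing you haven’t explicitly allowed:

```yaml
# One namespace per agent type keeps secrets, quotas,
# and RBAC scoped to that agent alone.
apiVersion: v1
kind: Namespace
metadata:
  name: agents-support
---
# Default-deny: pods in this namespace get no ingress or
# egress until a later policy explicitly opens a path.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: agents-support
spec:
  podSelector: {}               # applies to every pod in the namespace
  policyTypes: ["Ingress", "Egress"]
```

From there, each tool the agent legitimately needs (the ticketing system, the monitoring API) gets its own narrowly scoped allow rule.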
Observability: An AI agent calling the wrong tool at 2am is much harder to debug than a microservice throwing a 500 error — unless you’ve wired up proper tracing. Kubernetes integrates cleanly with Prometheus and OpenTelemetry, letting you track tool call latency, failure rates, retry counts, and policy decisions. Without this, your agents are black boxes.
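Assuming the Prometheus Operator is installed and your agent exposes its own metrics (the label, port name, and metric names here are assumptions about your agent, not something Kubernetes provides), scraping is one small manifest:

```yaml
# Scrape agent pods that expose metrics such as tool-call
# latency and retry counts on a port named "metrics".
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: agent-metrics
  namespace: agents-support
spec:
  selector:
    matchLabels:
      app: support-agent        # hypothetical agent label
  endpoints:
    - port: metrics             # named port on the agent's Service
      interval: 15s
```

The agent itself still has to emit the interesting numbers — tool call latency, failure rate, policy denials — but once it does, they land in the same dashboards as everything else you run.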
Governance: This is the one most teams skip until something goes wrong. OPA (Open Policy Agent) and Kyverno can enforce what agents are and aren’t allowed to do at the infrastructure level — before a bad configuration causes a bad outcome. Audit logs mean you know exactly what any agent did, and when.
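As one concrete example of infrastructure-level enforcement — the policy name and namespace pattern are illustrative — a Kyverno ClusterPolicy can refuse to admit any privileged agent pod before it ever runs:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: agents-no-privileged
spec:
  validationFailureAction: Enforce   # reject, don't just warn
  rules:
    - name: block-privileged-agent-pods
      match:
        any:
          - resources:
              kinds: ["Pod"]
              namespaces: ["agents-*"]   # all agent namespaces
      validate:
        message: "Agent pods may not run privileged containers."
        pattern:
          spec:
            containers:
              # If securityContext.privileged is set at all,
              # it must be false.
              - =(securityContext):
                  =(privileged): false
```

The same pattern extends to blocking hostPath mounts, pinning image registries, or requiring a specific service account — all enforced before a bad configuration becomes a bad outcome.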
Balaji’s mental model is the cleanest summary of this architecture: “The model provides intelligence. The agent framework provides reasoning and orchestration. Kubernetes provides the production runtime discipline.” You need all three. Most teams are building one-and-a-half.
The Platform Is Evolving to Meet AI Agents Halfway
The timing here matters. Kubernetes v1.35 — codenamed “Timbernetes” by the community — landed in early 2026 with a set of changes that read like an AI infrastructure wish list.
Gang scheduling (alpha): AI training and inference workloads often require all their pods to start simultaneously or not at all. Partial placement wastes GPU capacity and produces broken training runs. Gang scheduling enforces “all-or-nothing” pod placement.
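The exact shape of the new alpha API may differ, but the all-or-nothing idea it formalizes is already expressible today with the coscheduling plugin from the Kubernetes scheduler-plugins project — shown here as a sketch, with names illustrative:

```yaml
# A PodGroup declares that 4 worker pods must be
# schedulable together, or none are placed at all.
apiVersion: scheduling.x-k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: trainer
  namespace: agents-training
spec:
  minMember: 4        # all-or-nothing threshold
```

Worker pods then opt in via a pod-group label (check the exact label key against your scheduler-plugins version), and the scheduler holds them all until the full group fits — no half-placed training run burning idle GPUs.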
In-place pod resize (stable): Previously, if you needed more CPU or memory for an inference pod, you’d restart it. Now you can resize a running container without disruption — critical when you’re serving live requests and need to scale up fast.
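A container opts into this behavior with a `resizePolicy` — the image and names below are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference
spec:
  containers:
    - name: model-server
      image: registry.example.com/model-server:latest  # hypothetical image
      resources:
        requests:
          cpu: "2"
          memory: 4Gi
      # CPU can grow or shrink with zero disruption; a memory
      # change restarts only this container, not the pod.
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: RestartContainer
```

With a recent kubectl, the resize itself goes through the pod's `resize` subresource (`kubectl patch pod inference --subresource resize ...`) while the container keeps serving requests.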
Dynamic Resource Allocation (DRA): More predictable GPU and accelerator claiming means the cluster makes smarter placement decisions for compute-hungry workloads.
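A hedged sketch of what a DRA claim looks like — the API group has moved through alpha/beta versions across releases, and the device class name is supplied by your GPU vendor's driver, so treat both as assumptions to verify:

```yaml
# Template for a claim on one GPU-class device.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
  namespace: agents-training
spec:
  spec:
    devices:
      requests:
        - name: gpu
          deviceClassName: gpu.example.com  # installed by the vendor driver
```

A pod then lists the claim under `spec.resourceClaims` and references it from the container's `resources.claims`, letting the scheduler reason about device availability when placing the pod rather than discovering a shortage after the fact.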
On the tooling side, Dapr — a CNCF project that runs as a sidecar alongside your application pods — now offers a Conversation Building Block that abstracts LLM providers behind a unified interface. Want to switch your agent from GPT-4 to Claude without touching business logic? You update a YAML component definition. The Dapr runtime handles retries, API quirks, and observability automatically. Your agent code stays clean.
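That component definition looks roughly like the following — component type names and metadata keys should be checked against the Dapr conversation API docs for your Dapr version, and the secret name is illustrative:

```yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: llm
  namespace: agents-support
spec:
  type: conversation.openai     # swap the provider by changing this type
  version: v1
  metadata:
    - name: key
      secretKeyRef:             # API key pulled from a K8s secret,
        name: llm-credentials   # never baked into the manifest
        key: api-key
```

The agent code keeps calling the same Dapr conversation endpoint under the component name `llm`; switching providers is an edit to this file, not to business logic.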
The broader CNCF ecosystem — now hosting 230+ projects with over 300,000 contributors worldwide — already provides production-grade solutions for secrets management, service mesh, storage, and policy. When you run your AI agents on Kubernetes, you inherit a decade of battle-tested infrastructure that most teams would spend years rebuilding from scratch.
The Warning Sign You Shouldn’t Ignore
Gartner projects that more than 40% of agentic AI projects will be canceled by end of 2027 — not because the AI didn’t work, but because of escalating costs, unclear business value, and inadequate risk controls.
That’s a striking number. And the risk controls part is key.
An agent that runs without governance isn’t just technically risky — it’s politically risky. One incident involving an agent accessing data it shouldn’t, or triggering a workflow it wasn’t supposed to, can set back an entire organization’s AI program. The teams that survive to 2027 with their projects intact won’t be the ones who built the cleverest agent logic. They’ll be the ones who treated agents like proper software: namespaced, RBAC’d, monitored, and auditable.
The practical path isn’t complicated. Containerize your agents. Give each type its own namespace and a tightly scoped service account. Hook your existing observability stack up to agent execution events. Use Kyverno or OPA to enforce what agents can and cannot do. Add Dapr for LLM provider abstraction and built-in retry logic. You don’t need a new platform for any of this — you need to use the one you already have.
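The "tightly scoped service account" step, concretely — names are illustrative, and the one permission granted here is just an example of how narrow the grant should be:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: support-agent
  namespace: agents-support
automountServiceAccountToken: false  # mount the token only where needed
---
# The agent can read config in its own namespace. Nothing else,
# nothing cluster-wide.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: support-agent
  namespace: agents-support
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: support-agent
  namespace: agents-support
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: support-agent
subjects:
  - kind: ServiceAccount
    name: support-agent
    namespace: agents-support
```

Every additional permission the agent acquires is then a reviewable diff to this Role — which is exactly the audit trail the Gartner risk-control failures were missing.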
Takeaway
The teams that will still be running AI agents in production in 2028 are the ones treating those agents exactly like the distributed services they are — right now, with Kubernetes as the operational backbone.
Sources: LinkedIn post by Balaji Palanisamy (March 2026); CNCF Blog — “Kubernetes as AI’s OS: 1.35 release signals” (Feb 23, 2026); CNCF Blog — “Conversing with LLMs using Dapr” (Feb 4, 2026); CNCF Annual Report 2025; Gartner analyst research.