
AI Agent Reliability Is Getting Easier. The Hard Part Is Shifting.

A young tree that has outgrown its support stake

An agent harness is everything you build around a model to make it useful: the loop, the tools, the guardrails, the glue code. Think LangChain, CrewAI, AutoGen. It’s where most of the engineering effort goes in building agentic AI today. But how much of your prompt chains, output parsers, retry logic, and routing between sub-agents will still matter in a year?
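
For concreteness, here is a minimal sketch of what a harness boils down to. The model call and the tools are hypothetical stand-ins, not any particular framework’s API: the point is just to show where the loop, parsing, retry logic, and routing live.

```python
# Minimal sketch of an agent harness: the loop, tool dispatch, output parsing,
# and retry logic that sit between the model and the outside world.
# `call_model` and the tools are hypothetical stand-ins, not a real SDK.
import json

def call_model(messages: list[dict]) -> str:
    """Hypothetical model call; in practice this wraps your provider's SDK."""
    raise NotImplementedError

TOOLS = {
    "read_file": lambda path: open(path).read(),
    "run_tests": lambda _: "all tests passed",  # placeholder glue code
}

def run_agent(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):                      # the loop
        reply = call_model(messages)
        try:
            action = json.loads(reply)              # output parsing
        except json.JSONDecodeError:
            messages.append({"role": "user", "content": "Reply with valid JSON."})
            continue                                # retry logic
        if action.get("tool") in TOOLS:             # routing / tool dispatch
            result = TOOLS[action["tool"]](action.get("input"))
            messages.append({"role": "user", "content": f"Tool result: {result}"})
        else:
            return action.get("answer", reply)      # final answer
    return "step limit reached"
```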

Claude Code is one of the most applauded AI agents out there. It turns out it wasn’t architected from a grand vision, or planned from above at Anthropic: engineer Boris Cherny started it as a solo side project, figuring out empirically what the model could do.

In late 2024, Claude could barely generate bash commands. With each model upgrade, the team didn’t need to add more code: they could remove it. By late 2025, Claude Code was writing all of its own code, and Cherny hadn’t written a line manually in months.

Every line of scaffolding is a bet that you know better than the model. And models keep improving.

The Scaffolding Trap

When the model improves, scaffolding doesn’t just become dead weight. It actively fights the model’s new capabilities. The workaround you wrote for a limitation now prevents the model from using the better approach it learned. So you delete it.

This is the scaffolding trap. You build a custom harness to work around a model’s limitations. The model improves. Now your harness is the bottleneck. You’re maintaining code that exists to solve a problem that no longer exists.

Don't invest in scaffolding the model will outgrow. Invest in context and permissions.

Claude Code’s architecture reflects this lesson. A single loop, a handful of basic tools, no multi-agent orchestration. Anthropic’s engineering blog puts it simply: “do the simplest thing that works.”

Invest in Context, Not Control

To prevent mistakes, the Claude Code team writes context, not code. Each team at Anthropic maintains a CLAUDE.md file checked into git. As Cherny explains: “Anytime we see Claude do something incorrectly we add it to the CLAUDE.md, so Claude knows not to do it next time.”

When you’d normally write a linter rule or a validation check, they write a sentence. Context is cheap to update and doesn’t create maintenance burden. It degrades gracefully: if a model outgrows an instruction, the instruction just stops mattering.
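
As a hypothetical illustration (the commands and paths below are invented, not taken from Anthropic’s actual files), a few CLAUDE.md entries might stand in for what would otherwise be lint rules or validation code:

```markdown
# CLAUDE.md (hypothetical example, checked into the repo root)

- Run tests with `pnpm test --filter <package>`; never run the full suite.
- Do not edit files under `generated/`; regenerate them with `pnpm codegen`.
- Database migrations must be reversible; always include a `down()` step.
```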

Notice what Claude Code doesn’t use: no vector databases, no embeddings. Just raw files and regex search. As models get better at reasoning over raw sources, the value of heavy preprocessing decreases.
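
As a rough illustration of that idea, and not Claude Code’s actual implementation, retrieval over raw files can be a few lines of regex search:

```python
# Illustrative only: retrieval as plain regex search over raw files,
# in the spirit of "no vector database, just files and search".
import re
from pathlib import Path

def search_repo(pattern: str, root: str = ".", max_hits: int = 20) -> list[str]:
    """Return 'path:line: text' hits for a regex across a source tree."""
    regex = re.compile(pattern)
    hits = []
    for path in Path(root).rglob("*.py"):  # restrict to source files
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if regex.search(line):
                hits.append(f"{path}:{lineno}: {line.strip()}")
                if len(hits) >= max_hits:
                    return hits
    return hits

# The model asks for what it needs ("find where retries are configured"),
# gets raw lines back, and reasons over them directly.
```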

What matters more is getting the right information into context at the right time: structured knowledge, clean source data, and the retrieval logic to connect them. Knowledge graphs and deterministic retrieval are more durable investments than similarity search: they compound in value as models improve, whereas custom orchestration logic becomes a liability.

Where the Hard Problems Actually Are

As harnesses get thinner and models get stronger, the engineering bottleneck shifts. Claude Code’s most complex component isn’t any AI logic: it’s the permissions system.

With the right context, making agents smart is becoming a commodity. The harder challenge is making them trusted and secure: identity, authentication, fine-grained authorization, cross-domain trust, and secure-by-default deployment.
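
A sketch of what that gating can look like at the tool-call boundary follows; the policy shape and tool names are illustrative assumptions, not Claude Code’s actual permissions system.

```python
# Sketch of a permission gate in front of tool calls: the agent can propose
# any action, but nothing runs without an explicit allow decision.
# Policy shape and tool names are illustrative, not any real product's system.
from dataclasses import dataclass

@dataclass
class ToolCall:
    tool: str    # e.g. "read_file", "run_shell"
    target: str  # file path, command, URL, ...

ALLOW_BY_DEFAULT = {"read_file"}          # safe, reversible actions
REQUIRE_APPROVAL = {"run_shell", "write_file", "http_request"}

def authorize(call: ToolCall, approved_by_user: bool) -> bool:
    """Default-deny: only known-safe tools run without a human in the loop."""
    if call.tool in ALLOW_BY_DEFAULT:
        return True
    if call.tool in REQUIRE_APPROVAL:
        return approved_by_user
    return False  # unknown tool: deny

# Usage: gate every tool invocation the agent proposes.
call = ToolCall(tool="run_shell", target="rm -rf build/")
if authorize(call, approved_by_user=False):
    pass  # execute the tool
else:
    pass  # surface the request for review instead
```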

The Clawdbot/OpenClaw chaos is a preview of what happens when capable agents get unleashed without that infrastructure.

If you have a big budget and pressing production needs, a custom harness makes sense today. If you don’t, invest in context and anticipate the trust and security challenges ahead. Either way, that’s where the bottleneck is heading.