AI Agents Need the Inverse of Human Trust

I’ve been thinking about agent identity and authorization for a while now, and the picture is gradually getting clearer. AI agents create intent instead of forwarding it, and that means delegation becomes abdication when the agent makes the decisions that matter.
Part of the solution is extending existing auth patterns and building new ones for cross-domain trust. But I feel the root of the tension is something more fundamental. Agents need the inverse of how we manage humans.
Organizations Are Modeled Around Humans
Organizations and their technology are designed to minimize constraints on people. We don’t list everything an employee shouldn’t do. We give them a role and sensible boundaries, and rely on them to use judgment within those.
Partly this is practical: micromanagement doesn’t scale. You can’t write rules for every situation. We hire professionals and trust their judgment.
But it’s also psychological. The entire evolution of HR has been toward more autonomy, not less. We learned that trusted employees perform better. Excessive control backfires: it kills motivation, creates adversarial dynamics, drives away talent.
And it works because humans care. We care about doing good work, about our reputation, about consequences. We have common sense. That creates an internal constraint: you self-regulate because it matters to you personally.
So we default to trust. We anticipate risk where it matters and where anticipation is feasible, and adjust as we go. It’s a balance.
Agents Are Different
An agent sounds human because we want it to. But it’s a statistical machine, no smarter than the data it’s trained on, and not even consistent in capturing that. It slips unpredictably, and with current architectures, it always will.
We can’t trust an agent the same way we trust a human. It’s not a matter of incentives or certifications. An agent fails unpredictably, and it doesn’t know when it’s wrong.
Cybersecurity has a useful lens here: risk = impact × likelihood. An agent scales up both. More actions mean more chances to slip, and when something does go wrong, it can go fast and wide.
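To make the likelihood side concrete, here’s a back-of-the-envelope sketch. The per-action slip probability is made up; the point is how quickly volume compounds it.

```python
# Illustrative numbers only: how action volume drives the likelihood side of
# risk = impact × likelihood, assuming independent actions.

def p_at_least_one_slip(p_per_action: float, n_actions: int) -> float:
    """Probability that at least one of n independent actions goes wrong."""
    return 1 - (1 - p_per_action) ** n_actions

# A person taking 20 consequential actions a day vs. an agent taking 2,000.
for n in (20, 2_000):
    print(n, round(p_at_least_one_slip(0.001, n), 3))
# 20   -> 0.02   (a slip every month or two)
# 2000 -> 0.865  (a slip most days)
```

Impact scales the same way: an agent wired into more systems means a single slip reaches further, faster.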
None of this means we shouldn’t deploy agents. After all, we put everything on the internet despite the massively increased attack surface, because the value is worth it. But it means we need a different approach.
We could try internal reward systems, but when the model slips, its judgment slips too. There’s no internal brake that stays intact when things go wrong.
That leaves external controls, and that’s how most implementations approach it today. But given the open-endedness of agents, there’s an infinite number of things an agent shouldn’t do, and any blocklist we write is incomplete by definition. How do you hand over the keys and sleep at night? Clawdbot showed us what happens when you don’t think this through.
The Inversion
With humans we can trust common sense. With agents we can’t, because an agent has none in any true sense. When we can’t trust common sense, we need to verify ruthlessly. And set controls accordingly.
Humans are restricted by listing what they can’t do. AI agents must be restricted to what they can do, for each task.
These permissions need to be granular, situational, auditable, and structural, not advisory.
And this isn’t just a technical problem. Organizations need to rethink how they encode authority, knowing they need to anticipate random failure. What an agent can access, under which conditions, up to what threshold: all of it needs to be explicit, because it won’t fill in the blanks the way a person would.
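What does explicit look like in practice? Here’s a minimal sketch of a task-scoped grant: deny-by-default, with a condition and a threshold, and an audit trail. The field names and policy are hypothetical; only the shape matters.

```python
# Hypothetical shape of a task-scoped, deny-by-default grant.
# Anything not explicitly listed is refused, and every decision is recorded.
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class Grant:
    task_id: str
    allowed_actions: set[str]            # e.g. {"crm.read", "email.draft"}
    max_amount_eur: float = 0.0          # threshold: anything above is denied
    expires_at: datetime = field(        # condition: grants are short-lived
        default_factory=lambda: datetime.now(timezone.utc) + timedelta(hours=1)
    )
    audit_log: list[dict] = field(default_factory=list)

    def check(self, action: str, amount_eur: float = 0.0) -> bool:
        allowed = (
            action in self.allowed_actions
            and amount_eur <= self.max_amount_eur
            and datetime.now(timezone.utc) < self.expires_at
        )
        # Structural, not advisory: the caller enforces the result, and it is auditable.
        self.audit_log.append({"action": action, "amount_eur": amount_eur, "allowed": allowed})
        return allowed

grant = Grant("invoice-followup-42", {"crm.read", "email.draft"})
assert grant.check("crm.read")                                 # explicitly granted
assert not grant.check("payments.transfer", amount_eur=500.0)  # never granted, so denied
```

The data structure isn’t the point. The point is that the default answer is no, and every yes is scoped to a single task.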
It’s never finished: we keep improving as we discover where risk management falls short.
Better Models Won’t Solve Governance
Part of the confusion is that we keep conflating two problems.
Context engineering increases reliability: whether the model does what you intended. Better models, better context, verification loops. This is getting solved. With each model upgrade, teams delete scaffolding. The model outgrows the workarounds.
Governance manages risk: whether the agent is allowed to do what it’s about to do. This doesn’t get solved by better models. An agent will keep messing up. Frequency might decrease, but there’s no guarantee the impact does.
In fact, as reliability improves, the risk might grow. When an agent gets things right 99% of the time, we stop watching. Don Norman describes this for automation:
“Over fifty years of studies show that even highly trained people are unable to monitor situations for long periods and then rapidly take effective control when needed.”
The same applies here. As agents become more reliable, complacency sets in and guardrails get relaxed.
Human-in-the-loop is not a reliable safety net.
So for reliability, the evolution is less harness over time. For governance, it’s the opposite: we can’t lift the guardrails. We’ve never handed the keys to something inherently unstable before, and current solutions aren’t built for that.
The building blocks exist: identity, authentication, fine-grained authorization. But they need to be rethought for actors that take unanticipated paths, cross trust domains, and hand off to other agents. We don’t know where they’ll end up, so everything needs to travel with the context, verifiable at every step.
And it goes both ways. Incoming requests can’t be taken for granted either: shadow AI, unauthorized agents, spoofed identity. We need to know if we’re dealing with a human or an agent, and what it was allowed to do.
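As a sketch of what “travels with the context, verifiable at every step” could mean, here’s a toy delegation token built with nothing but the Python standard library. The hand-rolled HMAC and the field names are mine, for illustration; a real deployment would build on established token and credential standards with proper key management.

```python
# Sketch of a signed, scoped delegation token that travels with the request.
# Hand-rolled HMAC for illustration only.
import hashlib, hmac, json, time

SECRET = b"shared-demo-key"  # placeholder; per-issuer keys in practice

def mint_token(principal: str, agent: str, scope: list[str], ttl_s: int = 300) -> dict:
    claims = {
        "sub": agent,                     # the agent doing the acting
        "act_for": principal,             # the human or service it acts on behalf of
        "scope": scope,                   # what this specific task may do
        "exp": int(time.time()) + ttl_s,  # short-lived by design
    }
    payload = json.dumps(claims, sort_keys=True).encode()
    return {"claims": claims, "sig": hmac.new(SECRET, payload, hashlib.sha256).hexdigest()}

def verify(token: dict, required_scope: str) -> bool:
    payload = json.dumps(token["claims"], sort_keys=True).encode()
    good_sig = hmac.compare_digest(
        token["sig"], hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    )
    fresh = token["claims"]["exp"] > time.time()
    in_scope = required_scope in token["claims"]["scope"]
    return good_sig and fresh and in_scope  # every hop re-checks, nothing is assumed

token = mint_token("alice@example.com", "agent://billing-assistant", ["invoices.read"])
print(verify(token, "invoices.read"))  # True: signed, fresh, and in scope
print(verify(token, "payments.send"))  # False: not granted, so the receiver refuses
```

The receiving side can then answer the questions above: is this an agent, on whose behalf is it acting, and what was it actually allowed to do, without taking the caller’s word for it.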
The Path Forward
The implementations that make it from POC to production won’t just be the ones with better models. They’ll also be the ones designed for a different kind of actor: one that requires the inverse of human trust. Not smarter agents, but safer ones. You can’t go wrong preparing your organization’s infrastructure for it.
Trust infrastructure won’t be a layer within a solution. It needs to become the fabric.
I publish explainers on the protocols shaping agent trust, and write about the gaps between where we are and where we need to be. Follow along if this matters to you.