
Agentic AI is here, and it’s moving fast. Most trust infrastructure isn’t ready to govern agents that act autonomously across systems and organizations. The closest thing we have is the EU AI Act, the world’s first comprehensive AI regulation, now going into effect. But it was written before LLMs took the world by storm.
What follows is a best-effort exploration from a technical perspective. Involve legal experts before acting on any of this.
The EU AI Act, a Short Refresher
The EU AI Act entered into force in August 2024, with provisions rolling out in phases through 2027. It takes a risk-based approach: the higher the risk of your AI system, the stricter the obligations. Like GDPR, the reach is extraterritorial: if your AI system’s output is used in the European Union, you’re in scope.
The Act doesn't regulate AI. It regulates high-risk use cases.
— Carme Artigas
The Act sorts AI systems into four risk tiers:
- Unacceptable risk (prohibited): e.g. social scoring, real-time biometric surveillance, subliminal manipulation
- High risk: e.g. employment, education, credit scoring, law enforcement, critical infrastructure
- Limited risk (transparency obligations): e.g. chatbots, deepfakes, emotion recognition
- Minimal risk: everything else
Annex III lists what counts as high-risk: employment, education, credit scoring, law enforcement, critical infrastructure, access to essential services. And “putting into service” includes internal use: deploying AI for your own purposes doesn’t make you exempt.
Going back to our question: does the Act cover agents? It turns out it doesn’t mention them. General-purpose models (LLMs) got a last-minute chapter. Agents, which use those models to autonomously plan and act, are a layer the regulation didn’t anticipate.
GPAI, Systemic Risk, and RAG
The Act distinguishes three roles: provider, distributor, and deployer. Where you fall matters.
As mentioned, LLMs forced the EU to extend its draft. The Act now has a dedicated chapter for general-purpose AI models (defined in Article 3(63)), split into two tiers depending on the compute used during training:
- GPAI: must publish technical documentation, describe training data, and comply with copyright rules.
- GPAI with systemic risk (Article 51): models trained with 10²⁵+ FLOPs, powerful enough to pose risks across entire sectors or society. Think GPT-4, Claude, Gemini. On top of documentation: adversarial testing, incident reporting, cybersecurity protections.
Being classified as a GPAI provider has significant compliance impact. For agent builders, these obligations sit with model providers, not with you. Using Claude doesn’t make you responsible for Claude’s systemic risks. Since the threshold is based on compute, only substantial fine-tuning could make you a GPAI provider: at least one-third of the original training compute. Most enterprise fine-tuning doesn’t come close. And with LLMs, most customization happens through context engineering anyway.
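To make those thresholds concrete, here is a rough back-of-the-envelope check. The 10²⁵ FLOP line and the one-third rule come from the text above; the function, names, and example figures are illustrative, not a legal test:

```python
# Rough orientation only: do the numbers cross the thresholds described above?
# The 10^25 FLOP line and the one-third rule come from the Act and the July 2025
# guidelines as summarized in this post; everything else is illustrative.

SYSTEMIC_RISK_FLOPS = 1e25   # Article 51 presumption threshold
PROVIDER_FRACTION = 1 / 3    # fine-tuning share that makes you a provider of the modified model


def rough_gpai_check(original_flops: float, fine_tune_flops: float) -> dict:
    """Not a legal test; a back-of-the-envelope sanity check."""
    return {
        "base_model_presumed_systemic_risk": original_flops >= SYSTEMIC_RISK_FLOPS,
        "fine_tuner_becomes_provider": fine_tune_flops >= PROVIDER_FRACTION * original_flops,
    }


# A typical enterprise fine-tune of a frontier model: nowhere near the one-third line.
print(rough_gpai_check(original_flops=2e25, fine_tune_flops=1e21))
# {'base_model_presumed_systemic_risk': True, 'fine_tuner_becomes_provider': False}
```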
RAG and prompt engineering don’t count. The July 2025 guidelines clarify that only significant modifications to the model weights trigger provider obligations. RAG, prompt engineering, orchestration, tool-calling frameworks don’t reach that bar. If you’re building an agentic platform with context engineering, you’re a distributor, not a provider.
One wrinkle: open-weight models get a partial pass on transparency obligations, but not if they cross the systemic risk threshold. Frontier open models like DeepSeek R1 and Qwen likely cross that line. If the original provider hasn’t complied with EU obligations, that risk may flow down to you as the first entity in the EU value chain. Something to consider before building on open models with no EU presence.
So at the model layer, the obligations are manageable if you use a compliant provider and don’t intensively fine-tune it. But what about the agent layer on top?
Agents Don’t Fit in a Box
The EU AI Act was written for traditional AI deployments: fixed pipelines with a known use case at build time. An HR screening tool? High-risk from day one. Classify it, file the conformity assessment, move on.
Some agents work the same way: a single clear goal, a known risk tier. But not all of them. What if the goal isn’t that clear? Give a general-purpose office assistant “handle my inbox” and it decides on its own to draft an email (minimal risk), screen a job application (high-risk), then assess a customer complaint (potentially high-risk). Suddenly you’ve ended up in a high-risk use case. The risk tier depends on how open-ended the prompt is.
Contrast two deployments of the same capability. In a fixed pipeline, each step has a clear scope: the AI transcribes, summarizes, and the result gets emailed with a disclaimer that it’s AI-generated. An agent doesn’t just generate: it acts on its own output. The summary becomes input for decisions the agent makes autonomously. That’s where risk compounds.
You can classify a tool at build time. You can’t classify an agent whose use case emerges at runtime. Harnesses can constrain this, but how do you constrain something you can’t anticipate?
This is what the Future Society report calls the multi-purpose problem: generic agents default to high-risk classification unless you explicitly exclude high-risk uses. The Act is permissive by design. But agents need closer attention precisely because they’re general-purpose.
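One way to handle this at runtime is a pre-action guardrail: classify each step the agent proposes against the Annex III areas listed earlier, and escalate anything that lands in the high-risk tier to a human. A minimal sketch, with the area list taken from the text and everything else hypothetical:

```python
from enum import Enum

# Annex III areas mentioned earlier in this post; illustrative, not exhaustive.
HIGH_RISK_AREAS = {
    "employment", "education", "credit_scoring",
    "law_enforcement", "critical_infrastructure", "essential_services",
}


class RiskTier(Enum):
    MINIMAL = "minimal"
    LIMITED = "limited"
    HIGH = "high"


def classify_step(affected_area: str, affects_natural_person: bool) -> RiskTier:
    """Toy classifier: in practice this is a policy engine plus human review."""
    if affected_area in HIGH_RISK_AREAS and affects_natural_person:
        return RiskTier.HIGH
    if affects_natural_person:
        return RiskTier.LIMITED
    return RiskTier.MINIMAL


def guard(affected_area: str, affects_natural_person: bool) -> str:
    """Decide whether the agent may proceed with the step on its own."""
    if classify_step(affected_area, affects_natural_person) is RiskTier.HIGH:
        return "escalate_to_human"
    return "proceed"


# "Handle my inbox" drifting into screening a job application:
print(guard("employment", affects_natural_person=True))   # escalate_to_human
print(guard("scheduling", affects_natural_person=False))  # proceed
```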
The Shadow Agent Problem
Low-code platforms add another layer. When employees build agents on tools like Power Platform or Copilot Studio, the company is still the provider. An employee builds an HR screening agent without a compliance assessment, and the company is non-compliant without knowing the system exists. This is why Article 4 (AI literacy) matters: staff need to understand what makes something high-risk. And compliance teams can’t assume they have visibility into what’s being built. Just like shadow IT before it, shadow agents will be one of the harder governance challenges to solve.
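A first counter-measure is an agent inventory: nothing gets deployed, even from a low-code platform, until it is registered with an accountable owner and a risk classification. A hypothetical sketch of that gate (names and fields are illustrative):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AgentRecord:
    name: str
    owner: str                    # an accountable human, not a team alias
    declared_purposes: list[str]  # what the agent is meant to be used for
    risk_tier: str                # "minimal" | "limited" | "high"
    assessed: bool = False        # conformity assessment done, if high-risk
    registered_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class AgentRegistry:
    """Deployment gate: unregistered agents are shadow agents and get blocked."""

    def __init__(self) -> None:
        self._agents: dict[str, AgentRecord] = {}

    def register(self, record: AgentRecord) -> None:
        if record.risk_tier == "high" and not record.assessed:
            raise ValueError(f"{record.name}: high-risk agent registered without assessment")
        self._agents[record.name] = record

    def is_deployable(self, name: str) -> bool:
        return name in self._agents


registry = AgentRegistry()
registry.register(AgentRecord("meeting-notes", "alice", ["summarization"], "minimal"))
print(registry.is_deployable("meeting-notes"))  # True
print(registry.is_deployable("hr-screener"))    # False: built but never registered
```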
From Compliance to Infrastructure
The Act tells you what to achieve, not how. For high-risk systems: quality management (Art. 17), risk management throughout the lifecycle (Art. 9), record-keeping and traceability (Art. 12), human oversight by design (Art. 14).
None of this works without infrastructure. You need:
- Governance: who can build and deploy agents, what uses are permitted
- Audit: what did the agent do, why, with traceable decision logs
- Authorisation: what is this agent allowed to do right now, in this context
We can extend existing mechanisms like On-Behalf-Of delegation, but agents also need emerging patterns for problems like the confused deputy: an agent acting on behalf of a user, with authority of its own but no clear limits on what it’s allowed to do for that user.
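A sketch of what that delegation pattern could look like: the agent holds a grant scoped to a subset of the user’s permissions, and every action is checked against both. The names and scopes below are made up; the point is the double check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delegation:
    """An On-Behalf-Of style grant: user -> agent, limited to explicit scopes."""
    user: str
    agent: str
    scopes: frozenset[str]  # strictly narrower than the user's own permissions
    revoked: bool = False


# What each user is allowed to do themselves (hypothetical permission store).
USER_PERMISSIONS = {
    "alice": {"email:send", "calendar:write", "hr:screen_candidates"},
}


def is_allowed(delegation: Delegation, action: str) -> bool:
    """The agent needs BOTH the user's permission AND an explicit delegated scope."""
    if delegation.revoked:
        return False
    user_can = action in USER_PERMISSIONS.get(delegation.user, set())
    agent_may = action in delegation.scopes
    return user_can and agent_may


inbox_agent = Delegation("alice", "inbox-assistant", frozenset({"email:send"}))
print(is_allowed(inbox_agent, "email:send"))            # True
print(is_allowed(inbox_agent, "hr:screen_candidates"))  # False: alice could, the agent may not
```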
This Is an Architecture Problem
The Act’s requirements aren’t abstract. Article 9 demands risk management throughout the lifecycle. Article 12 demands traceability: what the system did, when, and why. Article 14 demands human oversight by design, not as an afterthought. These map directly to infrastructure you either have or don’t.
Risk management means knowing which use cases your agent can reach at runtime and having the governance thresholds to constrain them. Traceability means audit trails that capture the agent’s decision chain, not just its output. Human oversight means delegation models where authority flows downward and can be revoked, not open-ended prompts where the agent decides its own scope.
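Traceability in that sense is more than request/response logging. A hedged sketch of what a per-step trace record might capture; field names are illustrative, and the flat file stands in for whatever append-only sink you actually use:

```python
import json
from datetime import datetime, timezone


def trace_step(agent_id: str, goal: str, step: str, tools_used: list[str],
               inputs_ref: str, output_summary: str, risk_tier: str, overseer: str) -> dict:
    """Append one decision-chain record: not just the output, but what the agent
    was trying to do, what it touched, and which human can intervene."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "goal": goal,                    # the instruction this step serves
        "step": step,                    # what the agent decided to do
        "tools_used": tools_used,        # systems it touched
        "inputs_ref": inputs_ref,        # pointer to the evidence it acted on
        "output_summary": output_summary,
        "risk_tier": risk_tier,          # classification at the time of the step
        "overseer": overseer,            # the human who can halt or revoke
    }
    # Append-only sink (queue, SIEM, ledger); a flat file stands in here.
    with open("agent_trace.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
    return record


trace_step("inbox-assistant", "handle my inbox", "draft reply to customer complaint",
           ["email.read", "email.draft"], "msg-4821", "drafted reply, not sent",
           "limited", "alice")
```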
This isn’t a compliance exercise. It’s an architecture problem. The organisations that treat EU AI Act readiness as a paperwork challenge will find themselves retrofitting infrastructure under pressure. The ones that build the right infrastructure now will find compliance falls out naturally, because the Act’s requirements describe what well-governed agent deployments look like anyway.
Shadow agents make this urgent. Employees are already building agents on low-code platforms without governance oversight. Article 4 requires AI literacy precisely because the company is liable regardless of who built the system. You can’t govern what you can’t see.
This remains a best-effort technical analysis. Involve legal experts for your specific situation. But the technical conclusion is clear: the gap between what agents can do and what the Act requires is an infrastructure gap. Auth, identity, scoping, audit trails, guardrails. Build them now, or build them under regulatory pressure later.
At trustedagentic.ai, I’m building the PAC Framework to make this actionable: where agents create value (Potential), how to trace what they did (Accountability), and how to enforce what governance promises (Control). The Act validates the approach. The work is getting it built.
For further reading, Ahead of the Curve: Governing AI Agents Under the EU AI Act by The Future Society dives deeper into agents under the Act.
Resources