The AI Supply Chain Risk Most Security Teams Are Not Watching
New research shows how intermediary LLM routers can silently rewrite tool calls before agents execute them. The result is a new trust problem in the live path between the model and the action.
Chris Morosco, Aurascape VP & Head of Marketing
April 21st, 2026 | 🕐 7 minute read
Introduction
Most AI security discussions still start in the same place: sensitive data exposure, unsafe prompts, compliance gaps, and unsanctioned use. Those are real issues. But this paper points to a different weakness, and a more foundational one. In many AI systems, the real question is no longer just whether the model can be trusted. It is whether the systems sitting between the agent and the model can silently change what the agent is told to do.
In Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain, the authors examine LLM API routers, the intermediary services that sit between applications or agents and upstream model providers. These routers are increasingly used to simplify routing, provider abstraction, failover, and cost optimization. But they also sit in a uniquely sensitive position. The paper describes them as application-layer proxies with plaintext access to in-flight payloads and argues that providers do not currently enforce cryptographic integrity between the client and the upstream model.
That design creates a trust problem that is easy to miss because nothing looks obviously broken. This is not a classic interception scenario where an attacker has to break encryption or forge certificates. The router is deliberately configured in the path and terminates TLS legitimately. If that intermediary is malicious, compromised, or poisoned, it can rewrite returned tool calls before the client executes them. In an agentic workflow, the session can look normal while the action itself has been changed.
The Hidden Trust Break in the AI Supply Chain
The findings are hard to dismiss. Across 28 paid routers and 400 free routers, the researchers found 1 paid and 8 free routers actively injecting malicious code, 17 touching researcher-owned AWS canary credentials, and 1 draining ETH from a researcher-owned private key. In separate poisoning studies, they observed 99 credentials exposed across 440 Codex sessions, with 401 of those sessions already running in autonomous YOLO mode, meaning tool execution was already auto approved.
What matters most is not the cryptocurrency anecdote. It is the combination of two conditions. First, some intermediaries in the AI supply chain can silently modify what an agent receives. Second, many agents are already being run with little friction before execution. Put those together and the problem shifts from “the model may be wrong” to “the agent may faithfully execute a manipulated instruction.” That is a different class of operational risk.
The paper organizes the problem into four attack classes: payload injection, secret exfiltration, dependency-targeted injection, and conditional delivery. That matters because it shows this is not one isolated exploit. It is a family of failures centered on the same trust boundary. A malicious router can change a returned tool call, capture sensitive material moving through the session, swap in a malicious dependency while keeping the same trusted destination, or stay dormant until the right conditions appear.
The clearest example in the paper is also the easiest to understand. A benign install command, python -m pip install requests flask pyyaml, is rewritten so that requests becomes reqeusts, a typosquatted package. Nothing about the registry changes. The install still appears to come from the same trusted ecosystem. A destination-based control sees the same approved domain. But the instruction itself has been altered, and the altered dependency can create a lasting foothold because it is cached locally and can be re-imported later.
Why Existing Controls Miss It
This is where the weakness stops being theoretical. A great deal of existing security architecture still reasons primarily about destinations: approved domains, approved apps, approved network paths. The paper shows why that is not enough for AI workflows. The danger is no longer just where traffic is going. It is whether the instruction inside the interaction can be trusted. When the domain stays legitimate and only the package name or tool-call payload changes, destination-based policy can miss the attack entirely. That framing is my interpretation of the paper’s findings, but it follows directly from the dependency-substitution example the authors demonstrate.
The paper also shows why light validation is not enough. Some malicious routers did not inject all the time. They used conditional delivery, activating only after warm-up traffic, only in YOLO mode, or only for certain project types such as Rust or Go. One router activated only after 50 prior calls. Another restricted injection to autonomous YOLO-mode sessions targeting specific developer environments. A router can look clean in a basic test and still be dangerous in production.
Another uncomfortable finding is how little response-integrity checking exists in current tooling. The researchers tested four public agent frameworks and found that none implemented response-integrity verification. In their evaluation, payload rewrite compatibility reached 100%, dependency-targeted rewrite compatibility reached 99.6%, and extractor coverage reached 100%. They also report median proxy overhead of 0.013 milliseconds per request against median upstream latency of 820 milliseconds, which means the manipulation is highly compatible with current frameworks and does not naturally reveal itself through obvious latency.
The researchers do evaluate near-term defenses. They tested a fail-closed policy gate, response-side anomaly screening, and append-only transparency logging. Their own framing is measured: these controls can reduce exposure, but they are not a substitute for origin authentication. The fail-closed policy gate was the strongest preventive control they tested, but they also found it can be bypassed when an attacker hides behind an allowlisted domain or uses a pre-positioned local stager. That is an important caution because it shows the problem is real, practical, and addressable, but not with simplistic allowlist-only thinking.
The broader lesson is bigger than malicious routers alone. AI security can no longer rely on the old assumption that a trusted destination means a trusted outcome. In agentic systems, the real risk sits inside the exchange itself: the intent of the request, the context around the task, the tool being invoked, the data involved, and the behavior the agent is being asked to take. A trusted model provider, approved application, or sanctioned domain does not guarantee a safe result if the instruction moving between them can be silently changed before execution.
Where Control Really Sits: The Interaction
That is why the control point must move closer to the interaction. It is no longer enough to verify who is connecting to what. The more important question is what the agent is being told to do, what tool is being called, what data is being touched, and whether the resulting behavior should be allowed at all. In an AI-driven environment, that is where trust is decided.
This is exactly where Aurascape’s architecture matters. Aurascape’s platform provides real-time visibility, classification, and control over AI interactions, with inline visibility and control across AI apps, MCP tools, and custom agents. It inspects across prompts, responses, and tool use, along with controls to stop risky or malicious AI activity in real time. That puts the control layer in the same live interaction path the paper identifies as vulnerable.
In practical terms, that means controlling the exchange itself, not just the destination around it. In the package-substitution example, the value is not in recognizing PyPI as trusted. The value is in recognizing that the returned instruction itself has changed in a risky way and stopping that action before it is executed. The paper does not endorse any vendor, but it makes this architectural conclusion difficult to avoid.
Conclusion
The conclusion is straightforward. Organizations should stop treating this as a future standards problem and start treating it as a current control problem. The right next steps are practical: map where AI apps, coding assistants, routers, and MCP-connected services sit in the execution path; inspect and govern returned instructions before they are executed; tighten auto-approve and autonomous execution settings; and extend policy and monitoring across agent tool use, not just app access. The organizations that adapt fastest will be the ones that stop thinking only about destinations and start governing the intent, context, and behavior inside the interaction itself.
See how Aurascape helps organizations inspect and govern AI interactions: Book a Demo.
Aurascape Solutions
- Discover and monitor AI Get a clear picture of all AI activity.
- Safeguard AI use Secure data and compliancy in AI usage.
- Secure Agentic AI Secure how your teams use AI and build AI agents.
- Copilot readiness Prepare for and monitor AI Copilot use.
- Coding assistant guardrails Accelerate development, safely.
- Frictionless AI security Keep users and admins moving.