The "S" in Tool Calling stands for Security

MCP Security Hysteria: Why the Real Problem Isn't New

Jul 09, 2025

The security community has been buzzing about vulnerabilities in the Model Context Protocol (MCP) since late 2024, with researchers uncovering tool poisoning attacks, content injection vulnerabilities, and critical remote code execution flaws. Headlines scream about the "S in MCP standing for security" while new scanning tools emerge to combat these “threats”. But here's the uncomfortable truth: MCP isn't the problem, it's just the latest stage for an old show.

The fundamental security issues plaguing MCP are repackaged versions of tool calling vulnerabilities that have existed since OpenAI introduced function calling in June 2023. MCP's rapid adoption and developer-friendly architecture have simply accelerated the attack surface, making these familiar problems more visible and impactful. The real security crisis isn't MCP-specific, it's the inherent impossibility of secure tool calling in current LLM architectures.

The pre-MCP security landscape was already broken

Long before MCP became a household name in developer circles, LLM tool calling systems were riddled with the same fundamental vulnerabilities now making headlines. The timeline tells the story clearly:

March 2023: ChatGPT plugins launched with immediate security concerns, including OAuth redirection manipulation and plugin installation bypasses discovered by Salt Labs.

April 2023: Johann Rehberger reported data exfiltration vulnerabilities through markdown image injection, demonstrating complete conversation history theft through simple image tags.

June 2023: OpenAI's function calling API launched with documented security gaps, including insufficient parameter validation and lack of execution sandboxing.

The core problem remained unchanged: LLMs cannot distinguish between trusted developer instructions and untrusted user input. When Google Bard Extensions launched in September 2023, researchers immediately demonstrated exfiltration of private Google Drive documents and Gmail contents using the same prompt injection techniques.

The academic community recognised this crisis early. By July 2023, OWASP ranked prompt injection as the #1 security risk in their inaugural "Top 10 for Large Language Model Applications." The vulnerability wasn't just theoretical, it was actively being exploited across every major LLM provider's tool calling implementations.

MCP amplifies existing vulnerabilities rather than creating new ones

The security issues dominating MCP headlines are textbook examples of familiar vulnerabilities repackaged for an AI-mediated world. Take tool poisoning attacks, where malicious instructions embedded in tool descriptions manipulate AI behaviour.

The Supabase MCP blog vulnerability exemplifies this pattern. Researchers discovered that malicious content in blog posts could manipulate the AI assistant through indirect prompt injection, leading to data exfiltration and unauthorised actions. This attack vector was already well-documented in Kai Greshake's February 2023 research on "Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection."

The command injection vulnerabilities found in over 43% of MCP servers headlines represent another telling example. These aren't MCP-specific flaws, they're the same os.system() vulnerabilities that have existed in software for decades. MCP simply provides a new pathway for attackers to trigger these familiar bugs through AI-mediated parameter construction.

Even MCP's most sophisticated attacks rely on established techniques. The "rug pull" attacks where trusted tools dynamically change behavior? That's a supply chain attack dressed up in new clothes. Cross-server tool shadowing? A confused deputy attack where the AI agent becomes the unwitting accomplice.

The fundamental architectural problem remains unsolved

The security crisis isn't about MCP's implementation, it's about the impossible task of creating secure tool calling in systems where all input is processed equally. Current LLM architectures cannot maintain separate security contexts within the same input stream. System prompts, user input, tool descriptions, and tool results get combined into a single token stream.

This architectural limitation makes traditional security approaches ineffective. Unlike SQL injection where prepared statements provide a clear solution, prompt injection exploits the very mechanism that makes LLMs useful: their ability to process and respond to natural language instructions. As Simon Willison noted in his analysis of the "lethal trifecta," any system combining access to private data, exposure to untrusted content, and external communication ability is fundamentally vulnerable.

Industry responses have largely involved throwing more AI at the problem, with mixed results. OpenAI's instruction hierarchy approach showed only 63% improvement in defence against system prompt extraction, and researchers continue to demonstrate bypasses. Google's CaMeL framework represents the most promising architectural innovation, using separate LLMs for trusted and untrusted content, but even this approach reduces functionality while imposing significant computational overhead.

The reality is that these mitigation attempts are trying to solve an unsolvable problem within current LLM architectures. The models themselves cannot reliably distinguish between legitimate instructions and malicious ones when both are presented in natural language.

Why developers find this particularly terrifying

The MCP security crisis represents a fundamental paradigm shift that challenges everything developers know about building secure systems. Traditional software security operates on predictable, deterministic principles: the same input produces the same output, vulnerabilities can be reproduced consistently, and static analysis tools can identify most issues.

MCP introduces a non-deterministic security model where the same potential exploit might succeed or fail based on conversation history, model state, and even slight variations in how the AI interprets context. Research shows that even with temperature=0, LLMs exhibit accuracy variations up to 15% across naturally occurring runs, making vulnerability reproduction unreliable.

This resembles social engineering more than technical exploitation. As IBM's analysis notes, prompt injections "use plain language to trick LLMs into doing things that they otherwise wouldn't." For developers accustomed to clearly defined code-based vulnerabilities, this feels like an entirely foreign threat model.

The psychological impact is amplified by the scale of potential consequences. A single malicious MCP server can compromise all connected clients through cross-server tool shadowing. Traditional security boundaries become meaningless when an AI agent can be convinced to execute arbitrary actions through persuasive language rather than technical exploitation.

MCP's real crime: scaling the attack surface

MCP's primary security contribution isn't introducing new vulnerabilities, it's democratising and scaling existing ones. The protocol's standardised approach to tool calling has created an ecosystem where a single compromise can have cascading effects across multiple systems.

The numbers tell the story: Trend Micro's discovery of SQL injection in Anthropic's reference SQLite MCP server impacted over 5,000 downstream implementations before the repository was archived. This represents a supply chain attack scaled through developer convenience and ecosystem network effects.

The CVE-2025-49596 remote code execution vulnerability in MCP Inspector demonstrates this scaling effect. What started as a simple lack of authentication combined with a CSRF vulnerability became a critical security issue because of MCP's widespread adoption. Malicious websites could execute arbitrary commands on any developer machine running the tool on its default port.

The tool poisoning attacks documented by Invariant Labs show how MCP's architecture amplifies the tool calling issue. Malicious instructions embedded in tool descriptions affect every AI agent that loads the compromised server. This isn't a novel attack vector, it's prompt injection scaled through protocol standardisation.

The industry's response reveals the deeper problem

The emergence of security scanning tools like MCP-Scan, while valuable, treats the symptoms rather than the disease. These tools can detect known attack patterns but can't address the fundamental impossibility of secure tool calling in current LLM architectures.

Even the most sophisticated industry responses acknowledge their limitations. Anthropic's multi-layered defense approach explicitly warns that vulnerabilities like prompt injection "may persist across frontier AI systems." Their Computer Use feature specifically advises users to "take precautions to isolate Claude from sensitive data"an admission that the problem can't be solved through technical means alone.

The research community's focus on defense-in-depth strategies and assuming compromise represents a tacit acknowledgment that prevention may be impossible. When Simon Willison advocates for treating these systems as "inherently vulnerable to input manipulation," he's describing the fundamental architectural limitation that makes MCP security challenging.

The path forward requires honesty about trade-offs

The MCP security crisis serves as a crucial wake-up call, it forces the industry to confront the inherent security limitations of current LLM architectures. The protocol's rapid adoption and developer-friendly design have simply made these limitations more visible and impactful.

Rather than viewing MCP as a security failure, we should recognise it as an acceleration of an existing problem that demands architectural innovation.

The developer community's anxiety about MCP security reflects a deeper challenge: transitioning from predictable, deterministic security models to probabilistic, context-dependent ones. This transition requires new security frameworks that account for the dynamic, non-deterministic nature of AI-mediated interactions.

Conclusion: Focus on the real enemy

The headline-grabbing MCP vulnerabilities are symptoms of a deeper architectural problem that predates the protocol by over a year. Tool calling security issues have existed since OpenAI's function calling API launched in June 2023, manifesting across every major LLM provider's implementations.

MCP's contribution isn't creating new attack vectors, it's scaling existing ones through standardisation and ecosystem adoption. The tool poisoning attacks, content injection vulnerabilities, and command injection flaws making headlines are repackaged versions of familiar security problems, made more impactful by MCP's rapid adoption.

The real security crisis isn't MCP-specific; it's the fundamental impossibility of secure tool calling in current LLM architectures. Until the industry develops new approaches that can reliably distinguish between trusted instructions and untrusted input, we'll continue seeing the same vulnerabilities repackaged in new protocols.

The focus should shift from treating MCP as a security pariah to recognising it as a catalyst for necessary architectural innovation. The protocol's security challenges force us to confront the limitations of current LLM systems and develop new security models appropriate for the age of AI agents.

Only by acknowledging that the problem runs deeper than any single protocol can we begin building the secure AI systems that developers and users deserve.

Remote MCP

Discussion about this post