"Prompt Hijacking" Threatens Businesses, Exploiting AI's Core Protocol Flaw
Prompt hijacking exploits the Model Context Protocol, turning AI agents into vectors for unprecedented corporate data theft.
October 22, 2025

A new and potent security threat is quietly emerging as businesses rush to integrate advanced artificial intelligence into their core operations. Security experts are sounding the alarm on "prompt hijacking," a class of vulnerability that targets the very protocols designed to make AI more powerful and useful. Researchers, including a prominent team at JFrog, have identified critical weaknesses in the Model Context Protocol (MCP), an open standard championed by AI firm Anthropic to connect large language models (LLMs) with external data and tools. While MCP is designed to give AI agents the context needed to perform complex, real-world tasks, this connectivity creates a significant new attack surface, turning helpful AI assistants into potential vectors for data theft and system compromise.
The Model Context Protocol acts as a universal translator, allowing AI models like Anthropic's Claude to seamlessly access and use a wide array of external resources, from a user's email inbox to internal company databases and public APIs like GitHub.[1][2] This is achieved through a client-server architecture where the AI application (the client) communicates with various MCP servers, each exposing specific tools and data.[3] The goal is to move beyond the limitations of static training data and create AI agents that can interact with the world dynamically. However, this open-ended design introduces a fundamental security flaw. The AI model must interpret data from these external sources to understand which tools to use, and attackers have discovered that they can embed malicious instructions within this data. This technique, known as indirect prompt injection, allows an adversary to manipulate an AI's behavior without the user's knowledge, effectively hijacking the session to perform unauthorized actions.[4][5]
The methods for executing prompt hijacking attacks against MCP are varied and sophisticated. One of the most insidious techniques is "tool poisoning," where malicious instructions are hidden within the descriptions of the tools an MCP server provides.[1][3][4] The AI model reads these descriptions to understand a tool's function, and can be compromised before the user ever decides to use the tool. For instance, a benign-looking tool for checking stock prices could have a description that secretly instructs the AI to exfiltrate any financial data it encounters. Another significant vulnerability was discovered by security researchers at JFrog, who identified a flaw in a popular MCP implementation where session IDs were predictable and could be reused (CVE-2025-6515).[6] This allowed an attacker to hijack an active user session and inject malicious responses, for example, tricking an AI assistant into recommending a malicious software package to a developer instead of a legitimate one.[6][7] Other proven attack vectors include creating malicious MCP servers that impersonate or "shadow" legitimate ones to intercept data, and exploiting insecure credential storage practices where API keys for third-party services are left exposed.[8][2][9] In one potent example, researchers demonstrated how a malicious issue created in a public GitHub repository could be used to hijack an AI agent, instructing it to access private repositories and leak sensitive corporate data.[10]
For business leaders eager to leverage AI to analyze proprietary data and automate workflows, these vulnerabilities represent a critical barrier to safe adoption. The very act of connecting an AI to internal databases, source code repositories, or customer relationship management systems via MCP creates a potential conduit for disaster.[9][11] A successful prompt hijacking attack could lead to the exposure of trade secrets, employee records, financial data, and sensitive customer information.[12][10] The threat is magnified because these attacks are not theoretical; researchers have demonstrated repeatable proofs of concept that exploit these weaknesses.[6][10] Compromising a single MCP server could grant an attacker the "keys to the kingdom," providing persistent access to all the services connected to it, from email to cloud storage.[12][11] This fundamentally undermines the trust required for enterprises to deploy AI agents in sensitive environments, turning a promising productivity tool into a significant liability.
Addressing the threat of MCP prompt hijacking requires a multi-layered, defense-in-depth security strategy, as no single solution is foolproof.[13][14] Security experts and the protocol's own documentation emphasize the need for robust input validation and output sanitization to filter malicious instructions.[8][15] Implementing a zero-trust architecture, where every request is authenticated and authorized, is paramount.[16] This includes applying the principle of least privilege, ensuring AI agents only have the minimum necessary permissions to perform a task, and sandboxing MCP servers to restrict their access to the wider system.[13][7] For businesses, this means meticulously vetting any third-party MCP servers, preferring official and well-maintained tools, and implementing continuous monitoring and logging to detect anomalous behavior.[1][12] Ultimately, the emergence of prompt hijacking highlights a crucial lesson for the AI industry: security cannot be an afterthought. As protocols like MCP become foundational infrastructure for the next generation of AI, security must be built into the core design to ensure these powerful tools can be deployed with confidence.
Sources
[1]
[3]
[4]
[5]
[9]
[10]
[11]
[12]
[14]
[16]