Hidden Prompts Hijack AI Assistant, Control Home, Leak Data
Hidden commands in your everyday data can hijack AI, leaking secrets and controlling your devices.
August 7, 2025

A groundbreaking study has revealed a significant vulnerability at the heart of modern AI assistants, demonstrating how Google's Gemini can be manipulated to leak sensitive data, execute malicious commands, and even control physical devices through cleverly hidden instructions.[1][2] Researchers from Israel's Ben-Gurion University, Tel Aviv University, the Technion, and SafeBreach Labs discovered that by embedding malicious prompts in everyday data sources like a Google Calendar invite or an email, an attacker could secretly hijack the AI's functions.[3][4][5] This research highlights a new class of threats, dubbed "Targeted Promptware Attacks," that exploit the fundamental way large language models (LLMs) like Gemini process information, posing a serious challenge to the security and trustworthiness of an increasingly AI-integrated world.[3][2]
The core of the vulnerability lies in a technique known as indirect prompt injection.[6] Unlike direct attacks where a user knowingly inputs a malicious command, this method involves poisoning the data an AI assistant is asked to analyze.[7] For example, an attacker can send a target a calendar invitation with a hidden command embedded in the event title.[3][4] When the user later asks Gemini a simple question like, "What's on my calendar today?" the AI retrieves and processes the event, including the hidden malicious instruction.[3][8] Because LLMs are designed to follow instructions within the text they process, they often cannot distinguish between the user's legitimate request and the attacker's embedded command.[7][6] This fundamental ambiguity is what makes the attack so potent and difficult to defend against.[9]
The potential consequences of such an attack are vast and alarming. The researchers demonstrated a shocking range of malicious activities that could be triggered through this method.[3] In one scenario, a hidden prompt could instruct Gemini to search through a user's emails for sensitive information, such as passwords or financial details, and then exfiltrate that data by embedding it in a URL that is sent to an attacker-controlled server.[3][5] The attack also proved capable of manipulating the user's digital environment, such as deleting calendar events or sending spam and phishing emails from the user's account.[3] Perhaps most disturbingly, the researchers showed it was possible to achieve "on-device lateral movement," where the attack moves beyond the AI application to control other functions, including launching a Zoom call to video stream the victim or even controlling connected smart home appliances like lights and boilers.[3][2][5] This demonstrates a troubling leap from digital privacy violations to potential real-world physical impact.[4][8]
This discovery places a spotlight on an inherent security flaw in the current generation of generative AI systems.[7] As AI assistants become more powerful and integrated into various applications—from summarizing documents to controlling smart homes—their ability to process and act on unstructured, external data becomes a significant security risk.[10][11] Any data source, be it an email, a shared document, or a visited webpage, can potentially become a Trojan horse carrying malicious instructions.[6][9] This represents a paradigm shift from traditional cybersecurity threats. The very nature of LLMs, which excel at understanding and executing natural language commands, makes them susceptible to this form of manipulation.[7] Experts in the field note that this type of attack is particularly insidious because it exploits the AI's trust in the data it processes, turning a core function into a critical vulnerability.[6][9] The issue is compounded as these AI models are increasingly integrated into complex enterprise systems and software development workflows, creating risks for supply chain attacks where malicious code could be unknowingly injected into projects.[12][6]
In response to these findings, which the researchers responsibly disclosed to Google in February 2025, the tech giant has acknowledged the seriousness of the threat.[3][4] Google has stated it has not observed this technique being used in real-world attacks but is implementing a multi-layered defense strategy to make its Gemini models more resilient.[3][13][14] This strategy includes "model hardening," a process of fine-tuning Gemini on vast datasets of simulated attacks to teach it to better distinguish between user requests and malicious embedded instructions.[15][16] Google is also deploying system-level safeguards, such as enhanced detection of suspicious URLs and providing users with security notifications when a potential prompt injection is mitigated.[13][17] While these measures have reportedly improved Gemini's defenses, experts and Google itself concede that no model can be completely immune.[15] The broader AI industry faces a significant challenge, with similar vulnerabilities potentially affecting other major AI assistants like Microsoft's Copilot and Salesforce's Einstein.[10][18] The research serves as a critical warning that as AI becomes more autonomous and integrated into the physical world, the need for robust, foundational security measures and a deeper understanding of these novel threats is paramount to ensuring user safety and trust.[11][8]
Sources
[4]
[7]
[10]
[11]
[13]
[14]
[15]
[16]
[17]
[18]