FDA's AI for Drug Approvals Fabricates Research, Alarms Staff

Meant to speed drug approvals, FDA's new AI, Elsa, is fabricating studies, sparking alarm about its reliability and public health risks.

July 24, 2025

A new generative artificial intelligence system named Elsa, rolled out by the U.S. Food and Drug Administration to accelerate drug and medical device approvals, is reportedly fabricating studies that do not exist and misrepresenting real research, according to agency employees.[1][2][3][4] The AI tool, launched in June 2025 with the goal of streamlining the review process, has instead sparked significant concern among staff and outside experts about its reliability and the risks of deploying unproven technology in such a high-stakes environment.[1][5][4] These revelations raise critical questions about the agency's rapid move into the AI era and the safeguards necessary for its use in critical public health decisions.[2][6]
The internal AI assistant, Elsa, was publicly lauded by FDA leadership as a major step towards a more efficient regulatory process, with officials claiming it could reduce tasks that once took days to mere minutes.[2][5][7] Developed to assist with a range of tasks from reading and summarizing documents to writing code, Elsa was intended to modernize the agency's paper-based review system.[2][8] FDA Commissioner Marty Makary hailed the rollout as the "dawn of the AI era at the FDA," emphasizing the need for the agency to modernize and eliminate inefficiencies.[2][8] The system, built within a secure government cloud environment, was designed to handle internal documents without training on sensitive industry-submitted data, a key security consideration.[5][8] The agency even suggested Elsa could be used for complex tasks like identifying adverse events and high-priority inspection targets.[5][8]
Despite the enthusiastic public endorsements, Elsa's actual performance has drawn sharp criticism from the very employees it was designed to help.[2][5] Multiple FDA staffers have reported that the AI tool is unreliable for critical work, frequently "hallucinating" or inventing research and academic citations.[1][2][3] One reviewer noted that the tool "hallucinates confidently," making anything that cannot be double-checked untrustworthy.[2][3][4] This unreliability has reportedly created more work for some reviewers, who now feel the need for "heightened vigilance" when using the system.[4] Compounding the problem are Elsa's technical limitations, including its inability to access many of the documents crucial for evaluating the safety and effectiveness of new drugs, such as confidential industry submissions.[1][4][9] Staff members who tested the system with basic questions reported receiving incorrect answers.[4][10] Consequently, some employees say the tool is useful only for basic administrative tasks such as summarizing meetings or drafting emails.[2]
FDA leadership has appeared to downplay the severity of these concerns. Commissioner Makary stated he had not heard the specific complaints about fabricated studies and emphasized that the use of Elsa is currently voluntary for agency staff.[3][4][11] The FDA's head of AI, Jeremy Walsh, acknowledged that Elsa, like other large language models, "could potentially hallucinate."[4][9] Officials have also pointed to safeguards, such as requiring citations when the tool is used to analyze document libraries, to mitigate the risk of fabricated information.[12] They maintain that the agency's multi-layered review process, involving numerous experts, ensures that flawed, AI-generated information does not make it into a final decision.[12] The FDA also clarified that Elsa is not being used to make final regulatory decisions on drug approvals.[13] The agency has stated its commitment to an "agile, risk-based framework" for AI and has issued draft guidance on the use of AI in regulatory decision-making, encouraging early collaboration with sponsors developing AI models.[14][15][16]
The controversy surrounding Elsa arrives at a time when the use of AI in medicine faces minimal federal oversight.[1] The situation highlights the broader challenges and risks of integrating generative AI into critical scientific and regulatory domains.[17][6] AI "hallucination" is a well-documented problem across the tech industry, but the stakes are far higher when public health and safety are involved.[11][6] The issues with Elsa underscore the need for robust validation, transparency, and a clear understanding of the technology's limitations before it is integrated into workflows that could affect patient well-being.[15][6][18] As the FDA and other regulatory bodies navigate the promise of AI, the experience with Elsa serves as a critical case study in balancing innovation with rigorous oversight and ensuring that the "human in the loop" remains a meaningful safeguard against technological fallibility.[5][6]

Sources