U.S. government buries 139 AI safety bypass findings due to politics.

Political pressure buried a landmark U.S. study identifying 139 AI safety bypasses, leaving regulators blind to critical risks.

August 7, 2025

U.S. government buries 139 AI safety bypass findings due to politics.
A landmark U.S. government study that identified 139 new ways to bypass the safety features of leading artificial intelligence systems has been effectively buried, reportedly due to political pressure. The findings, stemming from a "red-teaming" exercise organized by the National Institute of Standards and Technology (NIST), have never been publicly released, creating what some experts are calling a dangerous information vacuum. This suppression of crucial vulnerability data comes at a paradoxical moment, as new federal directives are quietly mandating the very type of adversarial testing that the unpublished report details, raising serious questions about the transparency and political motivations influencing AI safety and regulation in the United States.
The suppressed study was the result of a two-day event in October of the previous year, held at a security conference in Arlington, Virginia.[1] Approximately 40 AI researchers participated in the exercise, which was part of NIST's ARIA program and conducted in collaboration with the AI safety company Humane Intelligence.[1] The objective was to probe advanced AI systems for weaknesses, targeting prominent models such as Meta's open-source Llama large language model, Synthesia's avatar generator, and a security system from Robust Intelligence, which has since been acquired by Cisco.[1] The goal was to evaluate how these systems stood up against misuse, including their potential to spread disinformation, leak private data, or foster unhealthy emotional dependencies in users.[1] The results were stark: participants discovered 139 novel methods to circumvent the built-in safeguards of these systems.[1] For instance, researchers found that by prompting Meta's Llama model in languages other than English, such as Russian, Marathi, or Gujarati, they could coax it into providing information on how to join terrorist organizations.[1][2] Other systems were manipulated into disclosing personal data and offering instructions for carrying out cyberattacks.[1]
Despite the significance of these findings, the full report has remained under wraps. Sources familiar with the matter claim the document was suppressed for political reasons, specifically over fears it would conflict with the policy direction of the new administration, which has signaled a rollback of AI safety initiatives focused on issues like "misinformation" and "bias."[2] This move is seen by some as part of a broader shift away from the previous administration's approach to AI regulation. The Biden administration had issued a comprehensive executive order in October 2023 aimed at creating safeguards for AI development, which included mandating safety tests and establishing the U.S. AI Safety Institute within NIST.[3][4][5] However, this executive order was rescinded in the early days of the current administration, which has emphasized a deregulatory agenda to foster innovation, arguing that stringent safety testing could stifle the industry and put American firms at a competitive disadvantage.[4][5]
The decision to shelve the NIST report creates a troubling contradiction at the heart of the government's approach to AI. While the specific findings of the red-teaming event are being withheld, a new American AI Action Plan ironically calls for the exact kind of hackathon-style testing that produced the suppressed data.[2][6] This creates a situation where the government is publicly advocating for stress-testing AI models while simultaneously preventing the dissemination of critical knowledge gained from such an exercise. The new administration's policy also includes executive orders aimed at preventing what it terms "woke AI," directing NIST to revise its AI Risk Management Framework to eliminate references to misinformation, diversity, equity, and inclusion.[6][7] This ideological directive has led to concerns that politically motivated changes could degrade the value and precision of federal AI risk assessments.[6]
The implications of suppressing this vulnerability research are far-reaching. Without access to a detailed, government-backed report on the flaws of major AI systems, regulators, independent researchers, and the public are left "flying blind."[2] The lack of shared, transparent knowledge about AI vulnerabilities hampers the ability to create effective regulations and safeguards.[2] It leaves a critical information gap that could be exploited by malicious actors, even as AI systems become more integrated into critical infrastructure and daily life.[8] The situation highlights a growing tension between national security, corporate interests, and public safety in the rapidly advancing field of artificial intelligence. While the government officially champions the need for secure and trustworthy AI, the alleged political suppression of a study that directly addresses these issues suggests that other priorities may be taking precedence over transparently addressing the technology's known risks.

Sources
Share this article