Anthropic Proposes Mandatory AI Safety Transparency for Top Developers

Anthropic proposes mandatory safety disclosures for powerful AI labs, aiming to formalize accountability amidst catastrophic risk concerns.

July 10, 2025

AI safety and research company Anthropic has put forward a significant proposal for a transparency framework aimed at the developers of the most powerful artificial intelligence systems.[1][2] The move comes as AI capabilities advance rapidly and concerns grow about the catastrophic risks that frontier models, the most advanced AI systems, could pose.[2][3] The proposed framework seeks to establish a new level of accountability for major AI labs by mandating public disclosures about their safety practices, a measure intended to provide crucial insights to policymakers and the public.[1][2]
The core of Anthropic's proposal is a targeted approach that focuses exclusively on large-scale AI developers meeting specific financial thresholds.[4] The framework would apply to companies with annual revenues of roughly $100 million or annual research and development expenditures of approximately $1 billion.[4] This deliberate exclusion of smaller companies and startups is designed to foster innovation and avoid placing undue regulatory burdens on entities whose models carry a lower risk of causing catastrophic harm.[2][5] Companies that fall within scope would be required to publicly disclose a "Secure Development Framework" (SDF).[4] This document would detail the company's procedures for assessing and mitigating catastrophic risks, including threats related to chemical, biological, radiological, and nuclear (CBRN) weapons, as well as dangers from AI systems acting autonomously in ways their creators did not intend.[4][6]
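For illustration, the scope test described above can be expressed as a simple check. The sketch below is a non-authoritative reading of the reported thresholds; the field names, exact dollar figures, and how revenue or R&D spend would be measured are assumptions for illustration, not terms of the proposal.

```python
from dataclasses import dataclass

# Illustrative thresholds drawn from the reported proposal; the precise legal
# definitions (how revenue and R&D expenditure would be measured, over what
# period, etc.) are assumptions, not part of the framework text.
REVENUE_THRESHOLD_USD = 100_000_000        # ~$100M annual revenue
RND_SPEND_THRESHOLD_USD = 1_000_000_000    # ~$1B annual R&D expenditure

@dataclass
class Developer:
    name: str
    annual_revenue_usd: float
    annual_rnd_spend_usd: float

def is_covered(dev: Developer) -> bool:
    """Return True if the developer would fall within the framework's scope
    under this simplified reading of the financial thresholds."""
    return (dev.annual_revenue_usd >= REVENUE_THRESHOLD_USD
            or dev.annual_rnd_spend_usd >= RND_SPEND_THRESHOLD_USD)

if __name__ == "__main__":
    startup = Developer("small-lab", annual_revenue_usd=5e6, annual_rnd_spend_usd=2e7)
    frontier_lab = Developer("frontier-lab", annual_revenue_usd=3e8, annual_rnd_spend_usd=2e9)
    print(is_covered(startup))       # False: exempt as a smaller developer
    print(is_covered(frontier_lab))  # True: would need to publish an SDF
```

Under this reading, a startup below both thresholds stays out of scope, while a large lab crossing either threshold would face the disclosure obligations described next.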
Under the proposed framework, covered companies would be obligated to publish their SDFs on publicly accessible websites.[4] They would also be required to release "system cards" when deploying a new model, summarizing the results of safety testing and the mitigation measures that have been implemented.[1][5] To ensure accountability, the framework would make it illegal for companies to make false statements about their safety practices and includes whistleblower protections so that employees can report safety concerns without fear of retaliation.[1][4] Enforcement could be carried out by state attorneys general, who would be authorized to seek civil penalties for significant breaches.[4] This "truth-telling" approach is designed to stay flexible, avoiding rigid, government-imposed technical standards that could quickly become obsolete as AI technology evolves.[2][4]
The proposal from Anthropic has been met with a range of reactions from across the technology and policy sectors. The framework aims to formalize and legally codify many of the voluntary safety practices that leading AI labs, including Anthropic, OpenAI, Google DeepMind, and Microsoft, have already adopted.[4] Proponents argue that turning these voluntary commitments into legal requirements would prevent companies from abandoning safety measures as competitive pressures in the AI field intensify.[4] While the framework explicitly exempts startups from direct regulation, some analysts suggest it could create indirect pressure, as enterprise customers may increasingly demand that their AI vendors demonstrate robust safety and compliance programs, potentially favoring larger, regulated companies.[4][7] The proposal has also been viewed as a strategic move by Anthropic to shape future AI regulations in a way that is favorable to established industry players.[7] Critics point out that the framework's emphasis on flexibility and self-certification, while seemingly reasonable, could allow companies to avoid meaningful oversight.[7] The broad exceptions for redacting information deemed a "trade secret" or "confidential business information" have also drawn scrutiny, with concerns that such loopholes could be used to withhold crucial technical details from public view.[7]
Anthropic's proposal is an outgrowth of its own internal safety policies, most notably its Responsible Scaling Policy (RSP).[2][3] The RSP introduced a system of AI Safety Levels (ASLs), modeled after the biosafety levels used for handling dangerous biological materials.[3] This tiered system requires increasingly stringent safety and security measures as a model's capabilities and potential for harm increase.[3][8] For instance, models classified as ASL-3, which are deemed capable of substantially increasing the risk of catastrophic misuse, are subject to stricter testing and security protocols.[3][9] Anthropic has already applied these higher standards to its own models, such as activating ASL-3 protections for its Claude Opus 4 model, which involves enhanced cybersecurity and measures to prevent the model from being used to develop weapons.[10][11] The transparency framework can be seen as an effort to export these principles of proportional, risk-based safety governance to the broader AI industry.[8] The proposal is presented as an interim step, a practical measure to increase public visibility into safety practices while more comprehensive, global standards and evaluation methods are developed.[2] It arrives at a time of significant governmental interest in AI, with the White House issuing executive orders on AI safety and various legislative proposals under consideration.[7][12] By proactively offering a detailed policy blueprint, Anthropic is positioning itself to influence the ongoing debate and shape the future of AI regulation.[4][7]
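To make the tiered principle concrete, here is a minimal sketch of risk-proportional gating in the spirit of the ASL system. The level names, the evaluation flag, and the specific measures listed are illustrative assumptions; the actual thresholds and required protections are defined in Anthropic's Responsible Scaling Policy, not here.

```python
# A minimal sketch of the tiered, risk-proportional gating idea behind
# AI Safety Levels (ASLs): stricter measures apply as evaluated capability
# (and thus potential for harm) increases. The evaluation names and required
# measures below are illustrative assumptions, not Anthropic's actual criteria.

ASL_REQUIREMENTS = {
    "ASL-2": {"red_team_eval", "publish_system_card"},
    "ASL-3": {"red_team_eval", "publish_system_card",
              "enhanced_model_weight_security", "cbrn_misuse_safeguards"},
}

def required_measures(capability_eval_results: dict) -> set:
    """Map (hypothetical) capability-evaluation results to the set of
    safety measures that must be in place before deployment."""
    if capability_eval_results.get("substantially_uplifts_cbrn_misuse", False):
        return ASL_REQUIREMENTS["ASL-3"]
    return ASL_REQUIREMENTS["ASL-2"]

def may_deploy(capability_eval_results: dict, measures_in_place: set) -> bool:
    """Deployment proceeds only once every required measure is implemented."""
    return required_measures(capability_eval_results) <= measures_in_place
```

The design choice the tiering reflects is that obligations scale with demonstrated capability: a model that trips the higher-risk evaluation cannot be deployed until the stricter protections are in place, mirroring how biosafety levels escalate handling requirements with the danger of the material.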

Sources