Leaked 60,000-character prompt showcases unprecedented AI behavior control.
The alleged 60,000-character "Claude 4" prompt leak reveals how developers exert fine-grained control over AI personality and safety.
May 25, 2025

A recently surfaced internal system prompt, purportedly for an advanced Anthropic model referred to in online discussions as "Claude 4," has sent ripples through the artificial intelligence sector. The prompt, reportedly made available on GitHub by a user known as "Pliny the Liberator," is said to be over 60,000 characters long.[1][2][3] This extensive set of instructions, which sets the AI's behavior, tone, and operational rules before a user types a single query, offers a rare and detailed glimpse into the sophisticated control mechanisms of leading large language models (LLMs).[1][2] The sheer volume of this internal directive highlights the intricate engineering involved in shaping AI personalities and ensuring they operate within desired parameters.
System prompts are a foundational element in modern LLM architecture, acting as a persistent, high-level guide that shapes every interaction the AI undertakes.[1][4] Typically hidden from end users, these prompts are crucial for aligning the AI's responses with developer intentions, encompassing safety protocols, personality traits, and specific functionalities.[1][4] The leaked document attributed to "Claude 4" reportedly runs to around 24,000 tokens (roughly 60,000 characters), detailing a wide array of instructions.[1] These include directives on maintaining a concise and courteous style, enforcing safety and compliance rules by blocking extremist content or copyrighted material, and even limiting direct quotes from sources to under 20 words.[1][3] The prompt also outlines when the model should perform web searches and mandates citations for external facts.[1] This is particularly noteworthy because LLMs can sometimes struggle with relatively short user-provided instructions; the ability to adhere to such a lengthy and complex internal "operating manual" suggests a significant capacity for nuanced instruction-following at a foundational level.[1][2]
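To ground what a system prompt means mechanically: in Anthropic's public API it is a dedicated field sent alongside every request and never shown to the end user. The following is a minimal sketch using Anthropic's Python SDK; the model identifier and the short directive text are illustrative placeholders, not the leaked material.

```python
# Minimal sketch: supplying a system prompt via Anthropic's Python SDK.
# The directive text and model name are placeholders for illustration,
# not the leaked "Claude 4" prompt.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = (
    "Be concise and courteous. Never quote more than 20 words verbatim "
    "from any source, and cite all external facts."  # stand-in for ~60,000 chars
)

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model identifier
    max_tokens=1024,
    system=SYSTEM_PROMPT,  # persistent directive, hidden from the end user
    messages=[{"role": "user", "content": "Summarize today's AI news."}],
)
print(response.content[0].text)
```

Because the system field is prepended to every exchange, a directive on the order of 24,000 tokens is re-processed with each request, which is part of why efficient long-context handling matters for a prompt of this size.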
The implications of an AI model being able to effectively process and follow such an extensive system prompt are manifold. It signifies a leap in the controllability and steerability of AI.[4] With a vast set of pre-defined instructions, developers can, in theory, exert much finer-grained control over the AI's outputs, reducing undesirable behaviors like generating harmful content or going off-topic. The detailed prompt for "Claude 4" reportedly includes instructions on how to handle sensitive topics, avoid "annoying preaching," and even specific directives against copyright infringement, such as not reproducing Disney content.[3] It also guides the AI on its persona, encouraging it to be helpful and wise, and even to lead conversations or offer its own observations, moving beyond a purely reactive role.[5] Furthermore, the prompt is said to instruct the AI to inform users of its knowledge cutoff date, stated in the prompt as January 2025, even though some Anthropic documentation reportedly lists a later training data cutoff.[3][6][7] This level of detailed instruction lets developers craft AI systems that are more predictable, reliable, and aligned with specific ethical guidelines or brand voices.
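To make the flavor of these directives concrete, the following is a heavily abridged, hypothetical sketch of how such rules might be organized before being flattened into the system field. Every rule below is invented for illustration; none of it is the leaked text.

```python
# Hypothetical, abridged directive sketch in the spirit of the reported
# leak. All wording is invented for illustration, NOT the leaked text.
DIRECTIVES = {
    "style": "Be concise, courteous, and direct; avoid preachy moralizing.",
    "copyright": "Do not reproduce protected material; keep any verbatim "
                 "quote under 20 words and always attribute it.",
    "search": "Search the web only for facts likely to postdate training, "
              "and cite every externally sourced claim.",
    "self_knowledge": "If asked, state the knowledge cutoff as January 2025 "
                      "and direct unresolved issues to official support.",
}

# Sections are concatenated into the single long string the API expects.
SYSTEM_PROMPT = "\n\n".join(
    f"[{name.upper()}]\n{rule}" for name, rule in DIRECTIVES.items()
)
print(SYSTEM_PROMPT)
```

Scaled up to hundreds of such sections, with tool schemas and worked examples embedded, a prompt of this shape could plausibly reach the reported 60,000 characters.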
This leaked system prompt, assuming its authenticity and direct linkage to a "Claude 4" model, offers valuable insights for the broader AI industry. (Anthropic's announced models as of mid-2025 include the Claude 3 family and the newer Claude 3.5 Sonnet, while "Claude 4" models such as Opus 4 and Sonnet 4 are also being discussed and are reportedly available on platforms like AWS Bedrock.[8][9][10][11]) The ability to follow a 60,000-character directive suggests advancements in model architecture and training methodologies that enable LLMs to internalize and act upon a much larger context of rules and guidelines.[12] It also underscores the evolving art and science of prompt engineering: crafting such extensive prompts is a complex task, requiring a deep understanding of the model's capabilities and potential failure points. The content of the "Claude 4" prompt, with its detailed instructions on tool use, safety, style, and even how to respond when it cannot fulfill a request or a user is unsatisfied, reflects this complexity.[6] For instance, it reportedly includes guidance on when the AI should admit it doesn't know something and direct users to Anthropic's official support page.[6][7] The leak also details how the AI should handle requests to count words or characters, namely by reasoning step by step, a behavior noted as having been present in earlier Claude 3.7 prompts.[6][7]
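The counting guidance points at a real architectural quirk: models read tokens, not characters, so character-level questions must be answered by deliberately spelling the text out. The sketch below uses OpenAI's open-source tiktoken tokenizer purely as a stand-in, since Anthropic's production tokenizer is not publicly distributed, to show how tokenization hides character boundaries.

```python
# Why character counting is hard for LLMs: the model sees opaque token IDs,
# not letters. tiktoken (OpenAI's open-source tokenizer) serves here only
# as a stand-in; Anthropic's production tokenizer is not public.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
word = "strawberry"
token_ids = enc.encode(word)
pieces = [enc.decode([t]) for t in token_ids]

print(token_ids)  # a short list of integer IDs
print(pieces)     # sub-word chunks, e.g. ['str', 'aw', 'berry']
print(len(word))  # 10 characters -- a fact no single token exposes
```

Directing the model to first write the word out letter by letter, as the prompt reportedly does, converts a character-level question into one the token stream can actually represent.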
The public availability of such a detailed system prompt, facilitated by individuals like "Pliny the Liberator" who specialize in revealing these hidden instructions,[13][14][15] can fuel both innovation and scrutiny. Other researchers and developers can learn from the techniques Anthropic purportedly uses to control its models, potentially leading to advancements in AI safety and alignment across the field.[16] It also opens the door to more informed discussions about transparency and an AI's "inner workings." At the same time, it raises questions about the security of these internal instructions and the potential for misuse if vulnerabilities in how the model processes these prompts are discovered.[17] The detailed nature of the prompt, including how to handle "red flags" or avoid sycophantic responses, indicates a continuous effort by developers to refine AI behavior and address previously observed issues.[3][7] The very existence of such detailed instructions within a system prompt can also be read as a catalog of behaviors the model may have exhibited before being explicitly told not to.[6][7]
In conclusion, the reported leak of a 60,000-character system prompt for a model referred to as "Claude 4" represents a significant point of discussion in the AI landscape. It highlights the advanced capabilities of modern LLMs to adhere to complex and lengthy internal directives, paving the way for more controllable, nuanced, and specialized AI applications. This development underscores the critical role of sophisticated prompt engineering in shaping AI behavior and the ongoing efforts to ensure these powerful tools are both highly capable and aligned with human intentions and safety considerations. The insights gleaned from such leaks contribute to a broader understanding of how these complex systems are designed and governed, pushing the industry towards greater transparency and more robust AI development practices.
Research Queries Used
Claude 4 system prompt leak 60000 characters Pliny the Liberator GitHub
Anthropic Claude 4 release
capabilities of large language models with long system prompts
Pliny the Liberator X Claude system prompt
GitHub leaked Claude system prompt details
Sources
[3]
[7]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]