arXiv imposes mandatory one-year bans to purge unverified AI-generated manuscripts from its repository

To combat a surge of AI-generated manuscripts, arXiv implements mandatory bans to ensure human accountability in scientific research.

May 15, 2026

The scientific community is currently navigating a period of significant friction as its most vital platforms for knowledge exchange grapple with a surge of automated content.[1][2][3][4][5][6] For decades, arXiv has served as the preeminent global repository for pre-publication research, offering a space where breakthroughs in physics, mathematics, and computer science could be shared and scrutinized before undergoing the formal, and often lengthy, peer-review process. However, the foundational trust that allows such an open system to function is being tested by an unprecedented influx of low-quality, AI-generated manuscripts. In response, arXiv has initiated a rigorous crackdown on unchecked generative content, signaling a major shift in how the platform balances its commitment to open access with the urgent need to maintain the integrity of the scientific record.
At the heart of this new enforcement regime is a zero-tolerance policy for papers that show undeniable signs of being produced by large language models without human oversight.[2][7] Moderators have clarified that while generative AI tools are not strictly forbidden as aids for drafting or translation, authors remain personally and professionally responsible for every word and figure in their submissions.[2][7][8] The new penalties are severe: researchers caught submitting work containing incontrovertible evidence of unverified AI output face a mandatory one-year ban from the platform.[2][7] Furthermore, once this suspension is lifted, any subsequent submissions from those authors must first be accepted by a reputable, peer-reviewed journal or conference before they are eligible for hosting on the server.[2] For authors who have previously violated the community’s standards of accuracy, this effectively turns the platform from a first stop for new ideas into a guarded gate.
The criteria for what constitutes incontrovertible evidence are specific and reflect the unique "fingerprints" left by today’s large language models. Moderators are increasingly flagging papers that contain "hallucinated" references—citations to academic papers that do not exist but sound plausible to a casual reader.[2] Other red flags include the presence of meta-comments from the AI itself, such as residual instructions like "here is a summary of the provided data" or "please insert experimental results here."[7] In some cases, papers have been submitted with illustrative data sets generated by the model rather than real-world findings, with the AI even including notes to the author to replace the placeholder numbers with actual experimental data.[2][7] These artifacts are viewed by the repository’s leadership as proof that the authors did not perform even a cursory review of their own work, undermining the fundamental premise of scientific authorship.
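The residual-instruction red flags described above lend themselves to simple text matching. What follows is only a minimal sketch of that idea in Python, not a description of how arXiv's moderators actually work: the function name `find_llm_residue` and the pattern list are hypothetical, and the third pattern is an assumed wording for the "replace the placeholder numbers" notes. The phrases are taken from the examples quoted in this article.

```python
import re

# Hypothetical phrase list for illustration only; not arXiv's actual criteria.
# The first two phrases are the examples quoted in the article. The third is
# an assumed wording for notes asking the author to replace placeholder data.
RESIDUE_PATTERNS = [
    r"here is a summary of the provided data",
    r"please insert experimental results here",
    r"replace (?:the |these )?placeholder",
]


def find_llm_residue(text):
    """Return (line_number, matched_text) pairs for lines containing a residue phrase."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for pattern in RESIDUE_PATTERNS:
            match = re.search(pattern, line, flags=re.IGNORECASE)
            if match:
                hits.append((number, match.group(0)))
    return hits


if __name__ == "__main__":
    sample = "Results\nPlease insert experimental results here.\nWe measured X."
    for line_number, phrase in find_llm_residue(sample):
        print(f"line {line_number}: {phrase}")
```

A check like this can only surface candidates for a human to read. It says nothing about hallucinated references, which would require looking each citation up against a bibliographic database.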
This policy shift is particularly pronounced in computer science, which has become a primary battleground for what researchers are calling "AI slop."[5][6] The category has recently seen a massive spike in submissions of review articles and position papers, formats that synthesize existing literature rather than present new experimental data. Because these papers require less technical validation than original research, they have become an easy target for "paper mills" and low-effort authors using generative tools to churn out manuscripts at industrial scale. To combat this, the computer science category now requires all review and position papers to come with proof of prior acceptance at a refereed venue.[1][5][6][7][9][10] Moderators noted that many of these automated submissions were little more than annotated bibliographies, offering no novel insights or substantial discussion of open research problems, and were effectively drowning out legitimate scientific discourse.
The scale of the problem is underscored by recent data suggesting that the contamination of the scientific record is already well underway. Independent audits of global research databases have estimated that tens of thousands of completely fabricated references entered the scholarly literature in the last year alone.[3] These fake citations do more than just mislead readers; they corrupt the metrics used to measure scientific influence and can lead researchers down dead-end paths based on non-existent discoveries. This phenomenon is often referred to as a "dead internet" scenario for science, where the proliferation of synthetic content creates a feedback loop of misinformation. If scientists begin to cite AI-generated papers, and those citations are then used to train the next generation of AI models, the resulting "model collapse" could degrade the quality of human and machine intelligence alike.
Beyond the immediate administrative changes, these new rules reflect a broader philosophical debate within the AI industry and academia. The rise of sophisticated writing assistants has blurred the lines between human and machine authorship, forcing institutions to define where assistance ends and fabrication begins. For the AI industry, the crackdown at arXiv serves as a reality check regarding the limitations of current technology. While large language models are capable of producing fluent prose, they lack the capacity for truth-tracking and for understanding the causal logic that scientific inquiry requires. The repository’s stance reinforces the idea that AI should be viewed as a sophisticated instrument, not unlike a microscope or a statistical software package, that requires a human expert to interpret and verify its outputs.
The implications for the research community are profound, especially for early-career scientists and labs in developing regions who rely on the platform for visibility.[2] A one-year ban is not merely a bureaucratic hurdle; it can derail a researcher’s career by delaying the dissemination of critical work, impacting funding opportunities, and severing ties with potential collaborators.[2] By raising the stakes for AI-related errors, the repository is effectively forcing a culture of slower, more deliberate verification in an era of rapid-fire digital publishing. This "speed bump" strategy is designed to deter the use of AI as a shortcut to high publication counts, reminding authors that the prestige of the scientific record is inseparable from the personal accountability of the people who contribute to it.
Ultimately, the repository’s transition toward more aggressive moderation highlights a necessary evolution in scientific communication. As the tools for generating content become more powerful, the systems for vetting that content must become equally sophisticated, even if that means sacrificing some of the openness that originally defined the platform. The crackdown is a clear signal that the scientific community will not allow the ease of automation to supersede the rigors of the scientific method. By holding authors to a standard of absolute responsibility, the repository aims to ensure that the digital archives of human knowledge remain a reliable foundation for future discovery rather than a dumping ground for high-tech noise. The coming years will likely see a continued arms race between those attempting to automate the publication process and the moderators tasked with preserving the value of the human-driven research enterprise.

Sources