
WMDP Benchmark

Free

About

The Weapons of Mass Destruction Proxy (WMDP) benchmark is a specialized evaluation framework designed to measure and mitigate hazardous capabilities in large language models (LLMs). Developed by the WMDP Team in collaboration with the Center for AI Safety, the project addresses concerns that advanced AI systems could be misused to facilitate biological, chemical, or cyber attacks. The benchmark consists of a dataset of 3,668 multiple-choice questions across these three critical security domains. By providing a public, standardized metric, it allows the research community to assess how much dangerous knowledge a model possesses.

The core methodology of WMDP is proxy knowledge. To avoid the risk of disseminating sensitive or export-controlled information, the expert authors focused on precursors, neighbors, and components of hazardous information. This ensures that while the questions require domain expertise, they do not provide a manual for harm.

In addition to the dataset, the project introduces a state-of-the-art unlearning method called RMU (Representation Misdirection for Unlearning). This technique aims to strip specific dangerous concepts from a model's internal weights while preserving its general-purpose reasoning and language capabilities on standard benchmarks like MMLU.

The tool is built for AI safety researchers, model developers, and government institutions tasked with overseeing AI risk management. For developers of open-weight models, WMDP offers a way to harden systems against malicious fine-tuning or jailbreaking. Because the benchmark focuses on unlearning—the permanent removal of knowledge—it provides a more robust defense than traditional refusal-based guardrails, which can often be bypassed. It is also a valuable resource for academic researchers looking to benchmark new safety interventions in a transparent and reproducible manner.
What makes WMDP distinct from other safety evaluations is its commitment to transparency and its focus on the proxy approach. While many major AI companies utilize private red-teaming datasets, WMDP is hosted openly on GitHub and HuggingFace. This accessibility encourages broader participation in AI safety research. Furthermore, by focusing on unlearning rather than just refusal, WMDP pushes the industry toward a standard of inherent safety, where models simply do not have the expertise required to assist in high-consequence attacks, regardless of the user's intent or the model's instructions.
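The evaluation format itself is straightforward: each benchmark item is a question, a list of answer choices, and the index of the correct choice, and a model is scored by its accuracy over all items. A minimal scoring sketch, with the field names (`question`, `choices`, `answer`) assumed from typical HuggingFace multiple-choice layouts and the toy items standing in for real benchmark questions:

```python
# Minimal sketch of WMDP-style multiple-choice scoring.
# Field names (question/choices/answer) are assumptions based on
# common HuggingFace MCQ dataset layouts, not a guaranteed schema.

def score(items, predict):
    """Return accuracy of `predict(question, choices) -> choice index`."""
    correct = sum(
        predict(it["question"], it["choices"]) == it["answer"]
        for it in items
    )
    return correct / len(items)

# Toy placeholder items; real WMDP questions are written by domain experts.
sample_items = [
    {"question": "Placeholder proxy question one?",
     "choices": ["A", "B", "C", "D"], "answer": 2},
    {"question": "Placeholder proxy question two?",
     "choices": ["A", "B", "C", "D"], "answer": 0},
]

# A trivial baseline "model" that always picks the first choice.
always_first = lambda question, choices: 0

print(score(sample_items, always_first))  # 0.5 on these toy items
```

In practice the same loop would run over all 3,668 questions per domain split, with `predict` wrapping an actual LLM's choice among the four options.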

Pros & Cons

Provides a public, transparent alternative to private safety evaluations used by major AI labs.

Written by subject matter experts to ensure high-quality and relevant testing data.

Specifically designed to test unlearning, which is more robust than simple refusal training.

Carefully curated to avoid the accidental release of sensitive or export-controlled data.

Maintains model performance on general benchmarks while removing specific hazards.

Limited to multiple-choice question format for evaluating model knowledge.

Requires technical expertise in AI research to implement the associated unlearning methods.

Measures proxy knowledge rather than directly testing a model's ability to carry out multi-step hazardous procedures.

Use Cases

AI researchers can use the RMU method and WMDP dataset to develop models that are inherently safe from malicious repurposing.

Government policy makers can reference the benchmark to set objective safety standards for large-scale AI deployments.

Open-source developers can verify that their fine-tuned models do not inadvertently leak hazardous chemical or biological information.

Platform
Web
Task
knowledge unlearning

Features

cross-model capability testing

huggingface dataset collection

publicly accessible on github

rmu unlearning algorithm

chemical security domain focus

cybersecurity domain focus

biosecurity domain focus

3,668 multiple-choice questions

FAQs

What domains of safety does the WMDP benchmark cover?

The benchmark specifically targets three high-risk areas: biosecurity, cybersecurity, and chemical security. It includes 3,668 multiple-choice questions written by experts to measure knowledge that could aid in malicious attacks.

How does WMDP avoid releasing sensitive or export-controlled information?

The dataset uses proxy questions that involve precursors, neighbors, and components of hazardous knowledge. This allows researchers to evaluate a model's understanding of dangerous concepts without publishing actual blueprints or harmful instructions.

What is the RMU method mentioned in the research?

RMU is a state-of-the-art unlearning method designed to remove specific hazardous knowledge from a model's internal representations. It is used to prove that a model can be made safer while still maintaining its general capabilities on benchmarks like MMLU.

Why is machine unlearning preferred over standard refusal training?

Refusal training can often be bypassed by adversarial attacks or by harmful fine-tuning of open-weight models. Machine unlearning instead removes the dangerous information from the model's weights, so it is far harder to recover even if the safety guardrails are stripped away.
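The RMU objective described above can be sketched in a few lines: on forget-set inputs, the updated model's hidden activations are pushed toward a scaled random control vector, while on retain-set inputs they are kept close to the frozen original model's activations. The toy sketch below uses linear maps as stand-ins for a transformer layer and an illustrative scaling constant `c`; it shows the shape of the two loss terms, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden dimension

# Frozen and updated "models": linear maps standing in for the
# hidden activations of one transformer layer.
W_frozen = rng.normal(size=(d, d))
W_updated = W_frozen + 0.01 * rng.normal(size=(d, d))

# Fixed random unit control vector, scaled by an illustrative constant c.
u = rng.normal(size=d)
u /= np.linalg.norm(u)
c = 5.0

def activations(W, x):
    return W @ x

x_forget = rng.normal(size=d)  # stands in for hazardous-domain text
x_retain = rng.normal(size=d)  # stands in for benign text

# Forget loss: steer updated activations toward the control vector,
# scrambling the model's representation of the hazardous concept.
forget_loss = np.mean((activations(W_updated, x_forget) - c * u) ** 2)

# Retain loss: keep updated activations near the frozen model's,
# preserving general capabilities.
retain_loss = np.mean(
    (activations(W_updated, x_retain) - activations(W_frozen, x_retain)) ** 2
)

total_loss = forget_loss + retain_loss
```

In the real method these terms are computed on LLM hidden states and minimized by gradient descent over a subset of the model's weights; the sketch only illustrates why the trade-off between forgetting and retention appears as two competing loss terms.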

Pricing Plans

Open Source
Free Plan

Full access to 3,668 questions

RMU unlearning code

GitHub repository access

HuggingFace dataset access

Biosecurity evaluation

Cybersecurity evaluation

Chemical security evaluation

