Trojan Detection Challenge 2023 (LLM Edition)

About

The Trojan Detection Challenge 2023 (LLM Edition) is a NeurIPS 2023 competition focused on advancing methods for detecting hidden functionality in large language models (LLMs). It features two tracks: Trojan Detection (identifying triggers for hidden behaviors) and Red Teaming (developing automated methods to elicit undesirable behaviors). The challenge aims to improve LLM safety by uncovering jailbreaks and hidden functionalities. Participants can contribute to a safer AI landscape by designing robust trojan detectors and automated red teaming methods. Prizes and publication opportunities are available for winning teams.

Platform
Web
Keywords
llm, large language models, red teaming, trojan detection
Task
threat detection

Features

large language models (llms)

open competition format encouraging method sharing

neural trojan attacks

jailbreak detection

hidden functionality detection

automated red teaming methods

red teaming track

trojan detection track

FAQs

What are the current rules?

[Here](index.html#rules).

Can the organizers change the rules?

Yes. We require participants to consent to a change of rules if there is an urgent need. This is a new area and unanticipated developments may make it necessary for us to change the rules.

How do I contact the organizers?

Please feel free to contact us at [tdc2023-organizers@googlegroups.com](mailto:tdc2023-organizers@googlegroups.com).

Who can participate in the competition?

The competition is open to the public. Anyone can participate.

When is the deadline to register?

You can register for any track at any time during the competition.

How many people can I have in my team?

Teams can have any number of members. Solo teams are allowed.

Where can I download data and submit results?

See the [Getting Started](start.html) page.

How many submissions can each team enter per competition track?

In each track, teams are restricted to 5 submissions per day in the validation phase. In the test phase, teams are restricted to 5 submissions total. Only one account per team can be used to submit results. Creating multiple accounts to circumvent the submission limits will result in disqualification.
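
The limits above can be sketched as a simple check. This is only an illustration, not the organizers' implementation: the function name `can_submit`, the phase labels, and the reading of "per day" as a calendar day of the supplied timestamps are all assumptions.

```python
from datetime import datetime
from typing import List

VALIDATION_DAILY_LIMIT = 5  # from the FAQ: 5 submissions per day, per track
TEST_TOTAL_LIMIT = 5        # from the FAQ: 5 submissions total, per track


def can_submit(phase: str, previous: List[datetime], now: datetime) -> bool:
    """Return True if a team may submit again in one track.

    `previous` holds the team's earlier submission times for this track and
    phase. Hypothetical helper; "per day" is assumed to mean the same
    calendar date as `now`.
    """
    if phase == "validation":
        today = [t for t in previous if t.date() == now.date()]
        return len(today) < VALIDATION_DAILY_LIMIT
    if phase == "test":
        return len(previous) < TEST_TOTAL_LIMIT
    raise ValueError(f"unknown phase: {phase!r}")
```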

Are participants required to share the details of their method?

We encourage all participants to share their methods and code, either with the organizers or publicly. To be eligible for prizes, winning teams are required to share their methods, code, and models with the organizers.

What are the details for the Trojan Detection Track?

[Here](tracks.html#trojan-detection).

What are the details for the Red Teaming Track?

[Here](tracks.html#red-teaming).

Why are you using the baselines you have chosen?

Our baselines (PEZ, GBDA, UAT, Zero-Shot) are well-known text optimization and red teaming methods from the academic literature that can be applied to our trojan detection and red teaming tasks.

Why are you using the LLMs you have chosen?

For the Trojan Detection Track, we use models from the Pythia suite of LLMs, which are open-source. This enables broader participation compared to models that are not fully open-source. We also use different-sized models in the Base Model and Large Model subtracks, ranging from ~1B to ~10B parameters. This allows groups with a range of compute resources to participate. For the Red Teaming Track, we use Llama-2-chat models. These models are also open-source, and in testing we found them to be very robust to the baseline red teaming methods.

Why are you using the particular trojan attack you have chosen?

We use the simplest possible trojan attack on LLMs, where using the trigger as a prompt on its own causes the LLM to generate the target string. Existing trojan attacks for text models often consider triggers that modify clean inputs in various ways. We chose this simpler setting due to its strong resemblance to the red teaming task we consider, as part of the goal of this competition is to foster connections between the trojan detection and red teaming communities.
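
As a rough illustration of this setting, the sketch below treats a model as any function from a prompt to generated text and checks whether a guessed trigger, used as the entire prompt, produces the target string. The toy lookup-table model, the function names, and the example strings are hypothetical stand-ins, and the score shown is not the competition's official metric.

```python
from typing import Callable, Dict, List, Tuple

Generate = Callable[[str], str]


def make_toy_trojaned_model(trojans: Dict[str, str]) -> Generate:
    """Return a stand-in 'model' mapping a prompt to generated text.

    Hypothetical toy only: a real trojaned LLM is fine-tuned so that each
    trigger, used as the whole prompt, yields its target string.
    """
    def generate(prompt: str) -> str:
        # Exact match: the trigger is the full prompt, not an edit to a
        # clean input.
        return trojans.get(prompt, "benign continuation")
    return generate


def trigger_elicits_target(generate: Generate, trigger: str, target: str) -> bool:
    """True if prompting with `trigger` alone makes the model emit `target`."""
    return generate(trigger).startswith(target)


def fraction_successful(generate: Generate, guesses: List[Tuple[str, str]]) -> float:
    """Fraction of (trigger, target) guesses that actually elicit the target."""
    if not guesses:
        return 0.0
    hits = sum(trigger_elicits_target(generate, t, y) for t, y in guesses)
    return hits / len(guesses)


model = make_toy_trojaned_model({"xq zeta 41": "EXAMPLE TARGET"})
print(trigger_elicits_target(model, "xq zeta 41", "EXAMPLE TARGET"))
```

With a real LLM, `generate` would wrap the model's decoding call. The red teaming task has the same shape: search for a prompt that elicits a specified behavior.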

Is it "trojans" or "Trojans"?

Both are used in the academic literature. In the 2022 competition, we used "Trojans". However, this can make sentences a bit messy if one is using the word often, so we are using "trojans" for this competition.
