AI Tech Suite

Trojan Detection Challenge 2023 (LLM Edition)


About

The Trojan Detection Challenge 2023 (LLM Edition) is a NeurIPS 2023 competition focused on advancing methods for detecting hidden functionality in large language models (LLMs). It features two tracks: Trojan Detection (identifying triggers for hidden behaviors) and Red Teaming (developing automated methods to elicit undesirable behaviors). The challenge aims to improve LLM safety by uncovering jailbreaks and hidden functionalities. Participants can contribute to a safer AI landscape by designing robust trojan detectors and automated red teaming methods. Prizes and publication opportunities are available for winning teams.

Features

• large language models (LLMs)

• open competition format encouraging method sharing

• neural trojan attacks

• jailbreak detection

• hidden functionality detection

• automated red teaming methods

• red teaming track

• trojan detection track

FAQs

What are the current rules?

[Here](index.html#rules).

Can the organizers change the rules?

Yes. We require participants to consent to a change of rules if there is an urgent need. This is a new area and unanticipated developments may make it necessary for us to change the rules.

How do I contact the organizers?

Please feel free to contact us at [tdc2023-organizers@googlegroups.com](mailto:tdc2023-organizers@googlegroups.com).

Who can participate in the competition?

The competition is open to the public. Anyone can participate.

When is the deadline to register?

You can register for any track at any time during the competition.

How many people can I have in my team?

Teams can have any number of members. Solo teams are allowed.

Where can I download data and submit results?

See the [Getting Started](start.html) page.

How many submissions can each team enter per competition track?

In each track, teams are restricted to 5 submissions per day in the validation phase. In the test phase, teams are restricted to 5 submissions total. Only one account per team can be used to submit results. Creating multiple accounts to circumvent the submission limits will result in disqualification.
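
The submission limits above can be sketched as a small check. This is only an illustration of the stated rule, not the organizers' implementation; the names `may_submit`, `VALIDATION_DAILY_LIMIT`, and `TEST_TOTAL_LIMIT` are hypothetical, and the sketch assumes each team's earlier submissions in a given track and phase are recorded as dates.

```python
from datetime import date
from typing import List

# Limits taken from the FAQ answer above; the constant names are hypothetical.
VALIDATION_DAILY_LIMIT = 5  # per team, per track, per day
TEST_TOTAL_LIMIT = 5        # per team, per track, for the whole test phase


def may_submit(phase: str, previous: List[date], today: date) -> bool:
    """Return True if one more submission is allowed under the stated limits.

    `previous` holds the dates of the team's earlier submissions
    in the same track and the same phase.
    """
    if phase == "validation":
        # Only submissions made on the current day count toward the limit.
        used_today = sum(1 for d in previous if d == today)
        return used_today < VALIDATION_DAILY_LIMIT
    if phase == "test":
        # Every submission in the test phase counts, regardless of day.
        return len(previous) < TEST_TOTAL_LIMIT
    raise ValueError(f"unknown phase: {phase!r}")
```

Under this reading, the validation budget resets each day while the test budget does not.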

Are participants required to share the details of their method?

We encourage all participants to share their methods and code, either with the organizers or publicly. To be eligible for prizes, winning teams are required to share their methods, code, and models with the organizers.

What are the details for the Trojan Detection Track?

[Here](tracks.html#trojan-detection).

What are the details for the Red Teaming Track?

[Here](tracks.html#red-teaming).

Why are you using the baselines you have chosen?

Our baselines (PEZ, GBDA, UAT, Zero-Shot) are well-known text optimization and red teaming methods from the academic literature, which can be applied to our trojan detection and red teaming tasks.

Why are you using the LLMs you have chosen?

For the Trojan Detection Track, we use models from the Pythia suite of LLMs, which are open-source. This enables broader participation compared to models that are not fully open-source. We also use different-sized models in the Base Model and Large Model subtracks, ranging from ~1B to ~10B parameters. This allows groups with a range of compute resources to participate. For the Red Teaming Track, we use Llama-2-chat models. These models are also open-source, and in testing we found them to be very robust to the baseline red teaming methods.

Why are you using the particular trojan attack you have chosen?

We use the simplest possible trojan attack on LLMs, where using the trigger as a prompt on its own causes the LLM to generate the target string. Existing trojan attacks for text models often consider triggers that modify clean inputs in various ways. We chose this simpler setting due to its strong resemblance to the red teaming task we consider, as part of the goal of this competition is to foster connections between the trojan detection and red teaming communities.
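
As a rough illustration of this setting, the sketch below checks whether a candidate trigger, used as the entire prompt, makes a model produce a target string. Everything in it is an assumption made for illustration: `toy_generate` is a lookup-table stand-in for a trojaned LLM, the trigger and target strings are invented, and `success_rate` is a simple exact-match fraction, not the competition's official metric.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a trojaned model: trigger -> inserted target string.
TOY_TROJANS: Dict[str, str] = {
    "zx quartz lantern": "echo 'hello world'",
}


def toy_generate(prompt: str) -> str:
    """Return the inserted target if the prompt is exactly a trigger, else benign text."""
    return TOY_TROJANS.get(prompt, "I am a helpful assistant.")


def trigger_elicits_target(
    generate: Callable[[str], str], trigger: str, target: str
) -> bool:
    """True if prompting with the trigger on its own yields the target string."""
    return generate(trigger).strip() == target.strip()


def success_rate(
    generate: Callable[[str], str], pairs: List[Tuple[str, str]]
) -> float:
    """Fraction of (trigger, target) guesses that reproduce their target."""
    if not pairs:
        return 0.0
    hits = sum(trigger_elicits_target(generate, t, tgt) for t, tgt in pairs)
    return hits / len(pairs)
```

With a real model, `generate` would wrap the model's text generation call. The check has the same shape as a red teaming test, where a prompt is scored by whether it elicits a given behavior, which is the resemblance the answer above describes.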

Is it "trojans" or "Trojans"?

Both are used in the academic literature. In the 2022 competition, we used "Trojans". However, this can make sentences a bit messy if one is using the word often, so we are using "trojans" for this competition.
