- Evaluating Large Language Models: A Comprehensive Survey [Paper]
- Chain-of-Verification Reduces Hallucination in Large Language Models [Paper]
- DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models [Paper]
- LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples [Paper]
- Large Language Models Can Be Good Privacy Protection Learners [Paper]
- ProPILE: Probing Privacy Leakage in Large Language Models [Paper]
- Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study [Paper]
- Jailbroken: How Does LLM Safety Training Fail? [Paper]
- MASTERKEY: Automated Jailbreaking of Large Language Model Chatbots [Paper]
- Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization [Paper]
- Defending ChatGPT against Jailbreak Attack via Self-Reminder [Paper] (a minimal sketch of the self-reminder idea appears after this list)
- Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM [Paper]
- GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts [Paper] [Code]
- Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models [Paper]
- Prompts Should not be Seen as Secrets: Systematically Measuring Prompt Extraction Attack Success [Paper]
- Multi-step Jailbreaking Privacy Attacks on ChatGPT [Paper]
- A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily [Paper]
- DeepInception: Hypnotize Large Language Model to Be Jailbreaker [Paper]
- Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation [Paper]
- Multilingual Jailbreak Challenges in Large Language Models [Paper]
- Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations [Paper]
- AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models [Paper]
- Open Sesame! Universal Black Box Jailbreaking of Large Language Models [Paper]
- SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks [Paper] (see the SmoothLLM sketch after this list)
- Tricking LLMs into Disobedience: Understanding, Analyzing, and Preventing Jailbreaks [Paper]
- Universal and Transferable Adversarial Attacks on Aligned Language Models [Paper] [Code]
- Jailbreak Chat
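
Two of the defenses above are simple enough to sketch. The self-reminder defense wraps the untrusted user prompt between safety reminders before it reaches the model. Below is a minimal Python sketch of that idea; `query_model`, the function names, and the exact reminder wording are illustrative assumptions, not the paper's verbatim prompts.

```python
# Minimal sketch of the self-reminder defense: sandwich the untrusted
# user prompt between safety reminders before querying the model.
# `query_model` is a hypothetical stand-in for whatever chat API you use.

REMINDER_PREFIX = (
    "You should be a responsible AI assistant and should not generate "
    "harmful or misleading content. Please answer the following query "
    "in a responsible way.\n"
)
REMINDER_SUFFIX = (
    "\nRemember, you should be a responsible AI assistant and should "
    "not generate harmful or misleading content."
)

def wrap_with_self_reminder(user_prompt: str) -> str:
    """Return the user prompt sandwiched between the two reminders."""
    return f"{REMINDER_PREFIX}{user_prompt}{REMINDER_SUFFIX}"

def defended_query(user_prompt: str, query_model) -> str:
    """Send the reminder-wrapped prompt to an arbitrary chat model."""
    return query_model(wrap_with_self_reminder(user_prompt))
```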
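
SmoothLLM exploits the brittleness of adversarial suffixes: it queries the model on several randomly perturbed copies of the prompt and returns a response consistent with the majority jailbroken/not-jailbroken vote. The sketch below, again with a hypothetical `query_model` callable, shows the character-swap perturbation variant (the paper also considers insert and patch perturbations); the refusal-keyword judge and parameter values are crude illustrative assumptions.

```python
import random
import string

def perturb(prompt: str, q: float = 0.1) -> str:
    """Randomly replace a fraction q of characters with printable ones
    (the swap scheme; one of several perturbations in the paper)."""
    chars = list(prompt)
    n_swaps = max(1, int(q * len(chars)))
    for i in random.sample(range(len(chars)), n_swaps):
        chars[i] = random.choice(string.printable)
    return "".join(chars)

REFUSALS = ("I'm sorry", "I cannot", "I can't", "As an AI")

def is_jailbroken(response: str) -> bool:
    """Crude keyword check: treat any non-refusal as jailbroken.
    Real evaluations use more careful judges."""
    return not any(r in response for r in REFUSALS)

def smoothllm(prompt: str, query_model, n_copies: int = 6, q: float = 0.1) -> str:
    """Query perturbed copies, take a majority vote on the jailbroken
    label, and return one response agreeing with the majority."""
    responses = [query_model(perturb(prompt, q)) for _ in range(n_copies)]
    votes = [is_jailbroken(r) for r in responses]
    majority = sum(votes) > len(votes) / 2
    consistent = [r for r, v in zip(responses, votes) if v == majority]
    return random.choice(consistent)
```

The majority vote is what gives the defense its guarantee in the paper: a suffix optimized for the exact prompt rarely survives independent random perturbations across most copies.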