LLM Safety Paper List

Survey

  1. Evaluating Large Language Models: A Comprehensive Survey [Paper]

Hallucination

  1. Chain-of-Verification Reduces Hallucination in Large Language Models [Paper]
  2. DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models [Paper]
  3. LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples [Paper]

Privacy Protection

  1. Large Language Models Can Be Good Privacy Protection Learners [Paper]
  2. ProPILE: Probing Privacy Leakage in Large Language Models [Paper]

Jailbreak

  1. Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study [Paper]
  2. Jailbroken: How Does LLM Safety Training Fail? [Paper]
  3. MASTERKEY: Automated Jailbreaking of Large Language Model Chatbots [Paper]
  4. Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization [Paper]
  5. Defending ChatGPT against Jailbreak Attack via Self-Reminder [Paper]
  6. Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM [Paper]
  7. GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts [Paper] [Code]
  8. Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models [Paper]
  9. Prompts Should not be Seen as Secrets: Systematically Measuring Prompt Extraction Attack Success [Paper]
  10. Multi-step Jailbreaking Privacy Attacks on ChatGPT [Paper]
  11. A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily [Paper]
  12. DeepInception: Hypnotize Large Language Model to Be Jailbreaker [Paper]
  13. Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation [Paper]
  14. Multilingual Jailbreak Challenges in Large Language Models [Paper]
  15. Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations [Paper]
  16. AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models [Paper]
  17. Open Sesame! Universal Black Box Jailbreaking of Large Language Models [Paper]
  18. SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks [Paper]
  19. Tricking LLMs into Disobedience: Understanding, Analyzing, and Preventing Jailbreaks [Paper]
  20. Universal and Transferable Adversarial Attacks on Aligned Language Models [Paper] [Code]

Datasets

  1. Jailbreak Chat
