Core implementation of the paper:
Topic-FlipRAG: Topic-Orientated Adversarial Opinion Manipulation Attacks to Retrieval-Augmented Generation Models
This repository contains the full implementation of Topic-FlipRAG, a novel black-box adversarial attack framework against Retrieval-Augmented Generation (RAG) systems. By leveraging general language knowledge and reverse-gradient signals, it optimizes a small number of poisoned documents to effectively flip the opinion stance of the RAG system across an entire set of topic-related queries.
- Stage1_knowledge_guided_attack.ipynb
  Includes the core implementation of the knowledge-guided attack, which leverages LLM-inferred general knowledge to perform multi-granularity document modifications (doc_know generation).
- Stage2_adversarial_trigger_generation.ipynb
  Optimizes minimal triggers to attach to doc_know for the final poisoned documents. Includes formatting scripts for downstream poisoning tasks.
- RAG_pipeline.ipynb
  Builds a full RAG system (retriever + database + LLM) and evaluates poisoning effects. Pre-generated poisoned docs and opinion evaluation scripts are provided.
- Data
  - PROCON_data.json: The opinion dataset used in the paper.
  - Example poisoned documents: data/example_adversarial_docs/Topic-FlipRAG_society_CON_passges/ (used in RAG_pipeline.ipynb).
  - Example doc_know file: data/example_adversarial_docs/know_attack_data_3_0.json (used in Stage2_adversarial_trigger_generation.ipynb to demonstrate the trigger generation process).
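As a rough illustration of what "retriever + database + LLM, then evaluate poisoning effects" can mean at the retrieval step, the sketch below ranks a tiny in-memory corpus against a query and reports what fraction of the top-k results are flagged as poisoned. It is a toy, not the code in RAG_pipeline.ipynb: the bag-of-words cosine retriever, the `poisoned` flag, and the function names are all assumptions made for this example, and the notebook's actual retriever and metrics may differ.

```python
import math
from collections import Counter


def bow(text):
    """Lower-cased bag-of-words vector (toy stand-in for a real retriever encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=3):
    """Return the k documents most similar to the query."""
    q = bow(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, bow(d["text"])), reverse=True)
    return ranked[:k]


def poisoned_rate(retrieved):
    """Fraction of retrieved documents carrying the (hypothetical) poisoned flag."""
    if not retrieved:
        return 0.0
    return sum(d["poisoned"] for d in retrieved) / len(retrieved)


if __name__ == "__main__":
    # Hypothetical corpus; real documents would come from the data/ directory.
    corpus = [
        {"id": "a", "text": "school uniforms reduce bullying", "poisoned": False},
        {"id": "b", "text": "weather report for tuesday", "poisoned": False},
        {"id": "c", "text": "school uniforms school uniforms limit expression", "poisoned": True},
    ]
    top = retrieve("should school uniforms be required", corpus, k=2)
    print([d["id"] for d in top], poisoned_rate(top))
```

A per-query rate like this is only one possible signal; the opinion evaluation scripts in the repository assess the stance of the generated answer, which this sketch does not attempt.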
This project is Colab-friendly. You only need to replace the paths in the Jupyter notebooks so that they point to the corresponding files in the data/ directory. OpenAI API access is required for Stage1_knowledge_guided_attack.ipynb and RAG_pipeline.ipynb.
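One way to keep the path replacement in a single place is sketched below. It builds the file locations named in this README under one data root, reports which ones are missing, and checks for an API key before any notebook cell calls OpenAI. The helper names are hypothetical, placing PROCON_data.json directly under the data root is an assumption, and reading the key from the OPENAI_API_KEY environment variable is the OpenAI Python SDK default rather than something the notebooks are known to rely on.

```python
import os
from pathlib import Path


def resolve_paths(data_root):
    """Build the file locations named in this README under one data root."""
    root = Path(data_root)
    examples = root / "example_adversarial_docs"
    return {
        "procon": root / "PROCON_data.json",  # assumed to sit directly under data/
        "path_know": examples / "know_attack_data_3_0.json",
        "poisoned_dir": examples / "Topic-FlipRAG_society_CON_passges",
    }


def missing_paths(paths):
    """Return the names of entries that do not exist on disk."""
    return [name for name, path in paths.items() if not path.exists()]


def require_api_key(env=None):
    """Return the OpenAI key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError(
            "Set OPENAI_API_KEY before running Stage 1 or the RAG pipeline."
        )
    return key


if __name__ == "__main__":
    # "data" is an assumed location; on Colab this might be a Drive mount instead.
    paths = resolve_paths("data")
    print("Missing:", missing_paths(paths))
```

Running the check at the top of a notebook surfaces a wrong path before a long GPU job starts.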
- Stage 2 – Adversarial Trigger Generation
  ➤ Optimizes adversarial triggers based on Stage 1 outputs.
  ⮕ To skip Stage 1 entirely, set path_know = 'data/example_adversarial_docs/know_attack_data_3_0.json' directly. (This is a pre-generated example for fast evaluation.)
  💡 Recommended GPU: T4
- RAG Pipeline – Execution & Evaluation
  ➤ Runs the full RAG system and evaluates the impact of the poisoned documents.
  ⮕ To skip Stage 2, replace result_path in load_data() with a sample file from data/example_adversarial_docs/Topic-FlipRAG_society_CON_passges/. (These are pre-generated adversarial examples in the society domain targeting the CON stance.)
  💡 Recommended GPU: A100
  🔁 Use Google Drive for hosting large poisoned files if needed.
To facilitate quick testing, we provide a subset of poisoned documents in data/example_adversarial_docs/Topic-FlipRAG_society_CON_passges/, specifically targeting the "Society & Culture" domain with a CON (oppose) stance. For full-scale evaluation, you can modify the code to load the entire dataset from PROCON_data.json.
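Before pointing path_know or result_path at the pre-generated files, it can help to confirm that they load and to see their top-level shape. The sketch below does only that: it makes no assumption about the JSON schema, the helper names are hypothetical, and it is not the notebooks' load_data().

```python
import json
from pathlib import Path


def load_json(path):
    """Read a UTF-8 JSON file and return the parsed object."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def describe(obj):
    """Short structural summary that does not assume any schema."""
    if isinstance(obj, dict):
        return f"dict with {len(obj)} keys"
    if isinstance(obj, list):
        return f"list with {len(obj)} items"
    return type(obj).__name__


def list_sample_files(directory):
    """Sorted regular files in a directory, e.g. candidates for result_path."""
    return sorted(p for p in Path(directory).iterdir() if p.is_file())


if __name__ == "__main__":
    # Paths as given in this README, relative to the repository root.
    path_know = "data/example_adversarial_docs/know_attack_data_3_0.json"
    poisoned_dir = "data/example_adversarial_docs/Topic-FlipRAG_society_CON_passges"

    print("doc_know file:", describe(load_json(path_know)))
    for sample in list_sample_files(poisoned_dir):
        print("candidate result_path:", sample)
```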
If you find this work useful, please cite:
@article{gong2025topic,
title={Topic-FlipRAG: Topic-Orientated Adversarial Opinion Manipulation Attacks to Retrieval-Augmented Generation Models},
author={Gong, Yuyang and Chen, Zhuo and Chen, Miaokun and Yu, Fengchang and Lu, Wei and Wang, Xiaofeng and Liu, Xiaozhong and Liu, Jiawei},
journal={arXiv preprint arXiv:2502.01386},
year={2025}
}