Dataset 1: RealToxicityPrompts [Dataset Link] | [Paper]
Notes:
- The entire codebase for `ROME` can be found in the directory `rome-main`. Cloned from Locating and Editing Factual Associations in GPT (NeurIPS '22).
- `rome-main/trace_main.py` is the main script to run a vanilla example of causal tracing on one of the datasets from the original paper.
- The directory `rome-main/dsets` contains the datasets they use. This is where we need to add the `RealToxicityPrompts` dataset and load it from for inference. It is present in the file `rome-main/dsets/realtoxicityprompts.py`.
For William:
- Refer to the `RealToxicityPrompts` paper to determine which model they used, and pick one of the GPT-2 variants to run `ROME` with the prompts from the dataset.
- Choose the "challenging" subset of prompts, i.e., where `dataset["challenging"] == true`.
- For our use case, the causal tracing part alone is enough for now; we don't need to worry about the editing part yet.
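The subset selection above can be sketched as a small filter — the record field names (`challenging`, `prompt.text`) are assumptions based on the public dataset release, and in practice the records would come from the loader in `rome-main/dsets` rather than an in-memory list:

```python
def challenging_prompts(records):
    """Return prompt texts for records where challenging == true.

    `records` is any iterable of dict-like rows; field names here are
    assumptions based on the RealToxicityPrompts release format.
    """
    return [r["prompt"]["text"] for r in records if r.get("challenging")]


# Hypothetical downstream use (names are placeholders, not rome-main APIs):
# for text in challenging_prompts(dataset):
#     run_causal_trace(gpt2_model, text)  # tracing only; no editing step
```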