Pivotal Token Search
Updated Dec 20, 2025 (Python)
Adversarial Manipulation of CoT
Mechanistic analysis of Chain-of-Thought faithfulness using GPT-2 Small
Analysed determinism, faithfulness, reasoning patterns, and steering. Developed and tested methods to enhance control and fail-safes.
Implementation and analysis of Sparse Autoencoders for neural network interpretability research. Features interactive visualization dashboard and W&B integration.
Mech-interp suite for Granite4 models that use the Mamba-2 architecture
Collected notes and lessons from my journey in Artificial Intelligence
Official implementation of the 'Uncovering Competency Gaps in Large Language Models and Their Benchmarks' paper
All code, stimuli, and results for a mechanistic interpretability study investigating how large language models internally represent emotional content
Unofficial implementation reproducing the experiments from "Superposition as a Phase Change" in "Toy Models of Superposition".