mech-interp

Implementation and analysis of Sparse Autoencoders for neural network interpretability research. Features interactive visualization dashboard and W&B integration.

sparse-autoencoders interpretability activation-functions neuron-activity wandb transformerlens mech-interp

Updated Nov 21, 2025
Python

coderinblack08 / prompt-helmet

ai cybersecurity mech-interp

Updated Apr 6, 2025
Jupyter Notebook

humanjesse / Granite4-Mamba2-Mech-Interp-Suite

Mech-interp suite for Granite4 models that use the Mamba-2 architecture

ibm mech-interp mamba-2 granite4

Updated Feb 21, 2026
Python

ashioyajotham / ai_research

A collection of notes and learnings from my journey in Artificial Intelligence

reinforcement-learning alignment ai-safety reasoning red-teaming evals mech-interp

Updated Mar 20, 2026
Jupyter Notebook

maty-bohacek / competency-gaps

Official implementation of the 'Uncovering Competency Gaps in Large Language Models and Their Benchmarks' paper

benchmarks llms evals mech-interp

Updated Dec 6, 2025

keidolabs / affect-reception

All code, stimuli, and results for a mechanistic interpretability study investigating how large language models internally represent emotional content

ai-psychology mech-interp

Updated Mar 17, 2026
Python

ymgw55 / repro-superposition

Unofficial implementation reproducing the experiments from the "Superposition as a Phase Change" section of "Toy Models of Superposition".

python neural-network reproducible-research circuit interpretability llm anthropic mech-interp

Updated Aug 17, 2025
Jupyter Notebook

mech-interp

Here are 11 public repositories matching this topic...

codelion / pts

kureha-yamaguchi / reasoning-manipulation

ashioyajotham / cot-faithfulness-mech-interp

1289nav / Exploring-chain-of-thought-reasoning-in-LLMs

ashioyajotham / exploring_saes

coderinblack08 / prompt-helmet

humanjesse / Granite4-Mamba2-Mech-Interp-Suite

ashioyajotham / ai_research

maty-bohacek / competency-gaps

keidolabs / affect-reception

ymgw55 / repro-superposition
