# interpretative-interfaces

A Python backend for exploring GPT-2's internal representations through mechanistic interpretability. Built with TransformerLens and Flask, this project provides API endpoints that let a visual frontend inspect how a language model processes text, token by token, layer by layer.
## Features

This backend powers an interactive visualization tool that lets users:
- Tokenize text and see how GPT-2 breaks it into subword tokens
- Trace token embeddings through all 12 layers of GPT-2, reduced to 2D via PCA
- Inspect attention patterns — which tokens attend to which, at any layer and head
- Apply the logit lens — see what the model would predict at each intermediate layer, revealing how meaning builds up through the network
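To make the attention-pattern feature concrete: per head, an attention pattern is a softmax over scaled query-key dot products, with a causal mask so each token only attends to earlier positions (as in GPT-2). A minimal NumPy sketch with random stand-in data (the 64-dimensional head size matches GPT-2 small; everything else here is toy data, not the backend's actual code):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_head = 6, 64                     # GPT-2 small uses 64-dim heads
Q = rng.standard_normal((seq_len, d_head))  # stand-in queries
K = rng.standard_normal((seq_len, d_head))  # stand-in keys

scores = Q @ K.T / np.sqrt(d_head)          # (seq_len, seq_len) scaled dot products
# Causal mask: position i may only attend to positions <= i.
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores[mask] = -np.inf
pattern = softmax(scores, axis=-1)          # each row sums to 1
```

Each row of `pattern` is one token's attention distribution over the tokens before it, which is exactly the kind of matrix the visualization renders.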
## Project Structure

```
interpretative-interfaces/
├── app.py                # Flask server with all API endpoints
├── model_utils.py        # Core functions: tokenize, trace, attention, predict
├── requirements.txt      # Pinned Python dependencies
├── notebooks/
│   ├── tutorial-walkthrough.ipynb      # Annotated TransformerLens tutorial
│   ├── tokenization.ipynb              # Tokenization experiments
│   ├── embedding_extraction.ipynb      # Layer-by-layer embedding extraction
│   └── dimensionality_reduction.ipynb  # PCA/UMAP reduction + trajectory plots
├── examples/             # Saved JSON responses for frontend mock data
│   ├── example1_tokenize.json
│   ├── example1_trace.json
│   ├── example1_attention.json
│   ├── example1_predict.json
│   └── ...
└── API.md                # Full endpoint documentation
```
## Requirements

- Python 3.10+
- ~2 GB disk space for the GPT-2 model (downloaded on first run)
## Setup

```bash
git clone https://github.com/YOUR_USERNAME/interpretative-interfaces.git
cd interpretative-interfaces
python -m venv venv
source venv/bin/activate   # on Windows: venv\Scripts\activate
pip install -r requirements.txt
python app.py
```

The server starts on http://localhost:5001. The first run will download GPT-2 small (~500 MB).
## API Endpoints

### `POST /tokenize`

Break text into GPT-2 tokens.

```bash
curl -X POST http://localhost:5001/tokenize \
  -H "Content-Type: application/json" \
  -d '{"text": "The cat sat on the mat"}'
```

### `POST /trace`

Get the 2D trajectory of selected tokens through all 12 layers (PCA-reduced).
```bash
curl -X POST http://localhost:5001/trace \
  -H "Content-Type: application/json" \
  -d '{"text": "The cat sat on the mat", "token_indices": [1, 4]}'
```

### `POST /attention`

Get the attention matrix for a specific layer and head.
```bash
curl -X POST http://localhost:5001/attention \
  -H "Content-Type: application/json" \
  -d '{"text": "The cat sat on the mat", "layer": 5, "head": 3}'
```

### `POST /predict`

Apply the logit lens: see the top-5 predicted tokens at each layer for a given position.
```bash
curl -X POST http://localhost:5001/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "The cat sat on the mat", "token_index": 5}'
```

See `API.md` for full request/response schemas.
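The logit-lens idea behind `/predict` can be sketched in a few lines: take the residual-stream vector at an intermediate layer, apply the model's final layer norm, multiply by the unembedding matrix, and read off the top-k tokens. The toy NumPy version below uses made-up sizes and random weights purely for shape illustration; the real backend would use GPT-2's actual `ln_final` parameters and `W_U` via TransformerLens:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Simplified stand-in for GPT-2's ln_final (learned scale/bias omitted).
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def logit_lens_topk(resid, W_U, k=5):
    """Project an intermediate residual-stream vector onto the vocabulary."""
    logits = layer_norm(resid) @ W_U       # (d_model,) @ (d_model, vocab) -> (vocab,)
    top = np.argsort(logits)[::-1][:k]     # indices of the k largest logits
    return top, logits[top]

rng = np.random.default_rng(0)
d_model, vocab = 16, 10                    # toy sizes; GPT-2 small is 768 x 50257
W_U = rng.standard_normal((d_model, vocab))
resid = rng.standard_normal(d_model)       # stand-in for the residual stream at some layer
tokens, scores = logit_lens_topk(resid, W_U)
```

Repeating this at every layer for one position yields the layer-by-layer prediction sequence the endpoint returns.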
## Tech Stack

- TransformerLens — hooks into GPT-2 internals (activations, attention patterns, residual stream)
- Flask — lightweight API server
- scikit-learn — PCA for dimensionality reduction
- NumPy — tensor/array manipulation
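The scikit-learn PCA step used by `/trace` might look roughly like this: collect one `d_model`-dimensional vector per layer for a token (13 points for GPT-2 small: the embedding plus 12 block outputs) and reduce them to 2D for plotting. A sketch with random stand-in activations rather than real model output:

```python
import numpy as np
from sklearn.decomposition import PCA

n_layers, d_model = 13, 768   # embedding + 12 blocks; 768 is GPT-2 small's width
rng = np.random.default_rng(0)
trajectory = rng.standard_normal((n_layers, d_model))  # stand-in per-layer activations

coords = PCA(n_components=2).fit_transform(trajectory)  # (13, 2) points to plot
```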
## Further Reading

- How-to Transformer Mechanistic Interpretability in 50 Lines
- Interpreting GPT: The Logit Lens
- TransformerLens Getting Started
## License

MIT