Performed head-level interpretability analysis on Transformer models using attention-head masking experiments. Evaluated each head's contribution via accuracy and logit-based metrics, measured against a 91% unmasked baseline accuracy.
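A minimal sketch of the masking procedure described above, not the repository's actual code: the model name, example sentences, and metric definitions are illustrative assumptions. It zeroes one attention head at a time through HuggingFace's `head_mask` argument and reports the change in accuracy and mean target-class logit relative to the unmasked baseline.

```python
# Sketch: per-head ablation for a fine-tuned BERT classifier.
# Assumptions: any HF sequence-classification checkpoint works here;
# "textattack/bert-base-uncased-SST-2" and the toy sentences are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "textattack/bert-base-uncased-SST-2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME).eval()

texts = ["a gripping, beautifully made film", "a dull, lifeless mess"]
labels = torch.tensor([1, 0])  # toy eval set; the repo presumably uses a full split
batch = tokenizer(texts, return_tensors="pt", padding=True)

n_layers = model.config.num_hidden_layers
n_heads = model.config.num_attention_heads

@torch.no_grad()
def evaluate(head_mask):
    """Run the classifier with the given (n_layers, n_heads) head mask."""
    logits = model(**batch, head_mask=head_mask).logits
    acc = (logits.argmax(-1) == labels).float().mean().item()
    target_logit = logits[torch.arange(len(labels)), labels].mean().item()
    return acc, target_logit

baseline_acc, baseline_logit = evaluate(torch.ones(n_layers, n_heads))

# Ablate each head in turn and record the drop in both metrics.
for layer in range(n_layers):
    for head in range(n_heads):
        mask = torch.ones(n_layers, n_heads)
        mask[layer, head] = 0.0  # zero out this head's output
        acc, logit = evaluate(mask)
        print(f"L{layer}H{head}: Δacc={acc - baseline_acc:+.3f} "
              f"Δlogit={logit - baseline_logit:+.3f}")
```

Heads with large negative deltas are ones the classifier depends on; near-zero deltas suggest redundant or specialized-but-unused heads.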
natural-language-processing deep-learning multi-head-attention model-interpretability performance-evaluation-metrics transformer-architecture self-attention-mechanism attention-head-specialization masking-experiments model-robustness-analysis
Updated Feb 21, 2026 - Jupyter Notebook