Runtime control of LLM agent behaviors through activation steering vectors. More calibrated than prompting.
Updated Dec 19, 2025 - Python
Behavioral auditing & repair toolkit for LLMs. Measures 8 dimensions via confidence probes.
Iterative Sparse Matrix Steering: Closed-Form Subspace Alignment for Multi-Layer LLM Control (No SGD required).
Official implementation of "Beyond Multiple Choice: Evaluating Steering Vectors for Summarization" (Findings of EACL 2026).
Repository for the paper "Understanding (Un)Reliability of Steering Vectors in Language Models" by Joschka Braun, Carsten Eickhoff, David Krueger, Seyed Ali Bahrainian, and Dmitrii Krasheninnikov.
Qwen3-0.6B activation steering: style vectors, lens contamination eval, CPRR methodology
Investigating honesty, deception and steering in large language models. Replicating and extending the MASK honesty benchmark on frontier models, working toward internal representation analysis and activation steering for honesty.