Activault is an activation data engine that dramatically reduces costs for training interpreter models on frontier LLMs:
- Collects and stores model activations (the model's "mental state") efficiently using S3 object storage, reducing activation management costs by 4-8x.
- Enables reproducible and shareable interpretability research through standardized object storage.
- Maintains peak efficiency and throughput while handling petabyte-scale activation datasets.
You can read about Activault in our blog post.
⚠️ CRITICAL WARNING
Streaming/storing activations with Activault can be expensive ($$$) and slow if care is not taken before launching large-scale jobs. We recommend setting up your compute environment in the same region/data center as your S3 solution to minimize latency and avoid egress fees. We also strongly recommend consulting the pricing page for your S3 solution so you understand the costs associated with your jobs.
When designing Activault, we considered the tradeoffs between computing activations on-the-fly vs storing them on disk vs storing them in S3. Here's how the approaches compare:
| Aspect | On-the-fly | Local Cache | Naive S3 | Activault |
|---|---|---|---|---|
| Setup Complexity | ✅ Easy | ✅ Easy | ❌ Hard | ✅ Easy |
| Write Performance | ✅ Fast | ✅ Fast | ❌ Slow | ✅ Fast |
| Read Performance | ✅ Fast | ✅ Fast | ❌ Slow | ✅ Fast |
| Efficiency | ❌ Enormously inefficient as activations must be regenerated across runs | ✅ Efficient | ✅ Efficient | ✅ Efficient |
| Reproducibility | ❌ Poor | ✅ Good | ✅ Guaranteed | ✅ Guaranteed |
| Token Context | ❌ Autointerp requires recomputing | ✅ Good | ❌ Poor (no tokens saved) | ✅ Tokens saved with data |
| Shareability | ❌ Vanishes after training | ❌ Terrible | ✅ Guaranteed | ✅ Guaranteed |
| Storage Cost | ✅ None | ❌ Very expensive | ✅ Cheap | ✅ Cheap |
| Storage Availability | ❌ N/A | ❌ Very low | ✅ High | ✅ High |
- Setup - Installation and AWS credential configuration
- Collecting Activations - Core pipeline for gathering model activations
- Running Collection Jobs
  - Using Slurm - Run on HPC clusters
  - Using Ray - Distributed computing setup
  - Running Locally - Single machine execution
- Checking the Outputs: S3 Shell - Tools for inspecting collected data
- Using Activations with RCache - Efficient streaming interface
- FAQs - Common questions and answers
- Local vs S3 Storage - Storage approach comparisons
- Credits - Attribution and inspiration
```bash
pip install uv
uv sync --no-build-isolation
uv pip install -e .
```
Make sure your AWS credentials are set.
```bash
export AWS_ACCESS_KEY_ID=<your_key>
export AWS_SECRET_ACCESS_KEY=<your_secret>
export S3_ENDPOINT_URL=<your_endpoint_url>
```
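If you want to confirm the credentials and endpoint resolve correctly before launching a large job, a quick standalone check with boto3 (this is not part of Activault's CLI) might look like this:

```python
# Optional sanity check: verify the environment variables above work by
# talking to the S3 endpoint directly with boto3 (not part of Activault).
import os

import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    endpoint_url=os.environ["S3_ENDPOINT_URL"],
)

# A successful list_buckets call confirms the credentials and endpoint resolve.
print([bucket["Name"] for bucket in s3.list_buckets()["Buckets"]])
```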
Use one of the pre-existing configs in configs/ or create your own. We provide configs for several frontier open-weight models out of the box.
The collection pipeline:
- Loads a transformer model and hooks into the specified layers and modules
- Streams text data according to a specified `data_key` (mappings defined in `pipeline/data/datasets.json`) through the model in batches
- For each hook (e.g., residual stream, attention outputs):
  - Collects activations and their corresponding input tokens
  - Concatenates multiple batches into "megabatch" files
  - Computes running statistics (mean, std, norm)
  - Uploads to S3 asynchronously
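For intuition, here is a rough sketch of that per-hook pattern using a plain PyTorch forward hook; the function and variable names are illustrative only, not Activault's actual internals:

```python
# Illustrative sketch of the per-hook collection pattern described above.
# `collect_megabatch`, `hook_module`, and `batch_iter` are hypothetical names.
import torch

def collect_megabatch(model, hook_module, batch_iter, batches_per_file=32):
    """Capture one hook's activations for several batches and pack them,
    together with their input tokens, into a single "megabatch" dict."""
    captured = []

    def hook(_module, _inputs, output):
        # Detach so the stored activations do not keep the autograd graph alive.
        captured.append(output.detach().cpu())

    handle = hook_module.register_forward_hook(hook)
    states, input_ids = [], []
    try:
        with torch.no_grad():
            for _ in range(batches_per_file):
                ids = next(batch_iter)       # [batch, seq_len] token ids
                model(ids)                   # hook fires during the forward pass
                states.append(captured.pop())
                input_ids.append(ids.cpu())
    finally:
        handle.remove()

    # One megabatch file: activations plus the tokens that produced them.
    return {"states": torch.cat(states), "input_ids": torch.cat(input_ids)}
```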
Each hook's data is stored in its own directory:
```
s3://{bucket}/{run_name}/
├── cfg.json              # Collection config and model info
└── {hook_name}/
    ├── metadata.json     # Shape and dtype info
    ├── statistics.json   # Running statistics
    └── {uuid}--{n}.pt    # Megabatch files
```
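If you download one of these megabatch files locally, you can open it with `torch.load`; assuming it stores a plain dict of tensors, the keys below match the inspection output and the RCache batches shown later in this README:

```python
# Peek inside a downloaded megabatch file. The "states" / "input_ids" keys
# match the S3 shell inspection output and RCache batches; the local
# filename here is just an example.
import torch

megabatch = torch.load("example-megabatch.pt", map_location="cpu")
print(megabatch["states"].shape)     # e.g. [n_batches, seq_len, d_model]
print(megabatch["input_ids"].shape)  # e.g. [n_batches, seq_len]
```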
IMPORTANT
Ensure `n_runs` in the config file is set to the total number of runs you want to launch before running large-scale distributed jobs. If this is not done, you will generate redundant data.
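For orientation, a minimal config might look roughly like the sketch below; only `n_runs` and `data_key` are named in this README, and every other field is an illustrative placeholder. Refer to the provided files in configs/ for the real schema.

```yaml
# Hypothetical config sketch -- only `n_runs` and `data_key` are named in this
# README; the other fields are illustrative placeholders, not the real schema.
model_name: meta-llama/Llama-3.3-70B-Instruct   # placeholder
hooks:
  - models.layers.24.mlp.post                   # placeholder (hook name from the S3 shell example)
data_key: example_dataset                       # maps to pipeline/data/datasets.json
n_runs: 8                                       # total number of runs you plan to launch
```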
For a simple job on a single Slurm node:
```bash
sbatch scripts/collect.slurm configs/your_config.yaml
```
To run multiple distributed jobs across different nodes:
```bash
./scripts/run_slurm_jobs.sh configs/your_config.yaml 8 0 7
```
Key Arguments:
- `configs/your_config.yaml`: Path to configuration file
- `8`: Total number of workers to spawn
- `0 7`: Start and end indices for worker assignment (will launch jobs for indices 0-7)

The script will generate a log file mapping machine indices to Slurm job IDs.
Slurm job parameters (CPUs, GPUs, memory, etc.) can be adjusted by editing scripts/collect.slurm. Important parameters:
```bash
#SBATCH --cpus-per-task=16   # CPUs per task
#SBATCH --gres=gpu:1         # GPUs per node
#SBATCH --mem=250G           # Memory per node
```

Be sure to start a Ray cluster.
```bash
# Start Ray locally
ray start --head

# On head node
ray start --head --port=6379

# On worker nodes
ray start --address=<head-node-ip>:6379
```
Running a single worker:
```bash
python scripts/run_ray_jobs.py configs/your_config.yaml 1 0 0 --resources '{"CPU": 32, "GPU": 2}' --wait
```
Running distributed jobs (8 workers from index 0-7):
```bash
python scripts/run_ray_jobs.py configs/your_config.yaml 8 0 7 --resources '{"CPU": 32, "GPU": 2}' --wait
```
Key Arguments:
- `configs/your_config.yaml`: Path to configuration file
- `8`: Total number of workers to spawn
- `0 7`: Start and end indices for worker assignment
- `--resources`: CPU and GPU allocation per worker (JSON format)
- `--address`: Optional Ray cluster address (if not using environment variable)
- `--wait`: Wait for all jobs to complete and show results
Check Ray's dashboard periodically (typically at http://localhost:8265) for cluster status.
To run the pipeline locally, you can use the Activault CLI:
```bash
activault collect --config configs/your_config.yaml
```
Alternatively, you can run it directly:
```bash
python stash.py --config configs/your_config.yaml
```
For distributed execution, specify the machine index:
```bash
activault collect --config configs/your_config.yaml --machine 0
```
After running the pipeline, you can check the outputs by using our S3 shell.
First, make sure your S3 bucket name is set:
```bash
export S3_BUCKET_NAME=<your_bucket>
```
Then, launch the S3 shell using the Activault CLI:
```bash
activault s3
```
In the S3 shell, navigate to your run directory and use these commands:
- `ls` - List files and directories
- `cd directory_name` - Change directory
- `filecount` - Count the number of files in the current directory and subdirectories
- `sizecheck` - Calculate the total size of files in the current directory
- `inspect <file_index>` - Inspect a specific megabatch file
Example inspection output:
```
s3://main/testing/models.layers.24.mlp.post> inspect 1
Inspecting file: /tmp/0f909221-ff28-4a94-a43f-cfe973e835cf--5_0.saved.pt
PT File Inspection:
----------------------------------------
Model: meta-llama/Llama-3.3-70B-Instruct
Tensor Shapes:
  states: [32, 2048, 8192]
  input_ids: [32, 2048]
States Tensor Check:
  No NaNs: ✅
  No Infs: ✅
  Value range: [-6.941, 4.027]
First 4 batches (first 250 chars each):
----------------------------------------
Batch 0: Given a triangle... (truncated)
Batch 1: Neonatal reviewers indicated... (truncated)
Batch 2: Is there a method... (truncated)
Batch 3: John visits three different... (truncated)
Enter batch number (0-31) to view full text, or 'q' to quit:
```
RCache provides a simple interface for efficiently streaming large activation datasets from S3 without memory or I/O bottlenecks.
- RCache maintains a small buffer (default: 2 files) in memory
- While you process the current megabatch, the next ones are downloaded asynchronously
- After a brief initial load (<30s), processing should never be bottlenecked by downloads or streaming
```python
cache = S3RCache.from_credentials(
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    s3_prefix="run_name/hook_name",
    device="cuda",    # or "cpu"
    return_ids=True,  # if you need the input tokens
)

for batch in cache:
    states = batch["states"]        # shape: [n_batches, seq_len, d_model]
    input_ids = batch["input_ids"]  # shape: [n_batches, seq_len]
    # ... process batch ...

cache.finalize()  # clean up
```
See retrieve.py for a complete example.
Yes! This is the intended goal. RCache can be used out of the box in SAE training workflows. It supports blazing fast throughput to ensure training is always FLOP-bottlenecked, not IO-bottlenecked.
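As a minimal sketch, the `# ... process batch ...` placeholder above can be filled in with a training step; `sae`, `optimizer`, and the loss below are generic PyTorch placeholders, not part of Activault.

```python
# Toy SAE training step inside the RCache loop from above. `sae` and
# `optimizer` are generic PyTorch placeholders, not part of Activault;
# only the batch keys come from the RCache interface shown earlier.
import torch.nn.functional as F

for batch in cache:
    acts = batch["states"]                           # [n_batches, seq_len, d_model]
    acts = acts.reshape(-1, acts.shape[-1]).float()  # flatten to [tokens, d_model]

    recon = sae(acts)                                # placeholder SAE forward pass
    loss = F.mse_loss(recon, acts)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

cache.finalize()  # clean up once the stream is exhausted
```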
Activault is designed to be compatible with any S3-style object storage solution. We performed most of our testing on Nebius S3 and have also tested on AWS S3. It is possible that other platforms may encounter issues, and we welcome contributions to expand support.
A few reasons:
- The main reason is that the bottleneck is upload speed, not inference throughput. We experimented with much faster internal serving engines, but the main process ran far ahead of the save processes, so there was no real gain in overall time.
- Activault does not use the `generate` method, and prefill speeds are more comparable between the public libraries.
- Activault should be compatible with as many models as possible.
- `vllm` does not play nicely with procuring internal states.
That said, we welcome contributions to expand Activault's support for more efficient inference libraries.
We do not use libraries such as nnsight or transformer-lens to minimize dependencies and potential failure points, and to ensure maximal compatibility with a wide range of models.
We welcome contributions! Please open an issue or PR. We are releasing Activault as a community tool to enable low-resource users to collect activations, run experiments, and share data to analyze frontier open-weight models.
This repo was originally inspired by Lewington-pitsos/sache, which is linked in the LessWrong post here.
Activault is licensed under the Apache License 2.0.
This is a permissive license that allows you to:
- Use the code commercially
- Modify the code
- Distribute your modifications
- Use patent claims of contributors (if applicable)
- Sublicense and/or distribute the code
Key requirements:
- Include a copy of the license in any redistribution
- Clearly mark any changes you make to the code
- Include the original copyright notices
The full text of the license is available in the LICENSE file.