This is my copy of the Alignment Research Engineer Accelerator (ARENA) program. The GitHub repository for the original curriculum can be found here.
The table below lists tasks completed during the ~12 hours of co-working sessions that we host each week. Most weeks, roughly 70% of the time is spent on the ARENA subchapters (or, occasionally, independent project work), while the remaining time is spent on paper readings & discussions or conceptual reviews.
| Week (2025) | Progress Updates |
|---|---|
| 5 | Linear algebra review (rank, singular value decomposition, eigenvalues and eigenvectors) |
| 6 | Deep learning review, Chapter 0.0 |
| 7 | Chapter 0.0, Chapter 0.1, guided discussion on different research areas within AI Safety and research directions outlined by Anthropic's alignment team |
| 8 | Chapter 0.1, outlining how evaluations are conducted, discussing METR's bug bounty program and desiderata, and compiling a list of bugs that could be used as evals; Chapter 0.2 |
| 9 | (Post-conference off week) |
| 10 | Chapter 0.2 and meta-tasks (creating an Asana task list for the remaining chapters and compiling a list of papers to read) |
| 11 | Chapter 0.2, Chapter 0.3, these blog posts from METR |
| 12 | Chapter 0.3, Chapter 0.4, read and gave a short presentation on Hinton et al. (2015), read and took notes on the RE-bench paper |
| 13 | Chapter 0.5, autoencoders and VAE math, Ying et al. (2025), Situational Awareness Dataset paper, Sleeper agents paper discussion |
| 14 | Chapter 3.1; brief discussions on Kingma & Ba (2017), We need a Science of Evals, & threat modeling |
| 15 | Chapter 3.1, exploring jobagent, notes on UMAP and t-SNE |
| 16 | Chapter 3.2, worked on minor extensions to the SAD paper, MATS applications |
Figure 1. Asana timeline for completing the ARENA curriculum. The timeline was planned with one key assumption: it takes ~4 hours of timed work to complete each subchapter. We have 9 hours of programming meetings per week, which allows us to complete around two subchapters each week. Note that subchapters 1.2–1.5 have been omitted from this timeline because evaluations and reinforcement learning are higher priorities than mechanistic interpretability; some portions of that chapter will be done after completing Chapter 2.4: RLHF, scheduled for after the 5th of May, 2025.
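
As a quick sanity check on that pacing (all numbers taken from the caption above):

$$\frac{9~\text{hours/week}}{4~\text{hours/subchapter}} = 2.25~\text{subchapters/week} \approx 2$$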