
AsteroidHunter/solvingARENA


This is my copy of the Alignment Research Engineer Accelerator (ARENA) program. The GitHub repository for the original curriculum can be found here.

Progress Log

The table below lists tasks completed during the ~12-hour co-working sessions that we host each week. Most weeks, 70% of the time is spent on the ARENA subchapters (or, occasionally, independent project work), while the remaining time is spent on paper readings and discussions or conceptual reviews.

| Week of 2025 | Progress Updates |
| --- | --- |
| 5 | Linear algebra review (rank, singular value decomposition, eigenvalues and eigenvectors) |
| 6 | Deep learning review, Chapter 0.0 |
| 7 | Chapter 0.0, Chapter 0.1, guided discussion on different research areas within AI Safety and research directions outlined by Anthropic's alignment team |
| 8 | Chapter 0.1; outlined how evaluations are conducted, discussed METR's bug bounty program and desiderata, and compiled a list of bugs that could be used as evals; Chapter 0.2 |
| 9 | (Post-conference off week) |
| 10 | Chapter 0.2 and meta-tasks (Asana task list for the remaining chapters and a compiled list of papers to read) |
| 11 | Chapter 0.2, Chapter 0.3, these blog posts from METR |
| 12 | Chapter 0.3, Chapter 0.4; read and gave a short presentation on Hinton et al. (2015); read and took notes on the RE-Bench paper |
| 13 | Chapter 0.5, autoencoder and VAE math, Ying et al. (2025), the Situational Awareness Dataset paper, Sleeper Agents paper discussion |
| 14 | Chapter 3.1; brief discussions on Kingma & Ba (2017), We Need a Science of Evals, and threat modeling |
| 15 | Chapter 3.1, exploring jobagent, notes on UMAP and t-SNE |
| 16 | Chapter 3.2, worked on minor extensions to the SAD paper, MATS applications |

Planned timeline

Figure 1. Asana timeline for completing the ARENA curriculum. The timeline was planned with one key assumption: each subchapter takes ~4 hours to complete when timed. We have 9 hours of programming meetings per week, which allows us to complete around two subchapters each week. Note that subchapters 1.2–1.5 have been omitted from this timeline because evaluations and reinforcement learning are higher priorities than mechanistic interpretability; some portions of that chapter will be done after completing Chapter 2.4: RLHF after the 5th of May, 2025.

About

Pair programming through the Alignment Research Engineer Accelerator program
