
AsteroidHunter/solvingARENA


This is my copy of the Alignment Research Engineer Accelerator (ARENA) program. The GitHub repository for the original curriculum can be found here.

Progress Log

The table below lists tasks completed during the ~12-hour co-working sessions that we host each week. Most weeks, 70% of the time is spent on the ARENA subchapters (or, occasionally, independent project work), while the remaining time is spent on paper readings and discussions or conceptual reviews.

| Week of 2025 | Progress Updates |
| --- | --- |
| 5 | Linear algebra review (rank, singular value decomposition, eigenvalues and eigenvectors) |
| 6 | Deep learning review, Chapter 0.0 |
| 7 | Chapter 0.0, Chapter 0.1, guided discussion on different research areas within AI Safety and research directions outlined by Anthropic's alignment team |
| 8 | Chapter 0.1; outlined how evaluations are conducted, discussed METR's bug bounty program and desiderata, and compiled a list of bugs that could be used as evals; Chapter 0.2 |
| 9 | (Post-conference off week) |
| 10 | Chapter 0.2 and meta-tasks (Asana task list for the remaining chapters and a compiled list of papers to read) |
| 11 | Chapter 0.2, Chapter 0.3, these blog posts from METR |
| 12 | Chapter 0.3, Chapter 0.4; read and gave a short presentation on Hinton et al. (2015); read and took notes on the RE-Bench paper |
| 13 | Chapter 0.5, autoencoder and VAE math, Ying et al. (2025), the Situational Awareness Dataset paper, Sleeper Agents paper discussion |
| 14 | Chapter 3.1; brief discussions on Kingma & Ba (2017), We Need a Science of Evals, and threat modeling |
| 15 | Chapter 3.1, exploring jobagent, notes on UMAP and t-SNE |
| 16 | Chapter 3.2, worked on minor extensions to the SAD paper, MATS applications |

Planned timeline

Figure 1. Asana timeline for completing the ARENA curriculum. The timeline was planned with one key assumption: each subchapter takes ~4 hours to complete when timed. We have 9 hours of programming meetings per week, which allows us to complete around two subchapters each week. Note that subchapters 1.2–1.5 have been omitted from this timeline because evaluations and reinforcement learning are higher priorities than mechanistic interpretability; some portions of that chapter will be done after completing Chapter 2.4: RLHF after the 5th of May, 2025.

About

Pair programming through the Alignment Research Engineer Accelerator program
