A monthly reading group focused on the latest research in post-training techniques for large language models, including RFT, RLHF, preference learning, synthetic preference data, and related topics.
Upcoming session:
- [Aug 7, 2025] Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy (Liu et al., 2025)
- Frequency: Monthly
- Location: Collinear HQ in Mountain View, CA
- Format: In-person discussion of selected papers
- Duration: 1-2 hours per session
Our reading group covers various aspects of post-training research:
- Reinforcement Learning from Human Feedback (RLHF)
- Direct Preference Optimization (DPO) and variants
- Preference learning and reward modeling
- Alignment and safety techniques
- Evaluation and benchmarking
- Human-AI collaboration in preference data
There are several ways to get involved:
- Sign up for our mailing list
- Attend our events
- Suggest papers for discussion
- Co-organize sessions
For questions about the reading group or to suggest papers, please open an issue in this repository or contact research@collinear.ai.
Last updated: July 2025