Using a multi-armed bandit problem to simulate the content creation and recommendation space.
Content on recommendation platforms (YouTube, TikTok, etc.) is quickly coming to dominate the attention and pastimes of young adults and children. Creators constantly face the dilemma of what content to produce to capture the most views on their platform. How creators behave as a collective, and how that behavior affects the user experience, remain open questions. This project constructs a simulation of this problem in which creators choose content using bandit algorithms. In our simulations, the EXP3 adversarial bandit algorithm performed significantly better than the other bandit algorithms we compared it against.
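For readers unfamiliar with EXP3, here is a minimal textbook-style sketch of the algorithm. It is not the code from the project's notebooks. The class name `Exp3`, the exploration rate `gamma`, and the assumption that rewards (for example, normalized view counts) lie in [0, 1] are all illustrative choices.

```python
import math
import random


class Exp3:
    """Textbook EXP3 over `n_arms` arms; rewards are assumed to lie in [0, 1]."""

    def __init__(self, n_arms, gamma=0.1, seed=None):
        self.n_arms = n_arms          # e.g. number of content types (assumption)
        self.gamma = gamma            # exploration rate in (0, 1]
        self.weights = [1.0] * n_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        """Mix the weight-proportional distribution with uniform exploration."""
        total = sum(self.weights)
        return [
            (1 - self.gamma) * w / total + self.gamma / self.n_arms
            for w in self.weights
        ]

    def select_arm(self):
        """Sample an arm index according to the current probabilities."""
        probs = self.probabilities()
        return self.rng.choices(range(self.n_arms), weights=probs)[0]

    def update(self, arm, reward):
        """Update only the pulled arm, using an importance-weighted reward."""
        probs = self.probabilities()
        estimate = reward / probs[arm]
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n_arms)
        # Rescale so the largest weight is 1; this leaves the probabilities
        # unchanged and avoids overflow over long runs.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]
```

In a simulation loop, a creator would call `select_arm()` to pick a content type, observe a reward from the platform, and pass it to `update()`. Because every arm keeps probability at least `gamma / n_arms`, the learner never stops exploring, which is what makes EXP3 suited to rewards that shift as other creators change their behavior.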
Exploration of the simulation is covered in two Jupyter notebooks, a poster, and a paper. The poster covers the following six topics in the most depth: an explanation of the initial model; different ratios of creators, content types, and users/consumers; noisy rewards; softmax vs. hard max recommendation by the system; creator content choice using EXP3 vs. UCB1; and subscription-based recommendation. All work is shown in the two notebooks: the first four explorations are in the first notebook, and the last two are in the second.
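To illustrate the difference between hard max and softmax recommendation, here is a small sketch under stated assumptions. It supposes the system holds one numeric score per content item. The function names and the `temperature` parameter are hypothetical, and the notebooks may define these rules differently.

```python
import math
import random


def hard_max_recommend(scores):
    """Always recommend the highest-scoring item (ties go to the first)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def softmax_probabilities(scores, temperature=1.0):
    """Turn scores into a probability distribution; lower temperature is greedier."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_recommend(scores, temperature=1.0, rng=random):
    """Sample an item index with probability given by the softmax of the scores."""
    probs = softmax_probabilities(scores, temperature)
    return rng.choices(range(len(scores)), weights=probs)[0]
```

With hard max, only the top-scoring item is ever shown, so lower-scoring content receives no views at all. With softmax, every item keeps some chance of being recommended, and as `temperature` shrinks the behavior approaches hard max.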
Pawan Jayakumar, Alan Zheng