This repo contains our experiments in researching and implementing alternatives to the attention mechanism, i.e., MAMBA and xLSTM.
Note: Running training and inference requires a CUDA installation (`nvcc` and other dependencies).
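Since training and inference fail without the CUDA toolchain, a quick preflight check can save a confusing error later. The helper below is not part of the repo; it only assumes that `nvcc` must be discoverable on `PATH`:

```python
import shutil

def cuda_toolkit_available() -> bool:
    """Return True if the nvcc compiler is discoverable on PATH.

    A hypothetical helper (not part of this repo): it only checks that
    the CUDA compiler driver is installed, not that a GPU is present.
    """
    return shutil.which("nvcc") is not None

if __name__ == "__main__":
    print("CUDA toolkit found:", cuda_toolkit_available())
```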
The steps to run this project are:

1. The project uses Anaconda to create the programming environment:

   ```shell
   conda create --name <env> --file requirements.txt
   ```

2. Run the demo. Supported models are `attention`, `mamba`, and `xlstm`; the context can be any string:
   ```shell
   python demo.py --model <model_name> -c "Shakespeare likes attention"
   ```

We use the Weights & Biases (W&B) library to track training metrics (see the W&B quickstart). To use W&B, set the `WANDB_API_KEY` environment variable:

```shell
export WANDB_API_KEY=<Your WandB api key>
```

The testing files for each model are `gpt_test.py`, `mamba_test.py`, and `xlstm_test.py`.
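The demo's command-line interface described above can be sketched with `argparse`. This is a hedged reconstruction based only on the flags shown in the usage line (`--model`, `-c`); the actual `demo.py` internals may differ:

```python
import argparse

# Models named in this README; demo.py may enforce this differently.
SUPPORTED_MODELS = ["attention", "mamba", "xlstm"]

def build_parser() -> argparse.ArgumentParser:
    """Sketch of the CLI demo.py exposes, per the usage line above."""
    parser = argparse.ArgumentParser(
        description="Generate text from a trained model."
    )
    parser.add_argument(
        "--model", choices=SUPPORTED_MODELS, required=True,
        help="Which architecture to load.",
    )
    parser.add_argument(
        "-c", "--context", default="",
        help="Prompt string used to condition generation (any string).",
    )
    return parser

if __name__ == "__main__":
    # Equivalent to: python demo.py --model mamba -c "Shakespeare likes attention"
    args = build_parser().parse_args(
        ["--model", "mamba", "-c", "Shakespeare likes attention"]
    )
    print(args.model, args.context)
```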