SteveAI is a model distillation project, initially targeting the compression of large language models. This repository currently contains the first stage: Teacher Model Inference.
The goal is to make model distillation feasible even with limited computational resources (like free tiers on Kaggle or Colab) by breaking the process into stages.
- Teacher Model: `deepseek-ai/deepseek-coder-6.7b-instruct`
- Student Model (Target): `deepseek-ai/deepseek-coder-1.3b-instruct`
Current Stage: This code performs inference using the large teacher model (6.7B) on a dataset and saves the resulting logits. This is computationally intensive and is designed to run efficiently on a GPU environment like Kaggle.
Motivation: Running inference for large models requires significant RAM and computational power. Free CPU environments often lack sufficient RAM. This script leverages free GPU resources (like Kaggle Kernels) to generate the necessary teacher outputs (logits), which can then be used in a less resource-intensive environment to train the smaller student model.
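To make the stage concrete, here is a minimal, self-contained sketch of the idea (load the teacher in float16, run a batched forward pass, save the logits). The names and file layout are illustrative; the real implementation lives in `teacher_inference_gpu.py`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "deepseek-ai/deepseek-coder-6.7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
if tokenizer.pad_token is None:  # some tokenizers ship without a pad token
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

prompts = ["Write a Python function that reverses a string."]
inputs = tokenizer(prompts, return_tensors="pt", padding=True,
                   truncation=True, max_length=256).to(model.device)

with torch.no_grad():
    # One forward pass yields logits of shape (batch, seq_len, vocab_size).
    logits = model(**inputs).logits

# Store in float16 to halve disk usage; the student stage reloads these later.
torch.save(logits.half().cpu(), "teacher_logits_batch_0000.pt")
```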
- A Kaggle account (for using free GPU resources).
- Basic understanding of Python, PyTorch, and Transformers.
- Familiarity with Kaggle Notebooks.
- Clone or Download: Get the project files onto your local machine or upload them directly to Kaggle.

  ```bash
  git clone https://github.com/your-username/SteveAI.git
  cd SteveAI
  ```

  (Replace `your-username` with your actual GitHub username after creating the repository.)
- Kaggle Notebook Setup:
  - Create a new Kaggle Notebook.
  - Upload: Upload the `teacher_inference_gpu.py` script and the `requirements.txt` file to your notebook environment.
  - Accelerator: In the Notebook settings (right panel), select a GPU accelerator (e.g., T4 x2 or P100).
  - Internet: Ensure the Internet toggle is set to ON.
- Install Dependencies: In a code cell in your Kaggle Notebook, run:

  ```
  !pip install -r requirements.txt -q
  ```
- Run the Inference Script: In another code cell, execute the main script:

  ```
  !python teacher_inference_gpu.py
  ```

  (Alternatively, you can paste the content of `teacher_inference_gpu.py` directly into a notebook cell and run it.)
- Configuration: You can modify the parameters in the `Config` class at the top of `teacher_inference_gpu.py` or use the `config.yaml` file (see the sketch after this list):
  - `TEACHER_MODEL_ID`: The Hugging Face ID of the teacher model.
  - `TOKENIZER_PATH`: Path/ID for the tokenizer (usually the same as the teacher).
  - `DATASET_ID`: Hugging Face dataset ID (e.g., `yahma/alpaca-cleaned`).
  - `DATASET_SUBSET_SIZE`: Number of samples to process (e.g., `500`). Set to `None` for the full dataset (be mindful of time limits).
  - `MAX_SEQ_LENGTH`: Maximum token sequence length (e.g., `256`). Higher values need more VRAM.
  - `TEACHER_INFERENCE_BATCH_SIZE`: Number of samples per batch (e.g., `4`). Adjust based on GPU VRAM (a T4 might need 2 or 4; a V100 or better could handle 8+).
  - The output directory is detected automatically based on the environment (Kaggle vs. local).
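  A rough, hypothetical sketch of what this `Config` class might look like, inferred from the parameter descriptions above (check `teacher_inference_gpu.py` for the authoritative defaults):

  ```python
  # Hypothetical sketch of the Config class; values mirror the examples above.
  from dataclasses import dataclass
  from typing import Optional

  @dataclass
  class Config:
      TEACHER_MODEL_ID: str = "deepseek-ai/deepseek-coder-6.7b-instruct"
      TOKENIZER_PATH: str = "deepseek-ai/deepseek-coder-6.7b-instruct"
      DATASET_ID: str = "yahma/alpaca-cleaned"
      DATASET_SUBSET_SIZE: Optional[int] = 500  # None processes the full dataset
      MAX_SEQ_LENGTH: int = 256                 # higher values need more VRAM
      TEACHER_INFERENCE_BATCH_SIZE: int = 4     # lower this on OOM errors
  ```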
- Output: The script will generate and save the teacher logits as `.pt` files in the specified output directory (e.g., `/kaggle/working/Kaggle_Distillation_GPU_Teacher_Inference/teacher_logits_float16/`). Progress and memory usage will be logged.
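  To sanity-check a saved batch later (for example, in the student-training notebook), it can be reloaded with `torch.load`; the file name below is illustrative:

  ```python
  import torch

  # The batch file name is hypothetical; list the output directory to see
  # the actual files the script produced.
  logits = torch.load(
      "/kaggle/working/Kaggle_Distillation_GPU_Teacher_Inference/"
      "teacher_logits_float16/teacher_logits_batch_0000.pt",
      map_location="cpu",
  )
  # Expected shape: (batch_size, seq_len, vocab_size), stored in float16.
  print(logits.shape, logits.dtype)
  ```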
- Save Results: Once the script finishes:
  - Click "Save Version" in the top-right of the Kaggle editor.
  - Select "Save & Run All (Commit)".
  - Wait for the commit to complete. The output files will be saved as a Kaggle Dataset linked to this notebook version. You can then add this output dataset as input to a new notebook for the student training phase.
- Develop the second script (`student_training.py`, TBD), which will load the pre-computed teacher logits and train the student model. This stage might be runnable on CPU or GPU depending on the student model size and training configuration.
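  For reference, the standard distillation objective for this step is a temperature-softened KL divergence between teacher and student logits. The sketch below shows that objective only; it is an assumption about the eventual `student_training.py`, not its actual contents:

  ```python
  import torch
  import torch.nn.functional as F

  def distillation_loss(student_logits: torch.Tensor,
                        teacher_logits: torch.Tensor,
                        temperature: float = 2.0) -> torch.Tensor:
      """KL(teacher || student) on temperature-softened distributions."""
      student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
      teacher_probs = F.softmax(teacher_logits.float() / temperature, dim=-1)
      # The T^2 factor keeps gradient magnitudes comparable across temperatures.
      return F.kl_div(student_log_probs, teacher_probs,
                      reduction="batchmean") * temperature ** 2
  ```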
This project is licensed under the Apache License 2.0. See the LICENSE file for details.