44 changes: 22 additions & 22 deletions README.md
@@ -1,25 +1,25 @@
# 🧪 Advanced AI on Arm
# Advanced AI on Arm

This course provides a hands-on introduction to *extreme model quantization*, *hardware-aware optimization*, and *on-device deployment* for generative AI models. You'll explore advanced techniques to reduce model size, accelerate inference, and deploy compact LLMs on edge devices like Android smartphones.

## 🧬 Labs Overview
## Labs Overview

### 🔹 Lab 1: **Extreme Quantization**
Train a language model and progressively quantize it from FP32 to 8-bit, 4-bit, 2-bit, and 1-bit precision. Implement and evaluate **Quantization-Aware Training (QAT)** to mitigate accuracy degradation in ultra-low-bit models.
### Lab 1: **Extreme Quantization**
Train a language model and progressively quantize it from FP32 to 8-bit, 4-bit, 2-bit, and 1-bit precision. Implement and evaluate **quantization aware training (QAT)** to mitigate accuracy degradation in ultra-low-bit models.
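
As a rough illustration of what lowering the bit-width does to weights, the sketch below applies symmetric uniform "fake" quantization to a list of random values and reports the reconstruction error at 8, 4, and 2 bits. It is a framework-free stand-in, not the lab's implementation: the function names, the Gaussian test weights, and the choice of a symmetric per-tensor scale are all assumptions made for this example, and 1-bit quantization (which typically needs a sign-based scheme) is deliberately left out.

```python
import random


def quantize_symmetric(weights, bits):
    """Round weights onto a signed `bits`-bit grid, then map them back to floats."""
    if bits < 2:
        raise ValueError("this sketch covers 2 bits and up; 1-bit needs a sign-based scheme")
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit, 1 for 2-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = []
    for w in weights:
        q = round(w / scale)          # nearest integer level
        q = max(-qmax, min(qmax, q))  # clamp to the representable range
        quantized.append(q * scale)   # dequantize back to a float
    return quantized


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
# Hypothetical stand-in for a trained layer's weights.
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

errors = {
    bits: mean_squared_error(weights, quantize_symmetric(weights, bits))
    for bits in (8, 4, 2)
}
for bits, err in errors.items():
    print(f"{bits}-bit MSE: {err:.6f}")
```

The error grows as the grid gets coarser, which is the degradation that QAT is meant to reduce by letting the model adapt to the quantized weights during training.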

### 🔹 Lab 2: **Hardware–Software Model Co-Design**
Wrap all `nn.Linear` layers with a custom `QLinear` module and explore **layerwise post-training quantization**. Search for the optimal bit-width configuration to maximize efficiency while maintaining model fidelity in a software-hardware co-design process.
### Lab 2: **Hardware–Software Model Co-Design**
Wrap all `nn.Linear` layers with a custom `QLinear` module and explore **layerwise post-training quantization**. Search for the optimal bit-width configuration to maximize efficiency while maintaining model fidelity in a hardware-software co-design process.
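
The search idea can be sketched without any deep learning framework. The example below is a minimal sketch under stated assumptions, not the lab's `QLinear` code: the layer names, the toy Gaussian weights, the candidate bit-widths, and the per-layer error budget are all hypothetical. For each layer it picks the lowest candidate bit-width whose quantization error stays within the budget.

```python
import random


def fake_quantize(weights, bits):
    """Symmetric uniform quantization to `bits` bits, returned as floats."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def layer_error(weights, bits):
    """Mean squared error between the original and the quantized weights."""
    quantized = fake_quantize(weights, bits)
    return sum((w - q) ** 2 for w, q in zip(weights, quantized)) / len(weights)


def search_bit_widths(layers, candidates=(8, 4, 2), error_budget=0.05):
    """Per layer, choose the lowest candidate bit-width that fits the error budget.

    Falls back to the highest candidate if none fits.
    """
    config = {}
    for name, weights in layers.items():
        chosen = max(candidates)
        for bits in sorted(candidates):  # try the most aggressive option first
            if layer_error(weights, bits) <= error_budget:
                chosen = bits
                break
        config[name] = chosen
    return config


random.seed(0)
# Hypothetical layers whose weights have very different spreads.
layers = {
    "embed": [random.gauss(0.0, 0.01) for _ in range(256)],
    "attn": [random.gauss(0.0, 1.0) for _ in range(256)],
    "head": [random.gauss(0.0, 5.0) for _ in range(256)],
}

config = search_bit_widths(layers)
average_bits = sum(config.values()) / len(config)
print(config, f"average bits per weight: {average_bits:.2f}")
```

A real search would score each configuration on model outputs (for example, validation loss) rather than raw weight error, and would weigh that against a hardware cost such as latency or memory. The greedy per-layer loop here only shows the shape of the trade-off.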

### 🔹 Lab 3: **Running & Quantizing Models on Android**
Use [`llama.cpp`](https://github.com/ggerganov/llama.cpp) to quantize and deploy LLaMA-style LLMs on Android. Learn how to benchmark and run models *offline*, directly on your mobile hardware.
### Lab 3: **Running & Quantizing Models on Android**
Use [`llama.cpp`](https://github.com/ggerganov/llama.cpp) to quantize and deploy Llama-style LLMs on Android. Learn how to benchmark and run models *offline*, directly on your mobile hardware.

---

## 🚀 Getting Started
## Getting Started

This repository uses a unified `requirements.txt` and Git LFS to manage dependencies and large pretrained models.

### 1️⃣ Clone the Repository and Download Model Weights
### 1. Clone the Repository and Download Model Weights

```bash
# Install Git LFS if needed
@@ -32,29 +32,29 @@ cd Advanced-AI-on-Arm
git lfs pull
```

### 2️⃣ Set Up the Python Environment
### 2. Set Up the Python Environment

```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

### 3️⃣ Run the Labs
### 3. Run the Labs

```bash
jupyter lab
```

Open:

- `lab1.ipynb` for **Extreme Quantization**
- `lab2.ipynb` for **Hardware–Software Co-Design**
- Follow `lab3.md` for **Android deployment** with `llama.cpp`
- `lab1.ipynb` for **Extreme Quantization**;
- `lab2.ipynb` for **Hardware–Software Co-Design**; and
- `lab3.md` for **Android deployment** with `llama.cpp`.

---

## 📁 Repository Structure
## Repository Structure

```
Advanced-AI-on-Arm/
@@ -69,19 +69,19 @@ Advanced-AI-on-Arm/

---

## 📱 Android Deployment Notes
## Android Deployment Notes

To complete **Lab 3**, make sure the following are installed:

- Android Studio (Hedgehog or later)
- Android NDK + ADB
- A physical Android 10+ device with ≥6GB RAM
- Android Studio (Hedgehog or later);
- Android NDK + ADB; and
- a physical Android 10+ device with ≥6GB RAM.
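
To get a feel for why bit-width matters on a phone with limited RAM, the back-of-the-envelope calculation below estimates how much memory the weights alone would occupy at different precisions. The 7-billion-parameter count is a hypothetical example, not a statement about the models used in the lab, and the estimate ignores activations, the KV cache, and any per-block metadata that real quantized formats store.

```python
def weight_footprint_gib(num_params, bits_per_weight):
    """Approximate size of the weights alone, in GiB (1 GiB = 2**30 bytes)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


NUM_PARAMS = 7_000_000_000  # hypothetical model size, chosen only for illustration

for bits in (32, 16, 8, 4, 2):
    size = weight_footprint_gib(NUM_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {size:6.2f} GiB")
```

Under these assumptions, full-precision weights are far larger than a phone's RAM, while 4-bit weights fit with room to spare for the runtime and the operating system.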

> Windows users: use **WSL 2** with Ubuntu 22.04 for full compatibility with build tools.

---

## 🧠 Learning Outcomes
## Learning Outcomes

- Understand bit-width trade-offs (accuracy vs. compression)
- Apply QAT to recover performance in quantized models
@@ -90,7 +90,7 @@ To complete **Lab 3**, make sure the following are installed:

---

## 📫 Questions?
## Questions?

Open an issue or contact `oliver@grainge.me` if you encounter problems during setup or execution.
