This project aims to fine-tune a language model on Shark Tank pitch transcripts to generate new, creative business pitches.
- `data-generation/`: Contains scripts for collecting YouTube video URLs, transcribing them, and preparing the data for fine-tuning.
- `fine-tuning/`: Contains scripts for fine-tuning a language model (TinyLlama) using PEFT (LoRA) and merging the LoRA adapters with the base model.
- `transcripts/`: Stores the transcribed text from YouTube videos.
- `lora-llama/`: Contains the checkpoints from the fine-tuning process.
- `merged_model/`: Stores the final merged language model.
To get started:

- Clone the repository:

  ```bash
  git clone https://github.com/your-username/SharkTankAI.git
  cd SharkTankAI
  ```

- Create a virtual environment and install dependencies:

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  ```
First, collect video URLs, transcribe them, and generate the JSON dataset:

```bash
python data-generation/collect_videos.py
python data-generation/transcribe_videos.py
python data-generation/generate_json.py
```

This will:

- Search YouTube for 'Shark Tank' videos and save the URLs to `video_urls.txt`.
- Transcribe the videos and save the transcripts to the `transcripts/` directory.
- Combine all transcripts into `shark_tank_pitches.json`.
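
For orientation, the JSON-assembly step might look roughly like the sketch below; the record schema (`source`/`text` fields) is an assumption for illustration, not necessarily what `generate_json.py` actually emits.

```python
# Hypothetical sketch of the transcript-to-JSON step; field names are assumptions.
import json
from pathlib import Path

TRANSCRIPTS_DIR = Path("transcripts")
OUTPUT_FILE = Path("shark_tank_pitches.json")

def build_dataset():
    records = []
    for transcript_path in sorted(TRANSCRIPTS_DIR.glob("*.txt")):
        text = transcript_path.read_text(encoding="utf-8").strip()
        if text:
            # One record per transcribed video.
            records.append({"source": transcript_path.stem, "text": text})
    OUTPUT_FILE.write_text(json.dumps(records, indent=2), encoding="utf-8")
    print(f"Wrote {len(records)} transcripts to {OUTPUT_FILE}")

if __name__ == "__main__":
    build_dataset()
```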
Once the data is prepared, you can fine-tune the TinyLlama model:
```bash
python fine-tuning/fine_tune.py
```

This script saves the fine-tuned LoRA adapters in the `lora-llama/` directory.
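
For context, a minimal PEFT/LoRA setup against the TinyLlama base model looks roughly like the sketch below; the hyperparameters (`r`, `lora_alpha`, `target_modules`) are illustrative assumptions, not the values used in `fine_tune.py`.

```python
# Illustrative LoRA setup with PEFT; hyperparameters are assumptions, not the
# values used by fine-tuning/fine_tune.py.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling factor for the LoRA updates
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# ...training loop (e.g. transformers.Trainer) over shark_tank_pitches.json...
# model.save_pretrained("lora-llama/")  # writes only the LoRA adapter weights
```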
After fine-tuning, merge the LoRA adapters with the base model to create a standalone model:
```bash
python fine-tuning/merge_lora.py
```

The merged model will be saved in the `merged_model/` directory.
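
Merging LoRA adapters with PEFT generally follows the pattern sketched below; treat it as an outline of the approach rather than the exact contents of `merge_lora.py`.

```python
# Illustrative adapter merge; the real logic lives in fine-tuning/merge_lora.py.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# Load the LoRA adapters on top of the base model, then fold them into the weights.
model = PeftModel.from_pretrained(base, "lora-llama/")
merged = model.merge_and_unload()

merged.save_pretrained("merged_model/")
tokenizer.save_pretrained("merged_model/")
```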
The project uses `TinyLlama/TinyLlama-1.1B-Chat-v1.0` as the base model for fine-tuning. The model is trained on Shark Tank transcripts so it can act as a 'negator model', providing critical analysis of business pitches. The fine-tuned model can then be converted to GGUF format and imported into Ollama for local inference.
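
Before converting to GGUF, the merged model can be sanity-checked directly with `transformers`; the snippet below is a hedged illustration (the prompt and generation settings are assumptions), not a script that ships with this repo.

```python
# Hypothetical quick test of the merged model; prompt wording is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("merged_model/")
model = AutoModelForCausalLM.from_pretrained("merged_model/")

prompt = "Critique this pitch: a subscription box for artisanal dog treats."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```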
- Implement a script to generate new pitches using the merged model.
- Explore different base models and fine-tuning parameters.
- Add more robust error handling and logging.