About The Project

AI-powered search and chat for Dr. Andrew Huberman's Podcast.

All code and data used are 100% open source.

Built With

Dataset

The dataset provides comprehensive information for each video chapter, including timestamps, transcripts, and semantic embeddings.

Dataset Details

id: Unique identifier for each chapter entry.
video_title: Title of the video (e.g., Tony Hawk: Harnessing Passion, Drive & Persistence).
chapter_title: Title of each chapter within the video, providing an overview of the topic.
video_date: Date of the video release. Currently NULL.
video_id: Unique identifier for each video.
start_time and end_time: Timestamps indicating the beginning and end of each chapter.
conversation: JSON-encoded transcript segments for each chapter.
conversation_length: Length of the conversation in characters.
conversation_tokens: Approximate token count, useful for NLP tasks.
embedding: Vector embeddings for each chapter, representing semantic content.
summary: Text summary of each chapter. Currently NULL.
summary_embedding: Embedding vector for each summary.

For a detailed structure of the dataset, including all fields and their types, please check the types/index.ts file.
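
As a rough orientation, the fields listed above could be modeled in TypeScript along these lines. The interface name `ChapterRow`, the concrete field types, and the `parseConversation` helper are assumptions made for illustration; types/index.ts remains the authoritative definition.

```typescript
// Hypothetical shape of one dataset row; the real types live in types/index.ts.
interface ChapterRow {
  id: number;                         // assumed numeric; may be a string or UUID
  video_title: string;
  chapter_title: string;
  video_date: string | null;          // currently NULL in the dataset
  video_id: string;
  start_time: string;                 // assumed to be stored as text
  end_time: string;
  conversation: string;               // JSON-encoded transcript segments
  conversation_length: number;        // length in characters
  conversation_tokens: number;        // approximate token count
  embedding: number[];
  summary: string | null;             // currently NULL in the dataset
  summary_embedding: number[] | null;
}

// Decode the JSON-encoded conversation column. The segment structure is not
// documented here, so segments are returned as unknown values.
function parseConversation(row: Pick<ChapterRow, "conversation">): unknown[] {
  const parsed: unknown = JSON.parse(row.conversation);
  return Array.isArray(parsed) ? parsed : [parsed];
}
```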

Download

The dataset is available for download on Google Drive in CSV format.

To skip the data scraping and embedding steps, you can download the dataset directly and use it for analysis.

How It Works

Huberman GPT provides 2 things:

A search interface.
A chat interface.

Search

Search was created with OpenAI Embeddings (text-embedding-ada-002).

First, we loop over the transcription files, break each file down into chunks, and generate embeddings for each chunk of text.
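
The chunking rule is not spelled out here, so the following is only a minimal sketch of one common approach: packing whole sentences into chunks up to a character budget. The function name `chunkText` and the default of 1000 characters are assumptions, not values taken from the scripts in this repo.

```typescript
// Pack whole sentences into chunks of roughly maxChars characters.
// A single sentence longer than maxChars becomes its own oversized chunk.
function chunkText(text: string, maxChars = 1000): string[] {
  // Sentences end in ., !, or ?; trailing text without punctuation is kept too.
  const sentences = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const raw of sentences) {
    const sentence = raw.trim();
    if (!sentence) continue;
    if (current && current.length + 1 + sentence.length > maxChars) {
      // Adding this sentence would exceed the budget: close the current chunk.
      chunks.push(current);
      current = sentence;
    } else {
      current = current ? `${current} ${sentence}` : sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each returned chunk would then be sent to the embeddings endpoint separately.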

Then in the app, we take the user's search query, generate an embedding, and use the result to find the most similar passages from the podcast transcripts.

The comparison is done using cosine similarity across our database of vectors.
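
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. In the app the comparison presumably runs inside the database, so the standalone function below (the name `cosineSimilarity` is ours) is only a sketch of the formula itself.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Returns a value in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```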

Our database is a Postgres database with the pgvector extension hosted on Supabase.

Results are ranked by similarity score and returned to the user.
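
The ranking step can be sketched as a filter, sort, and slice over scored passages. The `ScoredPassage` type, the `rankPassages` name, and the `topK` and `minSimilarity` defaults are illustrative assumptions; the app's actual limits may differ.

```typescript
// Hypothetical result shape: a passage plus its similarity to the query.
interface ScoredPassage {
  content: string;
  similarity: number;
}

// Keep passages at or above minSimilarity, highest score first, at most topK.
function rankPassages(
  passages: ScoredPassage[],
  topK = 5,
  minSimilarity = 0,
): ScoredPassage[] {
  return passages
    .filter((p) => p.similarity >= minSimilarity) // filter() copies, so the input is not mutated
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, topK);
}
```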

Chat

Chat builds on top of search. It uses search results to create a prompt that is fed into GPT-3.5-turbo.
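
One way to picture this step is a function that stitches the top search results and the user's question into a single prompt. The function name `buildPrompt` and the instruction wording are invented for illustration and are not the prompt used by the app.

```typescript
// Assemble a prompt from retrieved passages and the user's question.
function buildPrompt(question: string, passages: string[]): string {
  // Number the passages so the model can tell them apart.
  const context = passages
    .map((p, i) => `[${i + 1}] ${p.trim()}`)
    .join("\n\n");
  return [
    "Use the following podcast passages to answer the question.",
    "If the passages do not contain the answer, say so.",
    "",
    "Passages:",
    context,
    "",
    `Question: ${question.trim()}`,
    "Answer:",
  ].join("\n");
}
```

The resulting string would be sent as the user message in a chat completion request.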

This allows for a chat-like experience where the user can ask questions about the podcast and get answers.

Running Locally

Here's a quick overview of how to run it locally.

Requirements

Set up OpenAI

You'll need an OpenAI API key to generate embeddings.

Set up Supabase and create a database

Note: You don't have to use Supabase. Use whatever method you prefer to store your data. But I like Supabase and think it's easy to use.

There is a schema.sql file in the root of the repo that you can use to set up the database.

Run that in the SQL editor in Supabase as directed.

I recommend turning on Row Level Security and setting up a service role to use with the app.

Repo Setup

Clone repo

git clone https://github.com/avsavani/hubermangpt

Install dependencies

npm i

Set up environment variables

Create a .env.local file in the root of the repo with the following variables:

NEXT_PUBLIC_OPENAI_API_KEY=
NEXT_PUBLIC_SUPABASE_URL=
NEXT_PUBLIC_SUPABASE_SERVICE_ROLE_KEY=

We prefix the variables with NEXT_PUBLIC_ so that they are available to the edge functions on Vercel on deployment. Keep in mind that Next.js also exposes NEXT_PUBLIC_ variables to browser code.
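
A small startup check can catch a missing or blank variable before the first API call fails. This is a sketch only: the `missingEnv` helper is hypothetical and not part of the repo, though the variable names are the three listed above.

```typescript
// The three variables expected in .env.local.
const REQUIRED_ENV = [
  "NEXT_PUBLIC_OPENAI_API_KEY",
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_SERVICE_ROLE_KEY",
] as const;

// Return the names of required variables that are unset or blank.
function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((name) => (env[name] ?? "").trim() === "");
}

// Typical use in a Node script:
//   const missing = missingEnv(process.env);
//   if (missing.length > 0) throw new Error(`Missing env vars: ${missing.join(", ")}`);
```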

Dataset

Run scraping script

npm run scrape

You won't need to run this script as I have already scraped the data and uploaded it to Google Drive.

Run embedding script

npm run embed

This reads the JSON file, generates embeddings for each chunk of text, and saves the results to your database.

There is a 200ms delay between each request to avoid rate limiting.

This process takes roughly 2 hours.

To pause the process, press Ctrl+C. To resume, run npm run embed again; it will start from where it left off.
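
The delay and resume behavior described above can be sketched as a loop that skips chunks already stored and sleeps between requests. How the real script tracks progress is not stated, so the `Chunk` type, the `storedIds` set, and the injected `embed` and `save` callbacks are all assumptions made for this example.

```typescript
// Hypothetical unit of work: one chunk of text with a stable id.
interface Chunk {
  id: string;
  text: string;
}

// Chunks whose ids are not stored yet; skipping the rest is what lets a re-run resume.
function pendingChunks(all: Chunk[], storedIds: Set<string>): Chunk[] {
  return all.filter((c) => !storedIds.has(c.id));
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Embed and save every pending chunk, pausing delayMs between requests
// to stay under the API rate limit. Returns the number of chunks processed.
async function embedAll(
  chunks: Chunk[],
  storedIds: Set<string>,
  embed: (text: string) => Promise<number[]>,
  save: (id: string, embedding: number[]) => Promise<void>,
  delayMs = 200,
): Promise<number> {
  let processed = 0;
  for (const chunk of pendingChunks(chunks, storedIds)) {
    const embedding = await embed(chunk.text);
    await save(chunk.id, embedding);
    storedIds.add(chunk.id); // record progress only after a successful save
    processed++;
    await sleep(delayMs);
  }
  return processed;
}
```

Because progress is recorded only after a successful save, interrupting the loop loses at most the chunk that was in flight.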

App

Run app

npm run dev

Credits

This codebase builds on Mckay Wrigley's implementation of Paul Graham GPT.

Thanks to Dr. Andrew Huberman for putting out a great podcast.

I highly recommend listening to the full podcasts from the results.

Contact

If you have any questions, feel free to reach out to me on Twitter!
