This repository demonstrates how to generate text embeddings using OpenAI models through LangChain, and then visualize the semantic similarity between words in 3D space using PCA and Matplotlib.
- Generate embeddings for arbitrary text using OpenAI’s embedding models.
- Support for multiple models:
  - `text-embedding-3-large`
  - `text-embedding-3-small`
  - `text-embedding-ada-002` (default)
- Dimensionality reduction using PCA.
- Interactive 3D scatter plot visualization of embeddings.
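Semantic similarity between two embeddings is usually measured with cosine similarity, the cosine of the angle between the vectors. Below is a minimal standard-library sketch of that measure. It is illustrative and not taken from `embeddings_plot.py`, and the toy 3-dimensional vectors are made-up stand-ins for real embeddings, which have 1536 dimensions for `text-embedding-ada-002` and `text-embedding-3-small` and 3072 for `text-embedding-3-large`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return cos(theta) between vectors a and b, a value in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def most_similar(query: str, vectors: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Rank every other word by cosine similarity to `query`, highest first."""
    q = vectors[query]
    scores = [(w, cosine_similarity(q, v)) for w, v in vectors.items() if w != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors: "nfl" points in nearly the same direction as "football".
toy = {
    "nfl": [0.9, 0.1, 0.0],
    "football": [0.8, 0.2, 0.1],
    "basketball": [0.1, 0.9, 0.2],
}
print(most_similar("nfl", toy)[0][0])  # prints "football"
```

OpenAI's documentation states that its embeddings are normalized to length 1, so for real embeddings a plain dot product gives the same scores.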
Install dependencies with:

`pip install -r requirements.txt`

`requirements.txt` lists:
- langchain-openai
- python-dotenv
- numpy
- matplotlib
- scikit-learn

You will also need an OpenAI API key.
- Clone this repository:
  `git clone https://github.com/your-username/embedding-visualizer.git && cd embedding-visualizer`
- Create a `.env` file in the root directory and add your OpenAI API key:
  `echo "OPENAI_API_KEY=your_api_key_here" > .env`
- Choose your embedding model by editing the `EMBEDDING_MODEL` variable in `embeddings_plot.py`:
  `EMBEDDING_MODEL="text-embedding-ada-002"`
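The setup steps above can be sketched in Python roughly as follows. This is a hypothetical outline, not the contents of `embeddings_plot.py`: `get_openai_api_key`, `resolve_model`, and `embed_texts` are illustrative names, and reading `EMBEDDING_MODEL` from an environment variable is an assumed alternative to editing the script. `OpenAIEmbeddings(model=...).embed_documents(...)` is LangChain's embedding API, and it reads `OPENAI_API_KEY` from the environment by default.

```python
import os
from typing import List

# Default output dimensions of the models listed above, per OpenAI's documentation.
MODEL_DIMENSIONS = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
    "text-embedding-ada-002": 1536,
}
DEFAULT_MODEL = "text-embedding-ada-002"


def get_openai_api_key(env_file: str = ".env") -> str:
    """Return OPENAI_API_KEY, first loading env_file if python-dotenv is installed."""
    try:
        from dotenv import load_dotenv  # python-dotenv, listed in requirements.txt

        # Copies KEY=VALUE lines into os.environ without overriding existing variables.
        # A relative path is resolved against the current working directory.
        load_dotenv(dotenv_path=env_file)
    except ImportError:
        pass  # fall back to variables exported in the shell
    key = os.getenv("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(f"OPENAI_API_KEY is not set; add it to {env_file} or export it")
    return key


def resolve_model(default: str = DEFAULT_MODEL) -> str:
    """Read the model from a hypothetical EMBEDDING_MODEL env var, else use the default."""
    model = os.getenv("EMBEDDING_MODEL", default)
    if model not in MODEL_DIMENSIONS:
        raise ValueError(f"unsupported model {model!r}; choose one of {sorted(MODEL_DIMENSIONS)}")
    return model


def embed_texts(texts: List[str], model: str) -> List[List[float]]:
    """Embed texts via LangChain (requires network access and a valid API key)."""
    get_openai_api_key()  # fail early with a clear message if the key is missing
    from langchain_openai import OpenAIEmbeddings  # lazy import keeps the helpers above offline

    vectors = OpenAIEmbeddings(model=model).embed_documents(texts)
    expected = MODEL_DIMENSIONS[model]
    if any(len(v) != expected for v in vectors):
        raise RuntimeError(f"expected {expected}-dimensional vectors from {model}")
    return vectors
```

Calling `embed_texts(["nfl", "football"], resolve_model())` needs network access and a valid key; the other two helpers run offline, which makes configuration mistakes show up before any API call is made.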
Run the script to generate embeddings and plot them:

`python embeddings_plot.py`

This will:
- Generate embeddings for the hardcoded list of words:
  `texts = ["nfl", "football", "soccer", "basketball", "baseball"]`
- Reduce them to 3D space using PCA.
- Save the visualization to `3d_plot_small.png`.
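The script uses scikit-learn's PCA for the reduction step. To show what that step computes, here is a NumPy-only sketch (a swapped-in implementation, not the script's code): center the embedding matrix, take its SVD, and project onto the first three right singular vectors. Up to the sign of each axis, this gives the same coordinates as scikit-learn's `PCA(n_components=3).fit_transform(X)`. The function name `pca_3d` is hypothetical.

```python
import numpy as np


def pca_3d(embeddings):
    """Project row vectors onto their top-3 principal components.

    Returns (coords, explained): coords has shape (n_rows, 3), and explained
    holds the fraction of total variance captured by each component.
    """
    X = np.asarray(embeddings, dtype=float)
    if X.ndim != 2 or min(X.shape) < 3:
        raise ValueError("need a 2-D array with at least 3 rows and 3 columns")
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    coords = X_centered @ Vt[:3].T
    variance = singular_values ** 2
    explained = variance[:3] / variance.sum()
    return coords, explained


# Usage with random stand-in data (5 "words", 1536 dimensions):
rng = np.random.default_rng(0)
coords, explained = pca_3d(rng.normal(size=(5, 1536)))
print(coords.shape)  # (5, 3)
```

With only five words, the centered data has at most four non-zero principal components, so the 3D view discards at most one direction of variation; `explained.sum()` reports how much of the total spread the plot actually shows.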
Example output:
📊 A 3D scatter plot showing the relative similarity of sports terms.
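A plot like this can be produced with Matplotlib's 3D axes. The sketch below is hypothetical, not the repository's code: `plot_3d` is an illustrative name, it assumes `coords` is the 3-column output of the PCA step, and it defaults to `dpi=300` rather than the script's 1000. A figure's pixel size is its size in inches times `dpi`, so at `dpi=1000` Matplotlib's default 6.4 × 4.8 in figure comes out at roughly 6400 × 4800 px before tight cropping.

```python
import matplotlib

matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import numpy as np


def plot_3d(coords, labels, path="3d_plot_small.png", dpi=300):
    """Scatter 3-D points, label each one, and save the figure to `path`."""
    coords = np.asarray(coords, dtype=float)
    fig = plt.figure(figsize=(6.4, 4.8))
    ax = fig.add_subplot(projection="3d")  # 3-D axes (Matplotlib >= 3.2)
    ax.scatter(coords[:, 0], coords[:, 1], coords[:, 2])
    for (x, y, z), label in zip(coords, labels):
        ax.text(x, y, z, label)  # put each word next to its point
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    ax.set_zlabel("PC3")
    fig.savefig(path, dpi=dpi, bbox_inches="tight")
    plt.close(fig)  # release the figure's memory
    return path
```

`path` may also be a file-like object such as `io.BytesIO`, in which case the image is written as PNG (Matplotlib's default save format).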
Project structure:

.
├── embeddings_plot.py # Main script
├── requirements.txt # Dependencies
└── .env # API key (not committed)
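- To change the words being compared, edit the `texts` list in `embeddings_plot.py`. Keep at least three words, since PCA cannot produce three components from fewer samples.
- To try a different embedding model, set `EMBEDDING_MODEL` to one of the supported model names listed above.
- To adjust plot resolution, modify the `dpi` parameter in:
  `plt.savefig("3d_plot_small.png", dpi=1000, bbox_inches='tight')`
  Lower values (for example 300) produce smaller image files.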
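As an alternative to editing the script, the three settings could be exposed as command-line flags. The sketch below uses the standard-library `argparse`; the flags `--texts`, `--model`, and `--dpi` and the function `parse_args` are hypothetical and do not exist in the current script. Its defaults mirror the values described above.

```python
import argparse
from typing import List, Optional

DEFAULT_TEXTS = ["nfl", "football", "soccer", "basketball", "baseball"]
MODELS = ["text-embedding-3-large", "text-embedding-3-small", "text-embedding-ada-002"]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical CLI exposing the words, the model, and the output resolution."""
    parser = argparse.ArgumentParser(description="Plot OpenAI embeddings in 3D.")
    parser.add_argument("--texts", nargs="+", default=DEFAULT_TEXTS,
                        help="words to embed and compare")
    parser.add_argument("--model", choices=MODELS, default="text-embedding-ada-002",
                        help="OpenAI embedding model")
    parser.add_argument("--dpi", type=int, default=1000,
                        help="resolution passed to plt.savefig")
    args = parser.parse_args(argv)
    if len(args.texts) < 3:
        # parser.error prints a usage message and exits with status 2.
        parser.error("PCA to 3 components needs at least 3 texts")
    return args
```

For example, `python embeddings_plot.py --texts cat dog car truck --model text-embedding-3-small --dpi 300` would compare four new words at a lower resolution, assuming the script passed `args.texts`, `args.model`, and `args.dpi` on to the embedding and plotting steps. Using `choices` rejects misspelled model names before any API call is made.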
