This project fetches web content from a given URL, embeds the content using Hugging Face's sentence-transformers/all-mpnet-base-v2 model, and stores the embeddings in a FAISS vector database. Users can interact with the system to summarize the content based on their prompts.
- Fetches web content using `UnstructuredURLLoader`.
- Embeds content using the `sentence-transformers/all-mpnet-base-v2` model.
- Stores embeddings in a FAISS vector database for efficient similarity search.
- Summarizes content based on user-provided prompts.

Requires Python 3.8 or above and the pip package manager.
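
The Python version requirement above, and later the packages installed in the next step, can be verified with a small check script. This is only a sketch: `missing_modules` and `check_environment` are hypothetical helpers that are not part of this repository, and the module list assumes the import names `sentence_transformers`, `faiss`, and `unstructured`.

```python
import importlib.util
import sys

# Import names of the packages the usage example relies on (assumed list).
REQUIRED_MODULES = ["sentence_transformers", "faiss", "unstructured"]


def missing_modules(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


def check_environment(min_version=(3, 8)):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if sys.version_info < min_version:
        problems.append(
            f"Python {min_version[0]}.{min_version[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    for name in missing_modules(REQUIRED_MODULES):
        problems.append(f"missing module: {name}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Running the script prints one line per problem and nothing at all when the environment looks complete.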
Install the required dependencies:

    pip install sentence-transformers faiss-cpu unstructured langchain-community

Clone the repository:

    git clone https://github.com/yourusername/web-content-summarizer.git
    cd web-content-summarizer

Ensure the necessary libraries are installed:

    pip install -r requirements.txt

Provide the URL of the content you want to summarize. Example usage:
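
    # UnstructuredURLLoader ships with LangChain (pip install langchain-community)
    from langchain_community.document_loaders import UnstructuredURLLoader
    from sentence_transformers import SentenceTransformer
    import faiss

    # Load content from the URL; the loader returns a list of Document objects
    url_loader = UnstructuredURLLoader(urls=["https://example.com"])
    documents = url_loader.load()
    # Keep only the text of each document, since the model encodes strings
    content = [doc.page_content for doc in documents]

    # Embed content (one 768-dimensional vector per text)
    model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
    embeddings = model.encode(content)

    # Store in a FAISS index
    dimension = embeddings.shape[1]
    faiss_index = faiss.IndexFlatL2(dimension)
    faiss_index.add(embeddings)

Use a prompt to query the FAISS index and assemble a summary from the most relevant passages: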

    def summarize(prompt, faiss_index, model, content, k=5):
        # content is the list of text strings that was embedded into the index
        # Encode the prompt
        query_embedding = model.encode([prompt])
        # Search the FAISS index; never request more neighbors than it holds,
        # otherwise FAISS pads the result with -1 indices
        k = min(k, faiss_index.ntotal)
        distances, indices = faiss_index.search(query_embedding, k)
        # Retrieve the matching passages and join them
        relevant_content = [content[idx] for idx in indices[0]]
        summary = " ".join(relevant_content)
        return summary

    # Example prompt
    prompt = "Summarize the main points of the article"
    summary = summarize(prompt, faiss_index, model, content)
    print("Summary:", summary)

Project structure:

    web-content-summarizer/
    ├── README.md          # Project documentation
    ├── requirements.txt   # List of dependencies
    ├── main.py            # Main script for summarization
    ├── utils/             # Utility functions
    └── tests/             # Unit tests

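
As written, the usage example embeds each loaded document as a single unit, so a long page typically contributes only one vector and a top-5 search has little to choose from. A minimal sketch of splitting text into overlapping word-based chunks before embedding is shown here. The `chunk_text` helper, its parameters, and the default sizes are hypothetical and not part of this repository; it could live under `utils/`, for instance.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into chunks of at most chunk_size whitespace-separated words.

    Consecutive chunks share `overlap` words, so text near a chunk boundary
    appears in both neighboring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

One way to use it would be to call `chunk_text(doc.page_content)` for each loaded document `doc`, collect all chunks into one list, and pass that list to `model.encode` in place of the whole documents. Word counts are only a rough proxy for the model's token limit, so the sizes would need tuning.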
- Hugging Face for the `sentence-transformers` model.
- FAISS for the vector database.
- Unstructured for the content loader.
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions are welcome! Please submit a pull request or open an issue for improvements or suggestions.
For inquiries or support, reach out to your_email@example.com.