A RAG (Retrieval-Augmented Generation) chatbot designed to answer questions about MLFF_QD, configurations, and workflows.
- Hybrid Search: Combines ChromaDB (Vector) and BM25 (Keyword) for robust retrieval.
- Re-Ranking: Uses FlashRank to re-order retrieved documents for higher relevance.
- LLM Integration: Powered by Groq (Llama 3) for fast and accurate responses.
- Interactive UI: Built with Chainlit for a chat-like experience.
- Source Citations: Displays the specific files used to generate each answer. (A sketch of how these retrieval pieces fit together follows below.)
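For orientation, here is a minimal sketch of how such a hybrid-search-plus-re-ranking pipeline can be wired up with LangChain. The embedding model, `k` values, and ensemble weights below are illustrative assumptions, not the app's exact values; the real wiring lives in `app_chainlitkeywordRank.py`:

```python
# Sketch: hybrid (vector + keyword) retrieval with FlashRank re-ranking.
# Module paths follow langchain 0.1.x; model name and weights are assumptions.
from langchain.retrievers import ContextualCompressionRetriever, EnsembleRetriever
from langchain.retrievers.document_compressors import FlashrankRerank
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma

def build_hybrid_retriever(docs):
    # Dense side: semantic similarity search over ChromaDB embeddings.
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"  # assumed model
    )
    vector_retriever = Chroma.from_documents(docs, embeddings).as_retriever(
        search_kwargs={"k": 10}
    )
    # Sparse side: classic BM25 keyword matching over the same documents.
    keyword_retriever = BM25Retriever.from_documents(docs)
    keyword_retriever.k = 10
    # Merge both ranked lists (equal weights here, purely illustrative).
    hybrid = EnsembleRetriever(
        retrievers=[vector_retriever, keyword_retriever], weights=[0.5, 0.5]
    )
    # Re-rank the merged candidates with FlashRank before they reach the LLM.
    return ContextualCompressionRetriever(
        base_compressor=FlashrankRerank(top_n=5), base_retriever=hybrid
    )
```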
Project layout:

```text
data/                         # Place your source documents here
│   manual_user_guide.txt
│   paper.docx
│   qa_pairs_new.txt
app_chainlitkeywordRank.py    # Main application logic
config.py                     # Configuration settings
.env                          # Environment variables (API keys)
requirements.txt              # Python dependencies
```
Before installing, make sure you have:

- Anaconda or Miniconda
- Git
Clone the repository:

```bash
git clone https://github.com/nlesc-nano/MLFF_QD_ChatBot
cd MLFF_QD_ChatBot
```

Create and activate a Conda environment, then install the dependencies:

```bash
conda create -n mlff_bot python=3.11 -y
conda activate mlff_bot
pip install -r requirements.txt
```

Create a file named `.env` in the root directory and add your Groq API key:

```text
GROQ_API_KEY=gsk_your_actual_api_key_here
```
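How the key reaches the app is not shown here; a minimal sketch, assuming python-dotenv is among the dependencies, would be:

```python
# Minimal sketch: load GROQ_API_KEY from .env at startup (assumes python-dotenv).
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
if not os.getenv("GROQ_API_KEY"):
    raise RuntimeError("GROQ_API_KEY is missing; add it to your .env file.")
```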
Ensure your data files are in the `data/` folder. The app expects:

- `manual_user_guide.txt`
- `qa_pairs_new.txt`
- `paper.docx`
(You can change these filenames in `config.py` if needed.)
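The exact variable names in `config.py` are not shown here; a hypothetical layout might look like:

```python
# Hypothetical config.py entries -- the real names may differ.
DATA_DIR = "data"
DATA_FILES = [
    "manual_user_guide.txt",
    "qa_pairs_new.txt",
    "paper.docx",
]
```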
The current code supports `.txt` and `.docx` files only. If you want to use other file formats (e.g., PDF, CSV), you must modify the `setup_retriever` function inside `app_chainlitkeywordRank.py` to include the appropriate document loader, as sketched below. You can find the list of supported loaders here:
👉 [LangChain Document Loaders Documentation](https://python.langchain.com/docs/integrations/document_loaders/)
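A minimal sketch of such an extension, assuming the LangChain community loaders are used (`PyPDFLoader` needs the `pypdf` package; the `.docx` loader the app actually uses may differ):

```python
# Sketch: map file extensions to LangChain loaders; the .pdf and .csv
# entries are the additions. This loop is illustrative -- the real
# loading happens inside setup_retriever in app_chainlitkeywordRank.py.
from pathlib import Path
from langchain_community.document_loaders import (
    CSVLoader,
    Docx2txtLoader,
    PyPDFLoader,
    TextLoader,
)

LOADERS = {
    ".txt": TextLoader,
    ".docx": Docx2txtLoader,
    ".pdf": PyPDFLoader,  # new: requires the pypdf package
    ".csv": CSVLoader,    # new
}

def load_documents(data_dir: str = "data"):
    docs = []
    for path in Path(data_dir).iterdir():
        loader_cls = LOADERS.get(path.suffix.lower())
        if loader_cls is not None:  # silently skip unsupported extensions
            docs.extend(loader_cls(str(path)).load())
    return docs
```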
Run the Chainlit application:

```bash
chainlit run app_chainlitkeywordRank.py -w
```

The `-w` flag enables auto-reloading when you change code.
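For orientation, here is a hedged sketch of what a Chainlit entry point with source citation can look like; the session key, prompt, and Groq model id are assumptions, not the app's exact code:

```python
# Sketch of a Chainlit handler that answers from retrieved context and
# cites source files. Assumes a retriever was stored in the user session
# at chat start; the model id is an assumption.
import chainlit as cl
from langchain_groq import ChatGroq

llm = ChatGroq(model="llama3-70b-8192")  # reads GROQ_API_KEY from the environment

@cl.on_message
async def on_message(message: cl.Message):
    retriever = cl.user_session.get("retriever")
    docs = retriever.invoke(message.content)
    context = "\n\n".join(d.page_content for d in docs)
    reply = llm.invoke(
        f"Answer using only this context:\n{context}\n\nQuestion: {message.content}"
    )
    # Collect the distinct file names that contributed to the answer.
    sources = sorted({d.metadata.get("source", "unknown") for d in docs})
    await cl.Message(
        content=f"{reply.content}\n\nSources: {', '.join(sources)}"
    ).send()
```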