This is a RAG chatbot designed for automated hotel guest support. It uses an Orchestrator pattern to coordinate vector DB search and the LLM, and features a Human-in-the-Loop mechanism for handling low-confidence queries.
## Table of Contents

- Architecture Overview
- Quality Assurance & Testing
- RAGAS Evaluation
- Tech Stack
- Prerequisites
- Setup Instructions
- API Setup
- How It Works
- License
## Architecture Overview

This application is built as an AI-powered RAG (Retrieval-Augmented Generation) system using a centralized Orchestrator to manage data flow and logic.
- Knowledge Base: Uses a structured `knowledge_base.json` as the primary source of resort information.
- Vector Storage: Text chunks are embedded and stored in a ChromaDB index for high-speed semantic similarity search.
- ChatManager: Acts as the "Brain" of the operation. It manages the lifecycle of a message:
- Triggers embedding of the user query.
- Queries the Vector DB for context.
- Evaluates the Confidence Score.
- Decides whether to answer directly or route to a human operator.
- Text Generation: Powered by Qwen 2.5 (7B Instruct) via the Hugging Face Router.
- OpenAI SDK: Used as a robust interface to interact with remote inference endpoints.
- Role-Play: A strict system prompt ensures the AI maintains a consistent "Hotel Concierge" persona.
- Threshold Logic: If the vector search returns a confidence score below the threshold, the system triggers a "pending approval" state.
- Operator Alerts: Designed to integrate with Telegram to allow hotel staff to review AI suggestions and intervene in real-time.
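The message lifecycle described above can be sketched in miniature. Everything here is illustrative: the names (`ChatManager`, `embed`, `search`), the threshold value, and the stubbed embedding/search calls are assumptions, not the project's actual API.

```python
# Illustrative sketch only: names, threshold, and stubs are assumptions,
# not the project's actual code.
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.75  # assumed value; the real one would live in config


@dataclass
class SearchResult:
    text: str     # the retrieved knowledge-base chunk
    score: float  # similarity score reported by the vector DB


def embed(query: str) -> list[float]:
    """Stub for the embedding call (e.g. via the Hugging Face client)."""
    return [float(len(query))]  # placeholder vector


def search(vector: list[float]) -> SearchResult:
    """Stub for the ChromaDB similarity query."""
    return SearchResult(text="Check-out is at 12:00.", score=0.91)


class ChatManager:
    """The 'brain': embeds the query, retrieves context, routes by confidence."""

    def handle_message(self, query: str) -> dict:
        result = search(embed(query))
        if result.score >= CONFIDENCE_THRESHOLD:
            # High confidence: hand question + context to the LLM for a reply.
            return {"route": "llm", "context": result.text}
        # Low confidence: hold the reply and alert a human operator instead.
        return {"route": "pending_approval", "context": result.text}
```

In the real system, the `pending_approval` branch is where the Telegram operator alert would be triggered.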
## Quality Assurance & Testing

The project includes a comprehensive Automated Testing Framework to prevent hallucinations and maintain "Brand Voice":
- LLM-as-a-Judge: Evaluates the assistant's performance across multiple categories:
- Groundedness (Faithfulness): Ensuring answers are strictly based on the provided context.
- Negative Constraints: Verifying the AI admits ignorance when information is missing instead of hallucinating.
- Relevancy & Completeness: Checking if all parts of a user query are addressed.
- Tone & Persona: Monitoring "Brand Voice" consistency (politeness).
- Vector Database: Specialized tests to ensure the ChromaDB index and retrieval logic work with high precision:
- Semantic Retrieval Accuracy: Basic verification that the system retrieves the most relevant chunks for standard queries.
- Top-K Recall Optimization: Measuring if the "ground truth" information is consistently present within the top-K retrieved results.
- Metadata Filtering: Ensuring that search results can be correctly narrowed down using metadata tags without losing semantic relevance.
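A Top-K recall check of this kind might look like the following pytest-style sketch. The `retrieve` helper and its sample data are stand-ins for the real ChromaDB query, shown only to illustrate the shape of such a test.

```python
# Sketch of a top-K recall test; `retrieve` and its data are stand-ins
# for the real ChromaDB retrieval step, not the project's actual code.

def retrieve(query: str, k: int = 3) -> list[str]:
    """Stub retriever: returns the ids of the top-k chunks for a query."""
    index = {"pool hours": ["chunk_pool", "chunk_spa", "chunk_gym"]}
    return index.get(query, [])[:k]


def test_top_k_recall():
    # The ground-truth chunk must appear somewhere in the top-k results.
    assert "chunk_pool" in retrieve("pool hours", k=3)


def test_unknown_query_returns_nothing():
    # Out-of-scope queries should not fabricate results.
    assert retrieve("unknown topic") == []
```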
## RAGAS Evaluation

The project includes RAGAS (Retrieval-Augmented Generation Assessment) evaluation to measure the technical performance of our pipeline:
- Faithfulness: Measures the factual consistency of the generated answer against the retrieved context.
- Answer Relevance: Evaluates how well the answer addresses the user's specific query without redundant info.
- Context Precision: Calculates the signal-to-noise ratio in the retrieved chunks (how relevant the top-K results are).
- Context Recall: Checks if the retrieved context actually contains the ground-truth information needed to answer.
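As a rough illustration of what Context Precision and Context Recall measure (RAGAS itself computes these with an LLM judge, so the arithmetic below is only a toy approximation over labeled chunks):

```python
# Toy approximation of two RAGAS metrics over pre-labeled chunk ids;
# RAGAS computes the real values with an LLM judge.

def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved chunks that are relevant (signal-to-noise)."""
    if not retrieved:
        return 0.0
    return sum(c in relevant for c in retrieved) / len(retrieved)


def context_recall(retrieved: list[str], ground_truth: set[str]) -> float:
    """Fraction of ground-truth chunks present in the retrieved context."""
    if not ground_truth:
        return 1.0
    return len(ground_truth & set(retrieved)) / len(ground_truth)


retrieved = ["pool_hours", "spa_menu", "checkout_time"]
assert context_precision(retrieved, {"pool_hours", "checkout_time"}) == 2 / 3
assert context_recall(retrieved, {"pool_hours"}) == 1.0
```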
## Tech Stack

- Python – Core logic
- Flask – Web framework and API routing
- ChromaDB – Vector database for similarity search
- Hugging Face Hub – Native inference client for embeddings
- OpenAI Python SDK – Client for LLM interactions
- Telegram Bot API – Operator interface
- pytest – Testing engine
- RAGAS – RAG evaluation framework
## Prerequisites

- Python 3.10+
- Hugging Face account with an Access Token (Write/Inference permissions)
- Telegram Account: To create a bot and receive operator alerts via the Telegram Bot API.
- ngrok (or similar): Required for local development to expose your webhook to Telegram's servers.
## Setup Instructions

```bash
git clone https://github.com/deedmitrij/chatbot-assistant.git
cd chatbot-assistant
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
```

Create a `.env` file in the project root and add the following:

```env
# Telegram Configuration
TG_BOT_TOKEN=your_telegram_bot_token
TG_ADMIN_ID=your_telegram_chat_id

# Hugging Face Configuration
HF_API_TOKEN=your_huggingface_api_key
HF_BASE_URL=router_huggingface_url

# Model Selection
CHAT_MODEL=main_llm_model
EMBEDDING_MODEL=embedding_model
```

📌 Note:

- `TG_BOT_TOKEN`: Replace with the API token you received from @BotFather.
- `TG_ADMIN_ID`: Replace with your unique Telegram User ID (get it from @userinfobot).
- `HF_API_TOKEN`: Replace with your Hugging Face Access Token.
- `HF_BASE_URL`: Use the standard Hugging Face Inference API URL (https://router.huggingface.co/v1).
- `CHAT_MODEL`: Specify the model for text generation (e.g., Qwen/Qwen2.5-7B-Instruct).
- `EMBEDDING_MODEL`: Specify the model for embeddings (e.g., BAAI/bge-small-en-v1.5).
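One way the application might consume these variables is sketched below. The loader itself is illustrative, not the project's actual code; only the variable names come from the file above.

```python
# Illustrative config loader; only the variable names match the .env file,
# the function itself is an assumption about how they might be consumed.
import os

REQUIRED_VARS = [
    "TG_BOT_TOKEN", "TG_ADMIN_ID",
    "HF_API_TOKEN", "HF_BASE_URL",
    "CHAT_MODEL", "EMBEDDING_MODEL",
]


def load_config() -> dict:
    """Read the required environment variables, failing fast if any is unset."""
    missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing .env variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_VARS}
```

Failing fast on startup makes a missing token obvious immediately, instead of surfacing as an opaque API error on the first request.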
```bash
python main.py
```

The chatbot will start and be accessible at http://localhost:5000.
To run the automated QA suite:
```bash
pytest .\tests\
```

## API Setup

To receive "Low Confidence" alerts and respond to guests from your phone:
- Create a Bot:
- Message @BotFather on Telegram.
- Use the `/newbot` command and follow the instructions.
- Copy the API Token and add it to your `.env` file as `TG_BOT_TOKEN`.
- Get your Chat ID:
- Message @userinfobot.
- Copy your unique ID (a string of numbers) and add it to your `.env` file as `TG_ADMIN_ID`.
- Initialize the Bot:
- Open your new bot's chat and press Start. The bot cannot message you until you've interacted with it.
- Setup Webhook (Local Dev):
- Use a tool like ngrok to create a public URL for your local server: `ngrok http 5000`
- Register the URL with Telegram:
  `https://api.telegram.org/bot<YOUR_TOKEN>/setWebhook?url=<YOUR_NGROK_URL>/webhook/telegram`
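If you prefer to build that registration URL programmatically, a small helper could look like the following. The helper name is hypothetical, and the token and ngrok URL shown are placeholders for your own values.

```python
# Hypothetical helper that assembles the setWebhook URL shown above;
# the token and public URL arguments are placeholders.
from urllib.parse import quote


def set_webhook_url(bot_token: str, public_url: str) -> str:
    """Build the Telegram setWebhook URL for this app's /webhook/telegram route."""
    webhook = f"{public_url.rstrip('/')}/webhook/telegram"
    # Percent-encode the callback URL so it survives as a query parameter.
    return (f"https://api.telegram.org/bot{bot_token}"
            f"/setWebhook?url={quote(webhook, safe='')}")
```

Opening the resulting URL in a browser (or fetching it with any HTTP client) registers the webhook.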
To get access to Hugging Face models:
- Visit https://huggingface.co/
- Sign in or create a Hugging Face account.
- Go to Settings → Access Tokens.
- Create a new token with the "Make calls to Inference Providers" permission.
- Copy the token and add it to your `.env` file as `HF_API_TOKEN`.
## How It Works

- User Input: A guest interacts with the chat by either selecting a predefined FAQ category or typing a natural language question into the interface.
- Answer Search: The orchestrator transforms the query into a vector and performs a similarity search against the vector DB index to extract the most relevant policy or fact from the knowledge base.
- Response: The system evaluates the similarity score of the retrieved data:
- High Confidence: The system sends the question + context to the LLM to generate a polite, branded response.
- Low Confidence: The guest receives a bridging message, while the system alerts a human operator to review the AI-generated suggestion or provide a manual response.
## License

This project is open-source and available under the MIT License.