A Retrieval-Augmented Generation system that learns and mimics personal communication styles from chat history.
graph TD
subgraph Data Ingestion
CH[Chat History] --> Parser[Chat Parser]
Parser --> QA[Q&A Pairs]
QA --> VE[Vector Embeddings]
end
subgraph Vector Storage
VE --> VS[(Chroma DB)]
VS --> IC[Index Collection]
IC --> MT[Metadata Tags]
end
subgraph Persona Analysis
QA --> PA[Personality Analyzer]
PA --> PT[Traits Extraction]
PA --> PS[Style Analysis]
PA --> PP[Phrase Patterns]
end
subgraph RAG Engine
UQ[User Query] --> SR[Semantic Retrieval]
SR --> VS
VS --> RC[Relevant Context]
RC --> RG[Response Generator]
PT --> RG
PS --> RG
PP --> RG
end
subgraph Response Pipeline
RG --> RM[Response Matching]
RM --> EX[Exact Match]
RM --> SM[Similar Match]
EX --> FR[Final Response]
SM --> FR
end
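The Data Ingestion stage (Chat History → Chat Parser → Q&A Pairs) can be sketched as follows. This is a minimal illustration, not the project's parser. It assumes a hypothetical export format with one "Speaker: message" per line, and the names parse_chat, build_qa_pairs and QAPair are invented for the example. Each run of messages from someone else becomes a question, and the persona's following run of replies becomes the answer.

```python
import re
from dataclasses import dataclass

# Hypothetical line format "Speaker: message"; real chat exports will differ.
LINE_RE = re.compile(r"^(?P<speaker>[^:]+):\s*(?P<text>.+)$")


@dataclass
class QAPair:
    question: str
    answer: str


def parse_chat(lines):
    """Return (speaker, text) tuples, skipping lines that don't match the assumed format."""
    messages = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            messages.append((m["speaker"].strip(), m["text"].strip()))
    return messages


def build_qa_pairs(messages, persona):
    """Pair each run of non-persona messages with the persona's next run of replies."""
    pairs = []
    question, answer = [], []
    for speaker, text in messages:
        if speaker == persona:
            if question:  # ignore persona messages that answer nothing
                answer.append(text)
        else:
            if answer:  # the other side spoke again: close the current pair
                pairs.append(QAPair(" ".join(question), " ".join(answer)))
                question, answer = [], []
            question.append(text)
    if question and answer:
        pairs.append(QAPair(" ".join(question), " ".join(answer)))
    return pairs
```

Lines that do not match the assumed pattern are silently skipped. A real exporter that adds timestamps, multi-line messages or media placeholders would need its own pattern.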
Features
🧠 Intelligent Persona Analysis
- Extracts personality traits
- Learns communication style
- Identifies common phrases
- Maps topic interests
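One simple way to approximate "Identifies common phrases" is word n-gram counting over the persona's own messages. This technique is swapped in for illustration, and the project may instead ask the LLM to describe phrase patterns. The function common_phrases and its parameters are hypothetical.

```python
import re
from collections import Counter


def common_phrases(messages, n=2, top_k=5, min_count=2):
    """Return the most frequent word n-grams (phrase, count) seen at least min_count times."""
    counts = Counter()
    for msg in messages:
        # Lowercase and keep only word-like tokens (letters and apostrophes).
        tokens = re.findall(r"[a-z']+", msg.lower())
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return [(phrase, c) for phrase, c in counts.most_common(top_k) if c >= min_count]
```

For example, the replies "sounds good to me", "haha sounds good" and "ok sounds good!" all contribute the bigram "sounds good".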
🎯 Context-Aware Responses
- RAG-powered generation
- Historical context matching
- Style-consistent replies
- Exact response matching
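The exact-versus-similar split in the Response Pipeline could work like the sketch below. It assumes Q&A pairs stored as (question, answer) tuples. As a stand-in for whatever the project actually uses (possibly embedding similarity), it scores character similarity with difflib.SequenceMatcher against a hypothetical 0.8 threshold. When neither match applies, the caller would fall back to LLM generation.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't block an exact match."""
    return " ".join(text.lower().split())


def match_response(query, qa_pairs, threshold=0.8):
    """Return ("exact" | "similar" | None, answer) for a query against past Q&A pairs."""
    q = normalize(query)
    best_answer, best_score = None, 0.0
    for question, answer in qa_pairs:
        candidate = normalize(question)
        if candidate == q:
            return "exact", answer
        score = SequenceMatcher(None, q, candidate).ratio()
        if score > best_score:
            best_answer, best_score = answer, score
    if best_score >= threshold:
        return "similar", best_answer
    return None, None  # no match: let the generator produce a new reply
```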
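Prerequisites
Create a virtual environment, activate it, and install the dependencies. The activation command below is for Windows; on macOS/Linux use source venv/bin/activate.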
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt
Setup
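- Set the OpenAI API key (Windows Command Prompt shown; in PowerShell use $env:OPENAI_API_KEY="your-api-key", on macOS/Linux use export OPENAI_API_KEY=your-api-key)
set OPENAI_API_KEY=your-api-key
- Run the application
python app.py
- Open the interface
  - Open a browser at http://localhost:5000
  - Upload a chat history file
  - Start a conversation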
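The OpenAI Python client and LangChain's OpenAI integrations read OPENAI_API_KEY from the environment. A small check at startup can give a clear error before any model or embedding client is created. It is sketched here with a hypothetical helper name and also rejects the your-api-key placeholder from the setup step.

```python
import os


def get_openai_api_key(env=None):
    """Return OPENAI_API_KEY from env (defaults to os.environ), failing fast if unset or a placeholder."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key or key == "your-api-key":
        raise RuntimeError("OPENAI_API_KEY is not set; see the Setup section.")
    return key
```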
Tech Stack
- Frontend: Bootstrap 5, Socket.IO
- Backend: Flask, Flask-SocketIO
- AI/ML: LangChain, OpenAI GPT-4
- Vector Store: ChromaDB
- Embeddings: OpenAI Ada
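Semantic retrieval against the vector store ranks stored Q&A pairs by similarity to the query embedding, optionally filtered on metadata tags. The sketch below is a dependency-free stand-in and does not use ChromaDB's API. The InMemoryVectorStore class is hypothetical, and the 3-dimensional toy vectors take the place of real OpenAI Ada embeddings.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Record:
    doc_id: str
    embedding: list
    document: str
    metadata: dict = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Brute-force nearest-neighbour search with exact-match metadata filtering."""

    def __init__(self):
        self.records = []

    def add(self, doc_id, embedding, document, metadata=None):
        self.records.append(Record(doc_id, embedding, document, metadata or {}))

    def query(self, embedding, n_results=3, where=None):
        where = where or {}
        candidates = [
            r for r in self.records
            if all(r.metadata.get(k) == v for k, v in where.items())
        ]
        ranked = sorted(candidates, key=lambda r: cosine(embedding, r.embedding), reverse=True)
        return [(r.doc_id, r.document, round(cosine(embedding, r.embedding), 3))
                for r in ranked[:n_results]]
```

ChromaDB runs the same kind of nearest-neighbour query over an index instead of a linear scan. Its collection queries offer a comparable where filter on metadata.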
Project Structure
Open_Source_AI_Hackathon/
├── app.py # Flask server
├── persona_chatbot.py # Core RAG logic
├── templates/ # Frontend
│ └── index.html # Chat interface
├── requirements.txt # Dependencies
└── README.md # Documentation
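The RAG step in persona_chatbot.py combines the persona analysis with retrieved examples before calling the model. The sketch below shows one hypothetical way to assemble the prompt. The function build_persona_prompt and the prompt wording are illustrative, not taken from the repository. The resulting string would then be sent to GPT-4, for example through LangChain.

```python
def build_persona_prompt(query, traits, style, phrases, examples, max_examples=3):
    """Assemble a few-shot prompt from persona analysis plus retrieved (question, answer) pairs."""
    lines = [
        "You are imitating a specific person's texting style.",
        f"Personality traits: {', '.join(traits)}",
        f"Communication style: {style}",
        f"Phrases they often use: {', '.join(phrases)}",
        "",
        "Examples of how they replied in the past:",
    ]
    # Only the top retrieved examples are included to keep the prompt short.
    for question, answer in examples[:max_examples]:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
    lines += ["", "Reply to the new message in the same style.", f"Q: {query}", "A:"]
    return "\n".join(lines)
```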
License
MIT License. See LICENSE for details.
Team
Developed for the Open Source AI Hackathon 2024 by Data Doppelgangers (GitHub: @ivanye2509, @prabhatmenon, @patelmanan96, @solkit70)
Acknowledgments
- OpenAI
- LangChain Framework
- Flask Community
- Kaggle Community