An experimental platform for technical users to explore various AI concepts and tools, featuring a comprehensive suite of labs including Model Playground, Web Research, Data Processing, Knowledge Base management, Embeddings generation, and Retrieval-Augmented Generation (RAG) pipelines.
**Model Playground**: Interactive environment for:
- Text generation and classification
- Named entity recognition
- Document summarization
- Multi-language translation
**Web Research Lab**: Automated research assistant with:
- Web content crawling and analysis
- Multi-source information synthesis
- Citation generation with credibility scoring
- Source evaluation and bias detection
**Data Processing Lab**: Advanced text-processing pipeline for:
- Document cleaning and normalization
- Semantic document chunking
- Format conversion with metadata preservation
- Automated attribute extraction
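As an illustration of the chunking step, here is a minimal word-window chunker; the function name and parameters are illustrative, not this lab's actual API:

```python
def chunk_text(text: str, max_words: int = 100, overlap: int = 20) -> list[str]:
    """Split text into overlapping windows of at most max_words words."""
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):  # last window reached the end
            break
    return chunks
```

The overlap keeps content that straddles a chunk boundary visible to both neighboring chunks, which helps retrieval later in the pipeline.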
**Knowledge Base Lab**: Document management system featuring:
- Version-controlled storage
- Automated content categorization
- Semantic search capabilities
- Custom taxonomies and cross-referencing
**Embeddings Lab**: Vector representation workspace offering:
- Multiple embedding model options
- Interactive visualization tools
- Vector database management
- Index optimization and monitoring
**RAG Pipeline**: End-to-end system combining:
- Semantic search and context-aware retrieval
- Optimized embedding generation
- LLM integration for response generation
- Configurable pre/post-processing
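The stages above can be sketched end to end in toy form; bag-of-words vectors stand in for real embeddings, and the LLM call is reduced to prompt assembly (every name here is illustrative, not this repo's API):

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the retrieved context into a grounded LLM prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

docs = [
    "ChromaDB stores embeddings on local disk.",
    "Streamlit renders the frontend.",
    "The REST API is served by FastAPI.",
]
top = retrieve("Where are embeddings stored?", docs, k=1)
prompt = build_prompt("Where are embeddings stored?", top)
```

In the real pipeline the assembled prompt would be sent to the LLM, with the configurable pre/post-processing hooks wrapping the retrieve and generate steps.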
Prerequisites:

- Python 3.10 or 3.11
- OpenAI API key
- Virtual environment (recommended)
Installation:

- Clone the repository:
```bash
git clone https://github.com/DavoCoder/ai-lab.git
cd ai-lab
```

- Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate   # On Unix/macOS
# or
venv\Scripts\activate      # On Windows
```

- Install dependencies:
```bash
pip install -r requirements.txt
```

- Set up environment variables:
- Create a .env file in the root folder
- Use the .env_example file as a template for the required variables
- Fill in each variable with your local paths
| Environment Variable | Description | Supported Files |
|---|---|---|
| CHROMA_PERSIST_DIR_PATH | Local directory where the ChromaDB store is created | - |
| KNOWLEDGE_ARTICLES_DIR_PATH | Local directory containing the documents to embed into ChromaDB | .txt |
| METADATA_FILE_PATH | Local file storing content hashes used to detect changes in the knowledge-base directory and its files | - |
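For reference, a filled-in .env might look like the following; the paths are placeholders, and OPENAI_API_KEY is assumed from the prerequisites — .env_example remains the authoritative list:

```env
CHROMA_PERSIST_DIR_PATH=/absolute/path/to/chroma_db
KNOWLEDGE_ARTICLES_DIR_PATH=/absolute/path/to/knowledge_articles
METADATA_FILE_PATH=/absolute/path/to/metadata.json
OPENAI_API_KEY=your-openai-api-key
```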
- Run the Streamlit app:

```bash
streamlit run app.py
```

- Run the REST API:

```bash
uvicorn rest_api.rag_processor_api:app --reload
```

Access the API documentation at http://localhost:8000/docs.
```
ai-lab/
├── config/                      # Configuration
├── data_processing/             # Data processing
├── embeddings/                  # Embedding models
├── file_handler/                # File handling
├── knowledge_base/              # Document processing
├── nlp_processing/              # NLP processing models
├── query_pre_processing/        # Query enhancement
├── rag/                         # RAG processing
├── response_post_processing/    # Response post-processing
├── rest_api/                    # REST API (FastAPI)
├── retrival_optimization/       # Retrieval optimization
├── toxicity_detection/          # Toxicity detection
├── ui/                          # UI
├── vector_databases/            # Vector storage
├── web_research/                # Web research
├── app.py                       # Streamlit frontend
├── embeddings_generation.py     # Embeddings generation
└── requirements.txt             # Project dependencies
```
To contribute:

- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Open a Pull Request
Licensed under the Apache License, Version 2.0.
Acknowledgments:

- OpenAI for LLM support
- LangChain for the framework
Built by DavoCoder