NexusAI is a comprehensive AI platform that combines multiple AI capabilities into a single, user-friendly Streamlit dashboard. Built with Python and powered by Groq, OpenAI, Azure OpenAI, and Google Gemini, NexusAI offers chat, image analysis, image generation, text-to-speech, audio transcription, and document chat services.
The updated UI includes:
- Home Page - Introduction and project overview with author information
- API Key Setup - Dedicated page for configuring all API keys with instructions
- Chat - Enhanced chat interface with Groq's LLMs
- Image Analysis - Improved image analysis with multimodal models
- Image Generation - Create images with OpenAI or Azure OpenAI DALL-E 3
- Text-to-Speech - Convert text to natural speech with multiple voice options
- Speech-to-Text - Transcribe audio with advanced Whisper models
- Document Chat - Chat with your documents using Google Gemini
- Thank You Page - Acknowledgment page with author information
- π¬ AI Chat: Engage in conversations with advanced LLMs like Llama 3
- πΌοΈ Image Analysis: Upload and analyze images with multimodal AI models
- π¨ Image Generation: Create stunning images with OpenAI or Azure OpenAI DALL-E 3
- π£οΈ Text-to-Speech: Convert text to natural-sounding speech
- π€ Audio Transcription: Transcribe audio files with Whisper models
- π Document Chat: Chat with your documents using Google Gemini and ChromaDB
- π± Responsive UI: Clean, intuitive interface built with Streamlit
- Python 3.8 or higher
- API keys for:
- Groq (for chat, image analysis, TTS, and transcription)
- OpenAI (for image generation with DALL-E 3)
- Azure OpenAI (for alternative image generation with DALL-E 3)
- Google Gemini (for document chat)
NexusAI requires API keys to function. You can enter these directly in the application's API Setup page:
- Visit Groq's website and create an account
- Navigate to the API section in your dashboard
- Generate a new API key
- Copy the key and add it to the application
- Visit OpenAI's website and create an account
- Navigate to the API section in your dashboard
- Generate a new API key
- Copy the key and add it to the application
- Create an Azure account if you don't have one
- Set up an Azure OpenAI resource
- Deploy a DALL-E 3 model
- Get your API key, endpoint, and deployment name
- Add these to the application
- Visit Google AI Studio and create an account
- Create a new API key
- Copy the key and add it to the application
- Clone the repository
git clone https://github.com/yourusername/NexusAI.git
cd NexusAI- Setup and Run (Linux/Mac)
# Make the run script executable
chmod +x run.sh
# Run the application
./run.sh- Setup and Run (Windows PowerShell)
# You may need to set execution policy
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
# Run the application
.\run.ps1The script will:
- Create a virtual environment
- Install required dependencies
- Launch the Streamlit application
The application will be available at http://localhost:8501 in your web browser.
- Engage with advanced language models
- Supports Llama 3 (8B and 70B) and Mixtral models
- Adjustable parameters for temperature and token limits
- Upload images in various formats (JPG, PNG, WEBP)
- Analyze images with multimodal AI models
- Customizable analysis prompts
- Generate images with OpenAI DALL-E 3 or Azure OpenAI DALL-E 3
- Multiple size options (1024x1024, 1792x1024, 1024x1792)
- Download generated images
- Convert text to natural-sounding speech
- Multiple voice options
- Various output formats (WAV, MP3, AAC, FLAC, PCM)
- Transcribe audio files with Whisper models
- Support for multiple languages
- Word-level timestamps
- Upload and process documents (PDF, TXT, CSV)
- Chat with your documents using Google Gemini
- Vector storage with ChromaDB
- Source attribution for answers
NexusAI is organized into a modular structure:
NexusAI/
βββ main.py # Main application with navigation
βββ chat_module.py # Chat functionality
βββ image_analysis_module.py # Image analysis functionality
βββ image_generation_module.py # Image generation functionality
βββ tts_module.py # Text-to-speech functionality
βββ stt_module.py # Speech-to-text functionality
βββ document_chat_module.py # Document chat interface
βββ document_chat.py # Document chat backend
βββ requirements.txt # Python dependencies
βββ run.sh # Linux/Mac launcher script
βββ run.ps1 # Windows PowerShell launcher script
NexusAI supports various configuration options through model selection dropdowns and parameter sliders in the UI. Key configurable parameters include:
- Chat models (Llama 3, Mixtral)
- Image analysis models
- Image generation providers (OpenAI or Azure OpenAI)
- TTS voices and formats
- Transcription models and languages
- Temperature and other generation parameters
- Document processing options
NexusAI is built with:
- Streamlit: For the web interface
- Groq API: For chat, image analysis, TTS, and transcription
- OpenAI API: For image generation with DALL-E 3
- Azure OpenAI API: For alternative image generation with DALL-E 3
- Google Gemini API: For document embeddings and chat
- ChromaDB: For vector storage
- LangChain: For document processing and retrieval
- Python libraries: Including Pillow, requests, and dotenv
For a detailed overview of the system architecture, please see the ARCHITECTURE.md document.
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request. Make sure to read the Code of Conduct and Contributing Guidelines.
- Amul Thantharate - An AI enthusiast passionate about creating innovative solutions. GitHub | LinkedIn
- Avanti Nandanwar - A dedicated AI developer contributing to this project. GitHub | LinkedIn
- Groq for their powerful AI APIs
- OpenAI for DALL-E 3 integration
- Microsoft Azure for Azure OpenAI services
- Google Gemini for document embeddings and chat
- Streamlit for the amazing web framework
- ChromaDB for vector storage
- LangChain for document processing
β If you find this project useful, please consider giving it a star on GitHub!