🤖 NexusAI - AI-Powered Chat & Image Platform

NexusAI is a comprehensive AI platform that combines multiple AI capabilities into a single, user-friendly Streamlit dashboard. Built with Python and powered by Groq, OpenAI, Azure OpenAI, and Google Gemini, NexusAI offers chat, image analysis, image generation, text-to-speech, audio transcription, and document chat services.

🆕 New UI Features

The updated UI includes:

Home Page - Introduction and project overview with author information
API Key Setup - Dedicated page for configuring all API keys with instructions
Chat - Enhanced chat interface with Groq's LLMs
Image Analysis - Improved image analysis with multimodal models
Image Generation - Create images with OpenAI or Azure OpenAI DALL-E 3
Text-to-Speech - Convert text to natural speech with multiple voice options
Speech-to-Text - Transcribe audio with advanced Whisper models
Document Chat - Chat with your documents using Google Gemini
Thank You Page - Acknowledgment page with author information

✨ Features

💬 AI Chat: Engage in conversations with advanced LLMs like Llama 3
🖼️ Image Analysis: Upload and analyze images with multimodal AI models
🎨 Image Generation: Create stunning images with OpenAI or Azure OpenAI DALL-E 3
🗣️ Text-to-Speech: Convert text to natural-sounding speech
🎤 Audio Transcription: Transcribe audio files with Whisper models
📚 Document Chat: Chat with your documents using Google Gemini and ChromaDB
📱 Responsive UI: Clean, intuitive interface built with Streamlit

🚀 Getting Started

Prerequisites

Python 3.8 or higher
API keys for:
- Groq (for chat, image analysis, TTS, and transcription)
- OpenAI (for image generation with DALL-E 3)
- Azure OpenAI (for alternative image generation with DALL-E 3)
- Google Gemini (for document chat)

🔑 API Keys

NexusAI requires API keys to function. You can enter these directly in the application's API Setup page:

Groq API Key

Visit Groq's website and create an account
Navigate to the API section in your dashboard
Generate a new API key
Copy the key and add it to the application

OpenAI API Key

Visit OpenAI's website and create an account
Navigate to the API section in your dashboard
Generate a new API key
Copy the key and add it to the application

Azure OpenAI API Key

Create an Azure account if you don't have one
Set up an Azure OpenAI resource
Deploy a DALL-E 3 model
Get your API key, endpoint, and deployment name
Add these to the application

Google Gemini API Key

Visit Google AI Studio and create an account
Create a new API key
Copy the key and add it to the application

Installation

Clone the repository

git clone https://github.com/yourusername/NexusAI.git
cd NexusAI

Setup and Run (Linux/Mac)

# Make the run script executable
chmod +x run.sh

# Run the application
./run.sh

Setup and Run (Windows PowerShell)

# You may need to set execution policy
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

# Run the application
.\run.ps1

The script will:

Create a virtual environment
Install required dependencies
Launch the Streamlit application

The application will be available at http://localhost:8501 in your web browser.

🧩 Features in Detail

AI Chat

Engage with advanced language models
Supports Llama 3 (8B and 70B) and Mixtral models
Adjustable parameters for temperature and token limits

Image Analysis

Upload images in various formats (JPG, PNG, WEBP)
Analyze images with multimodal AI models
Customizable analysis prompts

Image Generation

Generate images with OpenAI DALL-E 3 or Azure OpenAI DALL-E 3
Multiple size options (1024x1024, 1792x1024, 1024x1792)
Download generated images

Text-to-Speech

Convert text to natural-sounding speech
Multiple voice options
Various output formats (WAV, MP3, AAC, FLAC, PCM)

Audio Transcription

Transcribe audio files with Whisper models
Support for multiple languages
Word-level timestamps

Document Chat

Upload and process documents (PDF, TXT, CSV)
Chat with your documents using Google Gemini
Vector storage with ChromaDB
Source attribution for answers

📊 Application Structure

NexusAI is organized into a modular structure:

NexusAI/
├── main.py                  # Main application with navigation
├── chat_module.py           # Chat functionality
├── image_analysis_module.py # Image analysis functionality
├── image_generation_module.py # Image generation functionality
├── tts_module.py            # Text-to-speech functionality
├── stt_module.py            # Speech-to-text functionality
├── document_chat_module.py  # Document chat interface
├── document_chat.py         # Document chat backend
├── requirements.txt         # Python dependencies
├── run.sh                   # Linux/Mac launcher script
└── run.ps1                  # Windows PowerShell launcher script

🔧 Configuration

NexusAI supports various configuration options through model selection dropdowns and parameter sliders in the UI. Key configurable parameters include:

Chat models (Llama 3, Mixtral)
Image analysis models
Image generation providers (OpenAI or Azure OpenAI)
TTS voices and formats
Transcription models and languages
Temperature and other generation parameters
Document processing options

🛠️ Technical Details

NexusAI is built with:

Streamlit: For the web interface
Groq API: For chat, image analysis, TTS, and transcription
OpenAI API: For image generation with DALL-E 3
Azure OpenAI API: For alternative image generation with DALL-E 3
Google Gemini API: For document embeddings and chat
ChromaDB: For vector storage
LangChain: For document processing and retrieval
Python libraries: Including Pillow, requests, and dotenv

For a detailed overview of the system architecture, please see the ARCHITECTURE.md document.

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. Make sure to read the Code of Conduct and Contributing Guidelines.

👨‍💻 Authors

Amul Thantharate - An AI enthusiast passionate about creating innovative solutions. GitHub | LinkedIn
Avanti Nandanwar - A dedicated AI developer contributing to this project. GitHub | LinkedIn

🙏 Acknowledgements

Groq for their powerful AI APIs
OpenAI for DALL-E 3 integration
Microsoft Azure for Azure OpenAI services
Google Gemini for document embeddings and chat
Streamlit for the amazing web framework
ChromaDB for vector storage
LangChain for document processing

⭐ If you find this project useful, please consider giving it a star on GitHub!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🤖 NexusAI - AI-Powered Chat & Image Platform

🆕 New UI Features

✨ Features

🚀 Getting Started

Prerequisites

🔑 API Keys

Groq API Key

OpenAI API Key

Azure OpenAI API Key

Google Gemini API Key

Installation

🧩 Features in Detail

AI Chat

Image Analysis

Image Generation

Text-to-Speech

Audio Transcription

Document Chat

📊 Application Structure

🔧 Configuration

🛠️ Technical Details

📝 License

🤝 Contributing

👨‍💻 Authors

🙏 Acknowledgements

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.env.example		.env.example
.gitignore		.gitignore
ARCHITECTURE.md		ARCHITECTURE.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
chat_module.py		chat_module.py
document_chat.py		document_chat.py
document_chat_module.py		document_chat_module.py
image_analysis_module.py		image_analysis_module.py
image_generation_module.py		image_generation_module.py
main.py		main.py
requirements.txt		requirements.txt
run.ps1		run.ps1
run.sh		run.sh
stt_module.py		stt_module.py
tts_module.py		tts_module.py

License

Amul-Thantharate/NexusAI

Folders and files

Latest commit

History

Repository files navigation

🤖 NexusAI - AI-Powered Chat & Image Platform

🆕 New UI Features

✨ Features

🚀 Getting Started

Prerequisites

🔑 API Keys

Groq API Key

OpenAI API Key

Azure OpenAI API Key

Google Gemini API Key

Installation

🧩 Features in Detail

AI Chat

Image Analysis

Image Generation

Text-to-Speech

Audio Transcription

Document Chat

📊 Application Structure

🔧 Configuration

🛠️ Technical Details

📝 License

🤝 Contributing

👨‍💻 Authors

🙏 Acknowledgements

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages