AI-powered platform for Ministry of Education (MoE) and Higher-Education institutions to retrieve, understand, compare, explain, and audit government policies.
This project uses a phase-based documentation system for better organization:
- README.md (this file) - Quick start and overview
- PROJECT_DESCRIPTION.md - Comprehensive technical documentation
- PHASE_1_SETUP_AND_AUTHENTICATION.md (7 documents)
  - Email verification system
  - Two-step registration
  - University email domain validation
  - Authentication setup guides
- PHASE_2_DOCUMENT_MANAGEMENT.md (15 documents)
  - Document approval workflows
  - Draft and review processes
  - Access control and security
  - Status visibility and badges
  - Search and sorting features
- PHASE_3_INSTITUTION_AND_ROLE_MANAGEMENT.md (22 documents)
  - Institution hierarchy management
  - Ministry and university relationships
  - Role-based permissions
  - Institution deletion workflows
  - User management strategies
- PHASE_4_ADVANCED_FEATURES_AND_OPTIMIZATIONS.md (61 documents)
  - Chat system and voice queries
  - Notification system
  - RAG and vector store optimizations
  - Performance improvements (Redis, caching, indexing)
  - External data sources
  - Analytics and insights
  - UI/UX fixes and enhancements
  - Security audits and fixes
- Gemini 2.5 Flash: Latest Google AI for advanced reasoning and policy analysis
- Voice Queries: Google Speech-to-Text API supporting 98+ languages
- Smart OCR: Google Cloud Vision API for text extraction from images and PDFs
- Multilingual: 100+ languages including Hindi, Tamil, Telugu, Bengali
- Policy Analysis: Compare documents, detect conflicts, check compliance
- Contextual Search: Understand intent and provide relevant answers
- Multi-format Support: PDF, DOCX, PPTX, Images (with Google OCR)
- Hybrid Search: Semantic + keyword search with intelligent ranking
- Lazy RAG: Instant uploads, on-demand embedding for faster processing
- Citation Tracking: All AI answers include source documents with page numbers
- Role-Based Access: Hierarchical document visibility and permissions
- Document Families: Group related documents for better organization
- Role Hierarchy: Developer → Ministry Admin → University Admin → Document Officer → Student
- Institution Types: Universities, Hospitals, Research Centers, Defense Academies
- Approval Workflows: Multi-level document and user approval system
- Email Verification: Secure two-step registration with domain validation
- Smart Notifications: Contextual alerts based on role and activity
- Mobile-First: Responsive design optimized for all devices
- Real-time Notifications: Hierarchical notification routing system
- Analytics Dashboard: System health, activity tracking, user insights
- External Data Sync: Connect to ministry databases and APIs
- Theme Support: Light/dark mode with persistent user preferences
- Live Chat: Real-time AI assistant with conversation history
- Python 3.11+
- PostgreSQL 15+ with pgvector extension
- Node.js 18+
- Supabase account (or S3-compatible storage)
- Google API key (Gemini)
git clone <repository-url>
cd Beacon__V1
# Create virtual environment
python -m venv venv
# Activate (Windows)
venv\Scripts\activate
# Activate (Linux/Mac)
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt

Create a .env file in the root directory:
# Database
DATABASE_HOSTNAME=your-db-host
DATABASE_PORT=5432
DATABASE_NAME=postgres
DATABASE_USERNAME=your-username
DATABASE_PASSWORD=your-password
# Supabase Storage
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-supabase-key
SUPABASE_BUCKET_NAME=Docs
# AI Service
GOOGLE_API_KEY=your-google-api-key
# JWT Authentication
JWT_SECRET_KEY=your-secret-key
JWT_ALGORITHM=HS256
JWT_EXPIRATION_MINUTES=1440
# Email (Optional - for verification)
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SMTP_USER=your-email@gmail.com
SMTP_PASSWORD=your-app-password
FROM_EMAIL=your-email@gmail.com
FROM_NAME=BEACON System
FRONTEND_URL=http://localhost:5173
# Redis (Optional - for caching)
REDIS_URL=redis://localhost:6379

# Enable pgvector extension
python scripts/enable_pgvector.py
# Run migrations
alembic upgrade head
# Initialize developer account (optional)
python backend/init_developer.py

uvicorn backend.main:app --reload --host 127.0.0.1 --port 8000

Backend will be available at: http://localhost:8000
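Before starting the server, it can help to confirm that the .env values above are present and well-formed. The following is a minimal sketch, not the project's actual configuration code: the function names `build_database_url` and `missing_settings` are hypothetical, and the set of variables treated as required is an assumption based on which entries the sample marks as optional.

```python
import os
from urllib.parse import quote_plus

# Assumption: everything not marked "Optional" in the sample .env is required.
REQUIRED_SETTINGS = (
    "DATABASE_HOSTNAME",
    "DATABASE_USERNAME",
    "DATABASE_PASSWORD",
    "SUPABASE_URL",
    "SUPABASE_KEY",
    "GOOGLE_API_KEY",
    "JWT_SECRET_KEY",
)


def missing_settings(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


def build_database_url(env=None):
    """Assemble a PostgreSQL URL from the DATABASE_* variables."""
    env = os.environ if env is None else env
    # Percent-encode credentials so characters like "@" or "/" stay valid in a URL.
    user = quote_plus(env["DATABASE_USERNAME"])
    password = quote_plus(env["DATABASE_PASSWORD"])
    host = env["DATABASE_HOSTNAME"]
    port = env.get("DATABASE_PORT", "5432")
    name = env.get("DATABASE_NAME", "postgres")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


if __name__ == "__main__":
    absent = missing_settings()
    if absent:
        print("Missing settings:", ", ".join(absent))
    else:
        print("All required settings are present.")
```

Encoding the credentials matters mainly for generated passwords, which often contain characters that would otherwise break URL parsing.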
cd frontend
# Install dependencies
npm install
# Create .env file
echo "VITE_API_BASE_URL=http://localhost:8000/api" > .env
# Start development server
npm run dev

Frontend will be available at: http://localhost:5173
For quick testing and demonstration purposes, a demo account is automatically created:
Demo Credentials:
Email: demo@beacon.system
Password: demo123
Role: Student
What you can test:
- Login functionality
- Document browsing and search
- AI chat with document queries
- Mobile responsiveness
- Voice queries (if microphone available)
Create Demo Account Manually:
# Run the demo account script
python scripts/create_demo_account.py
# Or on Windows
scripts\create_demo_account.bat

Note: This is a demo account with limited permissions. It cannot upload documents or access admin features.
Gemini 2.5 Flash Integration:
- Advanced Reasoning: Latest Gemini model for complex policy analysis
- Multilingual Support: 100+ languages including Indian regional languages
- Fast Response: Optimized for real-time chat interactions
- Context Awareness: Understands document relationships and policy implications
Google Cloud Vision API:
- OCR Processing: Extract text from images and scanned documents
- Handwriting Recognition: Process handwritten notes and forms
- PDF Text Extraction: Advanced text extraction from complex PDFs
- Multi-language OCR: Support for Hindi, Tamil, Telugu, Bengali, and more
Google Speech-to-Text API:
- Voice Queries: Ask questions in natural language via audio
- 98+ Languages: Support for major world languages
- High Accuracy: Advanced speech recognition with punctuation
- Real-time Processing: Live transcription for instant responses
Enhanced Document Management:
- Document Families: Group related documents for better organization
- Version Control: Track document updates and changes
- Smart Tagging: Auto-categorization based on content analysis
- Batch Operations: Upload and process multiple documents simultaneously
Advanced AI Capabilities:
- Policy Comparison: Side-by-side analysis of government policies
- Conflict Detection: Identify contradictions between documents
- Compliance Checking: Verify adherence to regulations and guidelines
- Trend Analysis: Track policy changes over time
Smart Search & Retrieval:
- Hybrid Search: Combines semantic and keyword search for better results
- Contextual Ranking: Results ranked by relevance and user role
- Search Analytics: Track popular queries and document access patterns
- Citation Tracking: Full source attribution for all AI responses
Mobile-First Design:
- Responsive UI: Optimized for mobile devices and tablets
- Touch-Friendly: Intuitive gestures and mobile navigation
- Offline Support: Basic functionality works without internet
- PWA Ready: Install as a mobile app
Real-time Collaboration:
- Live Chat: Real-time messaging with AI assistant
- Smart Notifications: Contextual alerts based on user role and interests
- Team Workspaces: Collaborative document review and approval
- Activity Feeds: Track team actions and document changes
Enterprise Security:
- Zero-Trust Architecture: Verify every request and user
- Data Encryption: End-to-end encryption for sensitive documents
- Audit Trails: Complete logging of all user actions
- Role-Based Access: Granular permissions based on organizational hierarchy
Scalability:
- Auto-scaling: Handle varying loads automatically
- Global CDN: Fast document access worldwide
- Unlimited Storage: Scale storage as needed
- Edge Computing: Reduced latency with global edge locations
Reliability:
- 99.9% Uptime: Enterprise-grade availability
- Auto-healing: Self-recovering infrastructure
- Health Monitoring: Proactive issue detection
- Data Backup: Automated backups and disaster recovery
Cost Optimization:
- Pay-per-use: Only pay for what you consume
- Usage Analytics: Track and optimize costs
- Smart Quotas: Prevent unexpected charges
- Free Tier: Generous free usage limits
Backend:
- FastAPI (Python 3.11+)
- PostgreSQL with pgvector extension
- SQLAlchemy ORM with Alembic migrations
- JWT authentication with role-based access
- Redis caching for performance optimization
Frontend:
- React 18 with Vite build system
- TailwindCSS + shadcn/ui components
- Zustand state management
- React Router v6 with protected routes
- Axios for API calls with interceptors
Google Cloud AI Services:
- Gemini 2.5 Flash: Advanced LLM for reasoning and analysis
- Speech-to-Text API: Voice query processing (98+ languages)
- Cloud Vision API: OCR and image text extraction
- Translation API: Multi-language document support
- Cloud Storage: Scalable document storage
AI/ML Stack:
- BGE-M3 embeddings (multilingual, 1024-dim)
- pgvector for similarity search
- Hybrid retrieval (semantic + keyword)
- Lazy embedding strategy for performance
- Citation tracking and source attribution
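To illustrate what "hybrid retrieval (semantic + keyword)" can mean in practice, here is a small in-memory sketch. It is not the project's retrieval code: in BEACON the similarity search runs in pgvector, whereas this example uses plain Python lists, and the weighting parameter `alpha`, the keyword-overlap measure, and all function names are assumptions chosen for illustration.

```python
import math
import re


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_overlap(query, text):
    """Fraction of distinct query words that also occur in the text."""
    query_words = set(re.findall(r"\w+", query.lower()))
    text_words = set(re.findall(r"\w+", text.lower()))
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def hybrid_score(query, query_vec, doc_text, doc_vec, alpha=0.7):
    """Weighted blend: alpha for the semantic part, (1 - alpha) for keywords."""
    semantic = cosine_similarity(query_vec, doc_vec)
    lexical = keyword_overlap(query, doc_text)
    return alpha * semantic + (1 - alpha) * lexical


def rank(query, query_vec, docs, alpha=0.7):
    """Rank (doc_id, text, vector) tuples by hybrid score, best first."""
    scored = [
        (hybrid_score(query, query_vec, text, vec, alpha), doc_id)
        for doc_id, text, vec in docs
    ]
    return [doc_id for _, doc_id in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

The keyword term helps with exact identifiers such as circular numbers or scheme names, which embeddings alone can blur; the right value of `alpha` would need to be tuned on real queries.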
Infrastructure:
- Supabase (PostgreSQL + Storage)
- Vercel (Frontend hosting)
- Render (Backend hosting)
- Upstash Redis (Caching)
- UptimeRobot (Monitoring)
Upload → Process → Extract Metadata → Store
                                        ↓
Query → Search Metadata → Rerank → Embed (if needed) → Search → Answer + Citations
Lazy Embedding Strategy:
- Documents uploaded instantly (no waiting for embedding)
- Embeddings generated on first query
- Subsequent queries use cached embeddings
- Multi-machine support via PostgreSQL storage
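The lazy embedding strategy above can be sketched as a store that saves text at upload time and only computes a vector the first time it is requested. This is an illustrative sketch under stated assumptions: `LazyEmbeddingStore` and `toy_embed` are hypothetical names, the in-memory dictionaries stand in for the PostgreSQL storage the project uses, and the hash-based embedder is a placeholder for BGE-M3.

```python
import hashlib


def toy_embed(text, dim=8):
    """Placeholder embedder: derives a deterministic vector from a SHA-256 digest."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dim]]


class LazyEmbeddingStore:
    """Stores documents immediately and embeds them on first access."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._texts = {}
        self._vectors = {}
        self.embed_calls = 0  # counts how often the embedder actually ran

    def upload(self, doc_id, text):
        # Upload returns right away: no embedding work happens here.
        self._texts[doc_id] = text

    def get_vector(self, doc_id):
        # The first query pays the embedding cost; later queries hit the cache.
        if doc_id not in self._vectors:
            self._vectors[doc_id] = self._embed_fn(self._texts[doc_id])
            self.embed_calls += 1
        return self._vectors[doc_id]


if __name__ == "__main__":
    store = LazyEmbeddingStore(toy_embed)
    store.upload("circular-01", "Revised fee refund policy for universities")
    print("embeddings after upload:", store.embed_calls)
    store.get_vector("circular-01")
    store.get_vector("circular-01")
    print("embeddings after two queries:", store.embed_calls)
```

The trade-off is the one shown in the performance figures: the first query on a document is slower because it includes embedding, while documents that are never queried cost nothing to embed.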
Developer (Super Admin)
↓
Ministry Admin (MoE Officials)
↓
University Admin (Institution Heads)
↓
Document Officer (Upload/Manage Docs)
↓
Student (Read-Only Access)
↓
Public Viewer (Limited Access)
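One simple way to represent this hierarchy in code is an ordered enumeration, so that "at least this role" checks become integer comparisons. The sketch below is an assumption about how such a check could look, not the project's implementation; the names `Role`, `has_at_least`, and `roles_below` are hypothetical, and real permission checks would also need to consider the user's institution.

```python
from enum import IntEnum


class Role(IntEnum):
    """Roles ordered from least to most privileged, mirroring the hierarchy above."""

    PUBLIC_VIEWER = 0
    STUDENT = 1
    DOCUMENT_OFFICER = 2
    UNIVERSITY_ADMIN = 3
    MINISTRY_ADMIN = 4
    DEVELOPER = 5


def has_at_least(user_role, required_role):
    """True if the user's role is at or above the required level."""
    return user_role >= required_role


def roles_below(role):
    """Roles strictly beneath the given role, lowest first."""
    return [r for r in Role if r < role]


if __name__ == "__main__":
    print(has_at_least(Role.MINISTRY_ADMIN, Role.UNIVERSITY_ADMIN))
    print([r.name for r in roles_below(Role.DOCUMENT_OFFICER)])
```

A pure rank comparison is only a starting point: the permission matrix scopes several abilities to an institution, which a single ordering cannot express on its own.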
| Feature | Developer | Ministry Admin | University Admin | Document Officer | Student |
|---|---|---|---|---|---|
| View all documents | β | β (restricted) | β (institution) | β (institution) | β (public) |
| Upload documents | β | β (auto-approved) | β (needs approval) | β (needs approval) | β |
| Approve documents | β | β | β (institution) | β | β |
| Manage users | β | β (limited) | β (institution) | β | β |
| System health | β | β | β | β | β |
| Analytics | β | β | β (institution) | β | β |
- POST /api/auth/register - User registration
- POST /api/auth/login - User login
- POST /api/auth/verify-email/{token} - Email verification
- GET /api/auth/me - Get current user

- POST /api/documents/upload - Upload document
- GET /api/documents/list - List documents (role-filtered)
- GET /api/documents/{id} - Get document details
- GET /api/documents/{id}/download - Download document
- DELETE /api/documents/{id} - Delete document

- GET /api/approvals/pending - Get pending documents
- POST /api/approvals/{id}/approve - Approve document
- POST /api/approvals/{id}/reject - Reject document

- POST /api/chat/query - Ask AI question
- POST /api/voice/query - Voice query (audio upload)
- GET /api/chat/sessions - Get chat history

- GET /api/institutions/list - List institutions
- POST /api/institutions/create - Create institution
- DELETE /api/institutions/{id} - Delete institution

- GET /api/notifications/list - List notifications
- GET /api/notifications/unread-count - Unread count
- POST /api/notifications/{id}/mark-read - Mark as read

- GET /api/analytics/stats - System statistics
- GET /api/analytics/activity - Activity feed
- GET /api/audit/logs - Audit logs
Full API Documentation: http://localhost:8000/docs
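As a rough illustration of how a client might address these endpoints, the sketch below builds (but does not send) two requests with the standard library. The request shapes are assumptions, not taken from the API itself: the JSON field names `email` and `password`, the `Bearer` authorization scheme, and the helper names are all hypothetical, so check the interactive docs at /docs for the actual schemas.

```python
import json
from urllib.request import Request

# Matches the VITE_API_BASE_URL used in the frontend setup.
API_BASE_URL = "http://localhost:8000/api"


def build_login_request(email, password, base_url=API_BASE_URL):
    """Prepare a POST to /auth/login (assumed JSON body with email and password)."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return Request(
        f"{base_url}/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_authed_request(path, token, base_url=API_BASE_URL):
    """Prepare a GET that carries the JWT as a Bearer token (assumed scheme)."""
    return Request(
        f"{base_url}{path}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


if __name__ == "__main__":
    login = build_login_request("demo@beacon.system", "demo123")
    print(login.get_method(), login.full_url)
    me = build_authed_request("/auth/me", "<token-from-login>")
    print(me.get_method(), me.full_url)
```

Passing either object to `urllib.request.urlopen` would send it to a running backend; that step is left out here so the example does not depend on a live server.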
# Run all tests
python tests/run_all_tests.py
# Individual tests
python tests/test_embeddings.py
python tests/test_voice_query.py
python tests/test_multilingual_embeddings.py
python tests/test_compliance_api.py
python tests/test_conflict_detection_api.py

| Operation | Time | Notes |
|---|---|---|
| Document Upload | 3-7s | Instant response |
| Query (embedded) | 4-7s | Fast |
| Query (first time) | 12-19s | Includes embedding |
| Voice transcription | 5-10s | 1 min audio |
| User Login | <1s | JWT generation |
- JWT-based authentication
- Email verification required
- Role-based access control (RBAC)
- Document-level permissions
- Audit logging for all actions
- SQL injection prevention (SQLAlchemy ORM)
- XSS protection (React escaping)
- Soft deletes (preserve audit trail)
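For readers unfamiliar with JWT-based authentication, the sketch below shows how an HS256 token is signed and verified using only the standard library, with a default lifetime matching the JWT_EXPIRATION_MINUTES=1440 value from the sample .env. It is written for illustration only and is not the project's auth code: the function names and claim names are assumptions, and a production service would normally rely on a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    """Base64url-encode bytes without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(signing_input, secret):
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_token(claims, secret, expires_minutes=1440, now=None):
    """Return header.payload.signature with an added exp claim."""
    now = int(time.time()) if now is None else now
    payload = dict(claims, exp=now + expires_minutes * 60)
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    ]
    signing_input = ".".join(parts)
    return signing_input + "." + _b64url(_signature(signing_input, secret))


def verify_token(token, secret, now=None):
    """Return the payload if the signature matches and the token is unexpired, else None."""
    now = int(time.time()) if now is None else now
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        expected = _signature(f"{header_b64}.{payload_b64}", secret)
        # compare_digest avoids leaking timing information about the signature.
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        payload = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        # Covers malformed tokens, bad base64, non-ASCII input, and invalid JSON.
        return None
    if payload.get("exp", 0) <= now:
        return None
    return payload
```

Because the payload is only encoded, not encrypted, anything placed in the claims is readable by the client; the signature only guarantees that it has not been altered.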
Beacon__V1/
├── Agent/                     # AI/ML Components
│   ├── embeddings/            # BGE-M3 embeddings
│   ├── voice/                 # Whisper transcription
│   ├── rag_agent/             # ReAct agent
│   ├── retrieval/             # Hybrid search
│   ├── lazy_rag/              # On-demand embedding
│   ├── vector_store/          # pgvector integration
│   └── tools/                 # Search tools
│
├── backend/                   # FastAPI Backend
│   ├── routers/               # API endpoints
│   ├── utils/                 # Helper functions
│   ├── database.py            # SQLAlchemy models
│   └── main.py                # FastAPI app
│
├── frontend/                  # React Frontend
│   ├── src/
│   │   ├── components/        # Reusable components
│   │   ├── pages/             # Route pages
│   │   ├── services/          # API calls
│   │   └── stores/            # Zustand stores
│   └── package.json
│
├── alembic/                   # Database migrations
├── scripts/                   # Utility scripts
├── tests/                     # Test suite
├── .env                       # Environment variables
├── requirements.txt           # Python dependencies
├── README.md                  # This file
└── PROJECT_DESCRIPTION.md     # Detailed documentation
# Check PostgreSQL is running
psql -h HOST -U USER -d DATABASE
# Verify .env file has correct credentials
# Test connection: python test_redis_connection.py

# Install PyTorch with CUDA support
pip install torch --index-url https://download.pytorch.org/whl/cu118

# Install FFmpeg
# Windows: Download from https://ffmpeg.org/download.html
# Linux: sudo apt install ffmpeg
# Mac: brew install ffmpeg

# For Gmail:
# 1. Enable 2-Factor Authentication
# 2. Generate App Password: https://myaccount.google.com/apppasswords
# 3. Use App Password as SMTP_PASSWORD in .env

- Migrated from FAISS to pgvector for multi-machine support
- Implemented lazy RAG for instant document uploads
- Added email verification system
- Enhanced notification system with hierarchical routing
- Improved analytics dashboard with system health monitoring
- Optimized performance with Redis caching
- Added voice query support (98+ languages)
- Implemented document approval workflows
- Enhanced role-based access control
- Documentation: See phase documentation files for detailed guides
- API Docs: http://localhost:8000/docs
- Logs: Agent/agent_logs/
- Tests: python tests/run_all_tests.py
- Multi-format document processing
- Multilingual embeddings (100+ languages)
- Voice query system (98+ languages)
- Lazy RAG (instant uploads)
- Hybrid retrieval (semantic + keyword)
- External data ingestion
- Citation tracking
- Production-ready
Built with ❤️ for Government Policy Intelligence
Version: 2.0.0 | Status: Production Ready | Last Updated: December 5, 2025