pratikbh0sa1e/Beacon

🎯 BEACON - Government Policy Intelligence Platform

AI-powered platform for Ministry of Education (MoE) and Higher-Education institutions to retrieve, understand, compare, explain, and audit government policies.


📚 Documentation Structure

This project uses a phase-based documentation system for better organization:

Core Documentation

  • README.md (this file) - Quick start and overview
  • PROJECT_DESCRIPTION.md - Comprehensive technical documentation

Phase Documentation

  1. PHASE_1_SETUP_AND_AUTHENTICATION.md (7 documents)

    • Email verification system
    • Two-step registration
    • University email domain validation
    • Authentication setup guides
  2. PHASE_2_DOCUMENT_MANAGEMENT.md (15 documents)

    • Document approval workflows
    • Draft and review processes
    • Access control and security
    • Status visibility and badges
    • Search and sorting features
  3. PHASE_3_INSTITUTION_AND_ROLE_MANAGEMENT.md (22 documents)

    • Institution hierarchy management
    • Ministry and university relationships
    • Role-based permissions
    • Institution deletion workflows
    • User management strategies
  4. PHASE_4_ADVANCED_FEATURES_AND_OPTIMIZATIONS.md (61 documents)

    • Chat system and voice queries
    • Notification system
    • RAG and vector store optimizations
    • Performance improvements (Redis, caching, indexing)
    • External data sources
    • Analytics and insights
    • UI/UX fixes and enhancements
    • Security audits and fixes

✨ Key Features

🤖 AI-Powered Intelligence (Google Services)

  • 🧠 Gemini 2.5 Flash: Latest Google AI for advanced reasoning and policy analysis
  • 🎤 Voice Queries: Google Speech-to-Text API supporting 98+ languages
  • 👁️ Smart OCR: Google Cloud Vision API for text extraction from images and PDFs
  • 🌍 Multilingual: 100+ languages including Hindi, Tamil, Telugu, Bengali
  • 📊 Policy Analysis: Compare documents, detect conflicts, check compliance
  • 🔍 Contextual Search: Understand intent and provide relevant answers

📄 Document Management

  • 📁 Multi-format Support: PDF, DOCX, PPTX, Images (with Google OCR)
  • 🔍 Hybrid Search: Semantic + keyword search with intelligent ranking
  • ⚡ Lazy RAG: Instant uploads, on-demand embedding for faster processing
  • 📚 Citation Tracking: All AI answers include source documents with page numbers
  • 🔐 Role-Based Access: Hierarchical document visibility and permissions
  • 📋 Document Families: Group related documents for better organization

👥 User & Institution Management

  • 🏛️ Role Hierarchy: Developer → Ministry Admin → University Admin → Document Officer → Student
  • 🏢 Institution Types: Universities, Hospitals, Research Centers, Defense Academies
  • ✅ Approval Workflows: Multi-level document and user approval system
  • 📧 Email Verification: Secure two-step registration with domain validation
  • 🔔 Smart Notifications: Contextual alerts based on role and activity

🚀 Advanced Features

  • 📱 Mobile-First: Responsive design optimized for all devices
  • 🔔 Real-time Notifications: Hierarchical notification routing system
  • 📈 Analytics Dashboard: System health, activity tracking, user insights
  • 🔗 External Data Sync: Connect to ministry databases and APIs
  • 🎨 Theme Support: Light/dark mode with persistent user preferences
  • 💬 Live Chat: Real-time AI assistant with conversation history

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • PostgreSQL 15+ with pgvector extension
  • Node.js 18+
  • Supabase account (or S3-compatible storage)
  • Google API key (Gemini)

1. Clone Repository

git clone <repository-url>
cd Beacon__V1

2. Backend Setup

# Create virtual environment
python -m venv venv

# Activate (Windows)
venv\Scripts\activate

# Activate (Linux/Mac)
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

3. Configure Environment

Create a .env file in the root directory:

# Database
DATABASE_HOSTNAME=your-db-host
DATABASE_PORT=5432
DATABASE_NAME=postgres
DATABASE_USERNAME=your-username
DATABASE_PASSWORD=your-password

# Supabase Storage
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-supabase-key
SUPABASE_BUCKET_NAME=Docs

# AI Service
GOOGLE_API_KEY=your-google-api-key

# JWT Authentication
JWT_SECRET_KEY=your-secret-key
JWT_ALGORITHM=HS256
JWT_EXPIRATION_MINUTES=1440

# Email (Optional - for verification)
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SMTP_USER=your-email@gmail.com
SMTP_PASSWORD=your-app-password
FROM_EMAIL=your-email@gmail.com
FROM_NAME=BEACON System
FRONTEND_URL=http://localhost:5173

# Redis (Optional - for caching)
REDIS_URL=redis://localhost:6379
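
As a rough illustration of how these variables fit together, the sketch below assembles a PostgreSQL connection URL from the DATABASE_* values and reports which required keys are missing. The helper names `build_database_url` and `missing_required`, and the choice of which keys count as required, are assumptions made for this example; the backend's actual configuration code may differ.

```python
import os
from urllib.parse import quote_plus

# Keys treated as mandatory in this sketch (an assumption, not taken from the backend).
REQUIRED_KEYS = [
    "DATABASE_HOSTNAME",
    "DATABASE_USERNAME",
    "DATABASE_PASSWORD",
    "SUPABASE_URL",
    "SUPABASE_KEY",
    "GOOGLE_API_KEY",
    "JWT_SECRET_KEY",
]


def build_database_url(env=os.environ):
    """Assemble a PostgreSQL URL from the DATABASE_* variables."""
    # Escape credentials so characters such as '@' or ':' do not break the URL.
    user = quote_plus(env["DATABASE_USERNAME"])
    password = quote_plus(env["DATABASE_PASSWORD"])
    host = env["DATABASE_HOSTNAME"]
    port = env.get("DATABASE_PORT", "5432")
    name = env.get("DATABASE_NAME", "postgres")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


def missing_required(env=os.environ):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Calling `missing_required()` before starting the server gives a quick list of settings that still need values.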

4. Database Setup

# Enable pgvector extension
python scripts/enable_pgvector.py

# Run migrations
alembic upgrade head

# Initialize developer account (optional)
python backend/init_developer.py

5. Start Backend

uvicorn backend.main:app --reload --host 127.0.0.1 --port 8000

Backend will be available at: http://localhost:8000

6. Frontend Setup

cd frontend

# Install dependencies
npm install

# Create .env file
echo "VITE_API_BASE_URL=http://localhost:8000/api" > .env

# Start development server
npm run dev

Frontend will be available at: http://localhost:5173


🎯 Demo Account

For quick testing and demonstration purposes, a demo account is automatically created:

Demo Credentials:

Email: demo@beacon.system
Password: demo123
Role: Student

What you can test:

  • ✅ Login functionality
  • ✅ Document browsing and search
  • ✅ AI chat with document queries
  • ✅ Mobile responsiveness
  • ✅ Voice queries (if microphone available)

Create Demo Account Manually:

# Run the demo account script
python scripts/create_demo_account.py

# Or on Windows
scripts\create_demo_account.bat

Note: This is a demo account with limited permissions. It cannot upload documents or access admin features.


🚀 New Features & Google Services Integration

🤖 Google AI Services

Gemini 2.5 Flash Integration:

  • 🧠 Advanced Reasoning: Latest Gemini model for complex policy analysis
  • 🌍 Multilingual Support: 100+ languages including Indian regional languages
  • ⚡ Fast Response: Optimized for real-time chat interactions
  • 📊 Context Awareness: Understands document relationships and policy implications

Google Cloud Vision API:

  • 📸 OCR Processing: Extract text from images and scanned documents
  • 🔍 Handwriting Recognition: Process handwritten notes and forms
  • 📄 PDF Text Extraction: Advanced text extraction from complex PDFs
  • 🌐 Multi-language OCR: Support for Hindi, Tamil, Telugu, Bengali, and more

Google Speech-to-Text API:

  • 🎤 Voice Queries: Ask questions in natural language via audio
  • 🗣️ 98+ Languages: Support for major world languages
  • 🎯 High Accuracy: Advanced speech recognition with punctuation
  • 📱 Real-time Processing: Live transcription for instant responses

🆕 Latest Features (v2.0.0)

Enhanced Document Management:

  • 📚 Document Families: Group related documents for better organization
  • 🔄 Version Control: Track document updates and changes
  • 🏷️ Smart Tagging: Auto-categorization based on content analysis
  • 📋 Batch Operations: Upload and process multiple documents simultaneously

Advanced AI Capabilities:

  • 🧩 Policy Comparison: Side-by-side analysis of government policies
  • ⚠️ Conflict Detection: Identify contradictions between documents
  • ✅ Compliance Checking: Verify adherence to regulations and guidelines
  • 📈 Trend Analysis: Track policy changes over time

Smart Search & Retrieval:

  • 🔍 Hybrid Search: Combines semantic and keyword search for better results
  • 🎯 Contextual Ranking: Results ranked by relevance and user role
  • 📊 Search Analytics: Track popular queries and document access patterns
  • 🔗 Citation Tracking: Full source attribution for all AI responses

Mobile-First Design:

  • 📱 Responsive UI: Optimized for mobile devices and tablets
  • 👆 Touch-Friendly: Intuitive gestures and mobile navigation
  • 🔄 Offline Support: Basic functionality works without internet
  • 📲 PWA Ready: Install as a mobile app

Real-time Collaboration:

  • 💬 Live Chat: Real-time messaging with AI assistant
  • 🔔 Smart Notifications: Contextual alerts based on user role and interests
  • 👥 Team Workspaces: Collaborative document review and approval
  • 📊 Activity Feeds: Track team actions and document changes

Enterprise Security:

  • 🔐 Zero-Trust Architecture: Verify every request and user
  • 🛡️ Data Encryption: End-to-end encryption for sensitive documents
  • 📋 Audit Trails: Complete logging of all user actions
  • 🔒 Role-Based Access: Granular permissions based on organizational hierarchy

🌟 Google Cloud Integration Benefits

Scalability:

  • ☁️ Auto-scaling: Handle varying loads automatically
  • 🌍 Global CDN: Fast document access worldwide
  • 💾 Unlimited Storage: Scale storage as needed
  • ⚡ Edge Computing: Reduced latency with global edge locations

Reliability:

  • 🔄 99.9% Uptime: Enterprise-grade availability
  • 🔧 Auto-healing: Self-recovering infrastructure
  • 📊 Health Monitoring: Proactive issue detection
  • 🔒 Data Backup: Automated backups and disaster recovery

Cost Optimization:

  • 💰 Pay-per-use: Only pay for what you consume
  • 📊 Usage Analytics: Track and optimize costs
  • 🎯 Smart Quotas: Prevent unexpected charges
  • 💡 Free Tier: Generous free usage limits

🏗️ System Architecture

Technology Stack

Backend:

  • FastAPI (Python 3.11+)
  • PostgreSQL with pgvector extension
  • SQLAlchemy ORM with Alembic migrations
  • JWT authentication with role-based access
  • Redis caching for performance optimization

Frontend:

  • React 18 with Vite build system
  • TailwindCSS + shadcn/ui components
  • Zustand state management
  • React Router v6 with protected routes
  • Axios for API calls with interceptors

Google Cloud AI Services:

  • 🧠 Gemini 2.5 Flash: Advanced LLM for reasoning and analysis
  • 🎤 Speech-to-Text API: Voice query processing (98+ languages)
  • 👁️ Cloud Vision API: OCR and image text extraction
  • 🌍 Translation API: Multi-language document support
  • ☁️ Cloud Storage: Scalable document storage

AI/ML Stack:

  • BGE-M3 embeddings (multilingual, 1024-dim)
  • pgvector for similarity search
  • Hybrid retrieval (semantic + keyword)
  • Lazy embedding strategy for performance
  • Citation tracking and source attribution
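
One common way to combine semantic and keyword signals is a weighted sum of a vector similarity and a term-overlap score. The sketch below shows that idea only. The function names, the whitespace tokenizer, and the `alpha` weight of 0.7 are assumptions for illustration, and the project's actual ranking logic may work differently.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_overlap(query, text):
    """Fraction of distinct query terms that also appear in the text."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    return len(query_terms & text_terms) / len(query_terms) if query_terms else 0.0


def hybrid_score(query, query_vec, text, doc_vec, alpha=0.7):
    """Blend semantic and keyword scores; alpha weights the semantic part."""
    semantic = cosine(query_vec, doc_vec)
    lexical = keyword_overlap(query, text)
    return alpha * semantic + (1 - alpha) * lexical
```

Raising `alpha` favors semantic matches; lowering it favors exact term matches, which can help with policy numbers and named regulations.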

Infrastructure:

  • Supabase (PostgreSQL + Storage)
  • Vercel (Frontend hosting)
  • Render (Backend hosting)
  • Upstash Redis (Caching)
  • UptimeRobot (Monitoring)

RAG Architecture

Upload → Process → Extract Metadata → Store
                                        ↓
Query → Search Metadata → Rerank → Embed (if needed) → Search → Answer + Citations

Lazy Embedding Strategy:

  • Documents uploaded instantly (no waiting for embedding)
  • Embeddings generated on first query
  • Subsequent queries use cached embeddings
  • Multi-machine support via PostgreSQL storage
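
The lazy strategy above can be sketched with an in-memory store that defers embedding until a document is first needed. This is a simplified illustration: the class `LazyEmbeddingStore` and the stand-in `toy_embed` function are hypothetical, and the dictionaries take the place of the PostgreSQL/pgvector storage the project describes.

```python
from typing import Callable, Dict, List


class LazyEmbeddingStore:
    """Stores text at upload time and embeds it only on first use."""

    def __init__(self, embed_fn: Callable[[str], List[float]]):
        self._embed_fn = embed_fn
        self._texts: Dict[str, str] = {}
        self._vectors: Dict[str, List[float]] = {}
        self.embed_calls = 0  # counts how often the embedder actually ran

    def upload(self, doc_id: str, text: str) -> None:
        # Upload does no embedding work, so it returns immediately.
        self._texts[doc_id] = text

    def get_vector(self, doc_id: str) -> List[float]:
        # First query pays the embedding cost; later queries reuse the cached vector.
        if doc_id not in self._vectors:
            self._vectors[doc_id] = self._embed_fn(self._texts[doc_id])
            self.embed_calls += 1
        return self._vectors[doc_id]


def toy_embed(text: str) -> List[float]:
    """Stand-in for a real embedding model: length and space count."""
    return [float(len(text)), float(text.count(" "))]
```

This matches the timing pattern in the performance figures, where the first query on a document is slower than later ones.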

👥 User Roles & Hierarchy

Developer (Super Admin)
    ↓
Ministry Admin (MoE Officials)
    ↓
University Admin (Institution Heads)
    ↓
Document Officer (Upload/Manage Docs)
    ↓
Student (Read-Only Access)
    ↓
Public Viewer (Limited Access)

Role Permissions

| Feature | Developer | Ministry Admin | University Admin | Document Officer | Student |
| --- | --- | --- | --- | --- | --- |
| View all documents | ✅ | ✅ (restricted) | ✅ (institution) | ✅ (institution) | ✅ (public) |
| Upload documents | ✅ | ✅ (auto-approved) | ✅ (needs approval) | ✅ (needs approval) | ❌ |
| Approve documents | ✅ | ✅ | ✅ (institution) | ❌ | ❌ |
| Manage users | ✅ | ✅ (limited) | ✅ (institution) | ❌ | ❌ |
| System health | ✅ | ❌ | ❌ | ❌ | ❌ |
| Analytics | ✅ | ✅ | ✅ (institution) | ❌ | ❌ |
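
The permission matrix can be expressed as a simple lookup, sketched below. The role and feature identifiers (`ministry_admin`, `upload_documents`, and so on) are made up for this example, and the scope qualifiers such as "(institution)" or "(limited)" are not modeled; the backend's real RBAC code may be organized quite differently.

```python
# Feature -> set of roles allowed to use it, following the table above.
PERMISSIONS = {
    "view_documents": {
        "developer", "ministry_admin", "university_admin", "document_officer", "student",
    },
    "upload_documents": {
        "developer", "ministry_admin", "university_admin", "document_officer",
    },
    "approve_documents": {"developer", "ministry_admin", "university_admin"},
    "manage_users": {"developer", "ministry_admin", "university_admin"},
    "system_health": {"developer"},
    "analytics": {"developer", "ministry_admin", "university_admin"},
}

# Roles whose uploads go through an approval step according to the table.
UPLOAD_NEEDS_APPROVAL = {"university_admin", "document_officer"}


def can(role: str, feature: str) -> bool:
    """Return True if the role has any level of access to the feature."""
    return role in PERMISSIONS.get(feature, set())


def upload_needs_approval(role: str) -> bool:
    """Return True if documents uploaded by this role must be approved first."""
    return role in UPLOAD_NEEDS_APPROVAL
```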

📡 API Endpoints

Authentication

  • POST /api/auth/register - User registration
  • POST /api/auth/login - User login
  • POST /api/auth/verify-email/{token} - Email verification
  • GET /api/auth/me - Get current user

Documents

  • POST /api/documents/upload - Upload document
  • GET /api/documents/list - List documents (role-filtered)
  • GET /api/documents/{id} - Get document details
  • GET /api/documents/{id}/download - Download document
  • DELETE /api/documents/{id} - Delete document

Approvals

  • GET /api/approvals/pending - Get pending documents
  • POST /api/approvals/{id}/approve - Approve document
  • POST /api/approvals/{id}/reject - Reject document

Chat & AI

  • POST /api/chat/query - Ask AI question
  • POST /api/voice/query - Voice query (audio upload)
  • GET /api/chat/sessions - Get chat history

Institutions

  • GET /api/institutions/list - List institutions
  • POST /api/institutions/create - Create institution
  • DELETE /api/institutions/{id} - Delete institution

Notifications

  • GET /api/notifications/list - List notifications
  • GET /api/notifications/unread-count - Unread count
  • POST /api/notifications/{id}/mark-read - Mark as read

Analytics

  • GET /api/analytics/stats - System statistics
  • GET /api/analytics/activity - Activity feed
  • GET /api/audit/logs - Audit logs

Full API Documentation: http://localhost:8000/docs
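
A minimal client sketch for two of the endpoints listed above, using only the standard library. The JSON field names (`email`, `password`, `question`) and the Bearer token header are assumptions, since the request schemas are not documented here; check the interactive docs at /docs for the actual shapes.

```python
import json
from urllib.request import Request

API_BASE = "http://localhost:8000/api"


def login_request(email: str, password: str) -> Request:
    """Build a POST request for /api/auth/login (body fields are assumed)."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return Request(
        f"{API_BASE}/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat_query_request(question: str, token: str) -> Request:
    """Build a POST request for /api/chat/query with an assumed Bearer token header."""
    body = json.dumps({"question": question}).encode("utf-8")
    return Request(
        f"{API_BASE}/chat/query",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

To send a request, pass it to `urllib.request.urlopen` while the backend is running.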


🧪 Testing

# Run all tests
python tests/run_all_tests.py

# Individual tests
python tests/test_embeddings.py
python tests/test_voice_query.py
python tests/test_multilingual_embeddings.py
python tests/test_compliance_api.py
python tests/test_conflict_detection_api.py

📊 Performance Metrics

| Operation | Time | Notes |
| --- | --- | --- |
| Document Upload | 3-7s | Instant response |
| Query (embedded) | 4-7s | Fast |
| Query (first time) | 12-19s | Includes embedding |
| Voice transcription | 5-10s | 1 min audio |
| User Login | <1s | JWT generation |

🔐 Security Features

  • ✅ JWT-based authentication
  • ✅ Email verification required
  • ✅ Role-based access control (RBAC)
  • ✅ Document-level permissions
  • ✅ Audit logging for all actions
  • ✅ SQL injection prevention (SQLAlchemy ORM)
  • ✅ XSS protection (React escaping)
  • ✅ Soft deletes (preserve audit trail)
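
To show what HS256 signing with an expiry (JWT_ALGORITHM=HS256, JWT_EXPIRATION_MINUTES=1440) involves, here is a standard-library sketch. It is an illustration of the token format only, not the project's authentication code; the function names are hypothetical, and a production service would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(payload: dict, secret: str, expires_minutes: int = 1440, now=None) -> str:
    """Create an HS256-signed token whose 'exp' claim is now + expires_minutes."""
    now = int(time.time()) if now is None else now
    claims = dict(payload, exp=now + expires_minutes * 60)
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256)
    return signing_input + "." + _b64url(signature.digest())


def verify_token(token: str, secret: str, now=None):
    """Return the claims if the signature is valid and the token is unexpired, else None."""
    now = int(time.time()) if now is None else now
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        return None
    expected = hmac.new(
        secret.encode("utf-8"), f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        return None
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) <= now:
        return None
    return claims
```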

📁 Project Structure

Beacon__V1/
├── Agent/                      # AI/ML Components
│   ├── embeddings/            # BGE-M3 embeddings
│   ├── voice/                 # Whisper transcription
│   ├── rag_agent/             # ReAct agent
│   ├── retrieval/             # Hybrid search
│   ├── lazy_rag/              # On-demand embedding
│   ├── vector_store/          # pgvector integration
│   └── tools/                 # Search tools
│
├── backend/                    # FastAPI Backend
│   ├── routers/               # API endpoints
│   ├── utils/                 # Helper functions
│   ├── database.py            # SQLAlchemy models
│   └── main.py                # FastAPI app
│
├── frontend/                   # React Frontend
│   ├── src/
│   │   ├── components/        # Reusable components
│   │   ├── pages/             # Route pages
│   │   ├── services/          # API calls
│   │   └── stores/            # Zustand stores
│   └── package.json
│
├── alembic/                    # Database migrations
├── scripts/                    # Utility scripts
├── tests/                      # Test suite
├── .env                        # Environment variables
├── requirements.txt            # Python dependencies
├── README.md                   # This file
└── PROJECT_DESCRIPTION.md      # Detailed documentation

🐛 Troubleshooting

Database Connection Issues

# Check PostgreSQL is running
psql -h HOST -U USER -d DATABASE

# Verify .env file has correct credentials
# Test connection: python test_redis_connection.py

GPU Not Detected

# Install PyTorch with CUDA support
pip install torch --index-url https://download.pytorch.org/whl/cu118

Voice Not Working

# Install FFmpeg
# Windows: Download from https://ffmpeg.org/download.html
# Linux: sudo apt install ffmpeg
# Mac: brew install ffmpeg

Email Verification Not Sending

# For Gmail:
# 1. Enable 2-Factor Authentication
# 2. Generate App Password: https://myaccount.google.com/apppasswords
# 3. Use App Password as SMTP_PASSWORD in .env
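
To check SMTP settings outside the application, a small standalone script along these lines can help. It is a sketch under assumptions: the subject line, the `/verify-email/{token}` frontend path, and both function names are invented for this example and may not match what the backend sends.

```python
import smtplib
from email.message import EmailMessage


def build_verification_email(to_addr, token, from_email,
                             from_name="BEACON System",
                             frontend_url="http://localhost:5173"):
    """Build a plain-text verification email (link path is an assumption)."""
    msg = EmailMessage()
    msg["Subject"] = "Verify your BEACON account"
    msg["From"] = f"{from_name} <{from_email}>"
    msg["To"] = to_addr
    msg.set_content(
        f"Open this link to verify your email:\n{frontend_url}/verify-email/{token}\n"
    )
    return msg


def send_email(msg, host, port, user, password):
    """Send over SMTP with STARTTLS, matching SMTP_PORT=587."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)
```

If `send_email` raises an authentication error with Gmail, the App Password steps above are the first thing to recheck.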

🔄 Recent Updates

Version 2.0.0 (December 2025)

  • ✅ Migrated from FAISS to pgvector for multi-machine support
  • ✅ Implemented lazy RAG for instant document uploads
  • ✅ Added email verification system
  • ✅ Enhanced notification system with hierarchical routing
  • ✅ Improved analytics dashboard with system health monitoring
  • ✅ Optimized performance with Redis caching
  • ✅ Added voice query support (98+ languages)
  • ✅ Implemented document approval workflows
  • ✅ Enhanced role-based access control

📞 Support

  • Documentation: See phase documentation files for detailed guides
  • API Docs: http://localhost:8000/docs
  • Logs: Agent/agent_logs/
  • Tests: python tests/run_all_tests.py

🎯 Key Achievements

✅ Multi-format document processing
✅ Multilingual embeddings (100+ languages)
✅ Voice query system (98+ languages)
✅ Lazy RAG (instant uploads)
✅ Hybrid retrieval (semantic + keyword)
✅ External data ingestion
✅ Citation tracking
✅ Production-ready


Built with ❤️ for Government Policy Intelligence

Version: 2.0.0 | Status: ✅ Production Ready | Last Updated: December 5, 2025

About

A smart platform that helps users understand government policies, schemes, and services through simple analysis and guided support, making public information easy, clear, and accessible for everyone.
