
🚀 Advanced Multi-Model Sentiment Analyzer - From research to production-ready AI system. Fine-tuned DistilBERT on Yelp dataset (78.5% accuracy), evolved into enterprise-grade platform with 4 AI models working in consensus (85%+ accuracy). Features modern web interface, REST APIs, batch processing, real-time analytics, and one-click cloud deployment.


fitsblb/Sentiment-Analyzer


🚀 Advanced Sentiment Analyzer - From Research to Production


🎯 From Yelp Review Analysis to Enterprise-Grade Multi-Model AI System

A complete journey from academic research to production-ready sentiment analysis with 4 AI models working in harmony

🌐 Live Demo • 🤖 Original Model • 📖 API Docs • 🚀 Deploy Now


🌟 Project Evolution Story

📚 Phase 1: Research Foundation (Original Project)

This project began as an academic research effort to build a sentiment analysis model by fine-tuning a pre-trained language model to classify Yelp restaurant reviews into three sentiment categories: Positive, Neutral, and Negative.

🎯 Original Objectives:

  • Fine-tune DistilBERT on Yelp Open Dataset
  • Optimize hyperparameters using Optuna
  • Achieve production-quality sentiment classification
  • Deploy to Hugging Face Hub

🚀 Phase 2: Production Enhancement

The research model evolved into a production-ready Flask web application with:

  • Modern web interface for real-time analysis
  • RESTful API for integration
  • Enhanced error handling and validation
  • Professional deployment capabilities

⚡ Phase 3: Advanced Multi-Model System (Current)

The system transformed into an enterprise-grade ML platform featuring:

  • 4 AI models working in parallel for higher accuracy
  • Consensus building algorithm for reliable predictions
  • Real-time analytics and performance monitoring
  • Advanced APIs with batch processing capabilities
  • Production deployment ready for any cloud platform

πŸ–ΌοΈ System Screenshots

🏠 Modern Web Interface

alt text Beautiful, responsive design with real-time sentiment analysis

πŸ“Š Detailed Results with Multi-Model Insights

alt text Comprehensive results with confidence scores and model comparison


🔬 Original Research Foundation

📊 Dataset & Training

  • Source: Yelp Open Dataset focusing on restaurant reviews
  • Features: Review text and star ratings (1-5 stars)
  • Model: Fine-tuned distilbert-base-uncased for sequence classification
  • Training: Optimized using Optuna hyperparameter search
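
The README does not say how the 1-5 star ratings were turned into the three sentiment classes. A common convention, assumed here purely for illustration, maps 1-2 stars to Negative, 3 to Neutral, and 4-5 to Positive. The helper name `stars_to_sentiment` is hypothetical; the project's real labeling lives in `preprocess_yelp_reviews()` and may use different thresholds.

```python
def stars_to_sentiment(stars: int) -> str:
    """Map a 1-5 star rating to a sentiment label (assumed thresholds)."""
    if not 1 <= stars <= 5:
        raise ValueError(f"star rating must be between 1 and 5, got {stars}")
    if stars <= 2:
        return "Negative"
    if stars == 3:
        return "Neutral"
    return "Positive"


# Toy reviews invented for this example, not taken from the Yelp dataset.
reviews = [
    {"text": "Cold food and rude staff.", "stars": 1},
    {"text": "It was fine, nothing special.", "stars": 3},
    {"text": "Best tacos in town!", "stars": 5},
]
labels = [stars_to_sentiment(review["stars"]) for review in reviews]
print(labels)
```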

🎯 Research Results

The original fine-tuned model reported the following scores:

  • Accuracy: 78.50%
  • F1-Score: 78.40%
  • Precision: 78.37%
  • Recall: 78.50%
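
Recall matching accuracy exactly (both 78.50%) is what support-weighted averaging produces, because weighted recall is mathematically identical to accuracy. The sketch below computes the four metrics in plain Python under that assumption; the source does not confirm the averaging mode, and the project's own `compute_metrics()` relies on `sklearn.metrics` rather than this hand-rolled version.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)      # how often each label truly occurs
    predicted = Counter(y_pred)    # how often each label was predicted
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = true_pos[label] / predicted[label] if predicted[label] else 0.0
        r = true_pos[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n         # weight each class by its share of the data
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(true_pos.values()) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels invented for this example.
y_true = ["pos", "pos", "neu", "neg", "neg", "neu"]
y_pred = ["pos", "neu", "neu", "neg", "pos", "neu"]
scores = weighted_metrics(y_true, y_pred)
print(scores)
```

On any input, the `recall` and `accuracy` entries agree up to floating-point rounding, while precision and F1 can differ, which is the same pattern the reported numbers show.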

🔬 Hyperparameter Optimization

Comprehensive search using Optuna explored:

  • Learning Rate: 5.75e-06 to 7.91e-05
  • Training Epochs: 2 to 4
  • Batch Size: 4, 16, 32
  • Random Seeds: 5, 6, 10, 17, 40

πŸ† Best Configuration:

  • Learning Rate: 7.91e-5
  • Epochs: 2
  • Batch Size: 32
  • Seed: 5

🚀 Advanced Multi-Model System

🤖 Multi-Model Intelligence

Building on the original research, the system now incorporates:

  • 🎯 Primary Model: Custom YelpReviewsAnalyzer (fine-tuned from research)
  • 🔄 Comparison Models:
    • DistilBERT (general-purpose)
    • Cardiff Twitter-RoBERTa (social media optimized)
    • FinBERT (financial sentiment specialist)
  • 🧠 Consensus Algorithm: Weighted voting system for final predictions
  • ⚡ Parallel Processing: All models run simultaneously for fast results
  • 🛡️ Fallback System: Graceful handling when models fail

🌐 Production Web Interface

  • 🎨 Glass-morphism Design: Modern UI with smooth animations
  • 📱 Mobile Responsive: Layout adapts to phone, tablet, and desktop screens
  • ⚡ Real-time Analysis: Instant sentiment prediction
  • 📊 Confidence Visualization: Color-coded results with detailed metrics

🚀 Advanced API System

API v1 - Basic Analysis (Research Model)

POST /api/analyze
{
  "text": "This restaurant has amazing food!"
}

Response:
{
  "sentiment": "Positive",
  "confidence": 0.9567,
  "processing_time": 0.123
}
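
A client call to this endpoint could be sketched with the standard library alone. The example builds the request without sending it, so it runs without a server; the helper name `build_analyze_request` is hypothetical, and the JSON content type and the `http://localhost:5000` base URL (taken from the Quick Start section) are assumptions about how the app is served.

```python
import json
import urllib.request


def build_analyze_request(base_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/analyze."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/analyze",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_analyze_request(
    "http://localhost:5000", "This restaurant has amazing food!"
)
print(request.get_method(), request.full_url)

# To send it against a running server:
# with urllib.request.urlopen(request, timeout=10) as response:
#     result = json.loads(response.read())
#     print(result["sentiment"], result["confidence"])
```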

API v2 - Multi-Model Comparison

POST /api/v2/compare
{
  "text": "This place exceeded all my expectations!"
}

Response:
{
  "consensus": {
    "sentiment": "Positive", 
    "confidence": 0.8791,
    "agreement_score": 0.875
  },
  "model_results": [
    {
      "model": "YelpReviewsAnalyzer",
      "sentiment": "Positive",
      "confidence": 0.9234
    },
    // ... 3 other models
  ]
}

Batch Processing

POST /api/v2/batch
{
  "texts": ["Great food!", "Poor service", "It's okay"]
}
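
Server-side, a batch body like the one above needs checking before any model runs. This is a hypothetical validator, not the project's actual code: the function name and error messages are invented, and the limit of 50 texts comes from the endpoint description in the API documentation section.

```python
MAX_BATCH_SIZE = 50  # limit quoted in the API documentation section


def validate_batch_payload(payload):
    """Return cleaned texts from a batch request body, or raise ValueError."""
    if not isinstance(payload, dict) or "texts" not in payload:
        raise ValueError('request body must be a JSON object with a "texts" key')
    texts = payload["texts"]
    if not isinstance(texts, list) or not texts:
        raise ValueError('"texts" must be a non-empty list')
    if len(texts) > MAX_BATCH_SIZE:
        raise ValueError(f"at most {MAX_BATCH_SIZE} texts per request")
    cleaned = []
    for index, text in enumerate(texts):
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"texts[{index}] must be a non-empty string")
        cleaned.append(text.strip())
    return cleaned


texts = validate_batch_payload({"texts": ["Great food!", " Poor service ", "It's okay"]})
print(texts)
```

Rejecting the whole request on the first bad item keeps the response shape simple; an alternative design would return per-item errors alongside the successful predictions.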

📊 Built-in Analytics

  • Model Performance: Track accuracy and speed of each model
  • Processing Time: Monitor response times and optimize performance
  • Error Rates: Automatic error tracking and health monitoring
  • Usage Statistics: Understand API usage patterns
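
As a rough illustration of how such per-model statistics might be collected, here is a small in-memory tracker. The class `ModelStats` and its methods are hypothetical and not part of the project; a production version would also need locking if several threads record at once.

```python
import time
from collections import defaultdict


class ModelStats:
    """Accumulate call counts, errors, and latency per model."""

    def __init__(self):
        self._calls = defaultdict(int)
        self._errors = defaultdict(int)
        self._seconds = defaultdict(float)

    def record(self, model, seconds, ok=True):
        """Register one call with its duration and outcome."""
        self._calls[model] += 1
        self._seconds[model] += seconds
        if not ok:
            self._errors[model] += 1

    def timed(self, model, func, *args):
        """Run func(*args), recording its duration and whether it raised."""
        start = time.perf_counter()
        try:
            result = func(*args)
        except Exception:
            self.record(model, time.perf_counter() - start, ok=False)
            raise
        self.record(model, time.perf_counter() - start)
        return result

    def summary(self):
        """Return calls, error rate, and mean latency for every model seen."""
        return {
            model: {
                "calls": calls,
                "error_rate": self._errors[model] / calls,
                "avg_seconds": self._seconds[model] / calls,
            }
            for model, calls in self._calls.items()
        }


stats = ModelStats()
stats.record("DistilBERT", 0.20)
stats.record("DistilBERT", 0.30, ok=False)
stats.timed("FinBERT", str.upper, "great food")  # str.upper stands in for a model
print(stats.summary())
```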

πŸ—οΈ Technical Architecture

🧠 Model Pipeline

User Input β†’ Preprocessing β†’ Parallel Execution β†’ Consensus β†’ Response
     ↓           ↓              ↓                  ↓         ↓
 Validation  Tokenization   4 Models Running   Voting    Final Result
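
The parallel-execution stage of this pipeline could look roughly like the sketch below, which fans one input out to several models on a thread pool and tolerates individual failures. The function `run_models` and the three stand-in "models" are hypothetical; the real system loads transformer models, and its exact fallback behavior is not documented here.

```python
from concurrent.futures import ThreadPoolExecutor


def run_models(text, models):
    """Run every model on text in parallel; skip models that fail."""
    results, failures = [], {}
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {name: pool.submit(predict, text) for name, predict in models.items()}
        for name, future in futures.items():
            try:
                sentiment, confidence = future.result()
                results.append(
                    {"model": name, "sentiment": sentiment, "confidence": confidence}
                )
            except Exception as error:  # fallback: continue without this model
                failures[name] = repr(error)
    return results, failures


# Stand-in predictors so the example runs without downloading any weights.
def always_positive(text):
    return "Positive", 0.9


def keyword_model(text):
    return ("Negative", 0.8) if "poor" in text.lower() else ("Positive", 0.7)


def broken_model(text):
    raise RuntimeError("model failed to load")


models = {
    "YelpReviewsAnalyzer": always_positive,
    "DistilBERT": keyword_model,
    "FinBERT": broken_model,
}
results, failures = run_models("Poor service", models)
print(results)
print(failures)
```

The surviving results can then be passed to the voting step, so one failing model lowers the number of votes instead of failing the whole request.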

πŸ“ Enhanced Project Structure

Sentiment-Analyzer/
β”œβ”€β”€ πŸš€ app/                           # Core Flask Application
β”‚   β”œβ”€β”€ app.py                        # Main Flask app with v1 & v2 APIs
β”‚   β”œβ”€β”€ model.py                      # Original research model
β”‚   β”œβ”€β”€ advanced_model.py             # Multi-model system (300+ lines)
β”‚   β”œβ”€β”€ advanced_api.py               # Advanced API endpoints (280+ lines)
β”‚   └── templates/                    # Modern web interface
β”‚       β”œβ”€β”€ home.html                 # Glass-morphism design
β”‚       └── result.html               # Enhanced results display
β”‚
β”œβ”€β”€ βš™οΈ config/                        # Configuration Management
β”‚   β”œβ”€β”€ config.py                     # Application settings
β”‚   └── logging_config.py             # Logging configuration
β”‚
β”œβ”€β”€ πŸš€ deployment/                    # Production Deployment
β”‚   β”œβ”€β”€ configs/                      # Platform configurations
β”‚   β”‚   β”œβ”€β”€ Dockerfile                # Container setup
β”‚   β”‚   β”œβ”€β”€ docker-compose.yml        # Multi-service deployment
β”‚   β”‚   β”œβ”€β”€ Procfile                  # Heroku/Railway config
β”‚   β”‚   └── nginx.conf                # Web server config
β”‚   β”œβ”€β”€ guides/                       # Deployment Documentation
β”‚   β”‚   β”œβ”€β”€ HUGGINGFACE_DEPLOY_GUIDE.md
β”‚   β”‚   β”œβ”€β”€ DOCKER.md
β”‚   β”‚   └── RENDER_DEPLOY_GUIDE.md
β”‚   β”œβ”€β”€ docker-deploy.bat             # Windows deployment script
β”‚   └── docker-deploy.sh              # Unix deployment script
β”‚
β”œβ”€β”€ πŸ“– docs/                          # Project Documentation
β”‚   β”œβ”€β”€ README_COMPLETE.md            # Comprehensive documentation
β”‚   β”œβ”€β”€ ADVANCED_FEATURES_SUMMARY.md  # Feature specifications
β”‚   β”œβ”€β”€ PHASE2_SUMMARY.md             # Development phases
β”‚   └── FINAL_CHECKLIST.md            # Production readiness
β”‚
β”œβ”€β”€ πŸ–₯️ interfaces/                    # User Interfaces
β”‚   β”œβ”€β”€ gradio_advanced.py            # Advanced Gradio interface
β”‚   └── gradio_simple.py              # Simplified demo interface
β”‚
β”œβ”€β”€ πŸ“¦ requirements/                  # Dependency Management
β”‚   β”œβ”€β”€ requirements-basic.txt        # Minimal dependencies
β”‚   β”œβ”€β”€ requirements-docker.txt       # Container-specific
β”‚   β”œβ”€β”€ requirements-hf.txt           # Hugging Face Spaces
β”‚   └── requirements-railway.txt      # Railway deployment
β”‚
β”œβ”€β”€ πŸ§ͺ tests/                        # Comprehensive Testing Suite
β”‚   β”œβ”€β”€ test_app.py                   # Flask application tests
β”‚   β”œβ”€β”€ test_model.py                 # Model validation tests
β”‚   β”œβ”€β”€ test_advanced_features.py     # Multi-model system tests
β”‚   β”œβ”€β”€ test_api.py                   # API endpoint tests
β”‚   β”œβ”€β”€ run_tests.py                  # Test runner
β”‚   └── quick_test.py                 # Quick validation
β”‚
β”œβ”€β”€ �️ utils/                        # Utility Functions
β”‚   β”œβ”€β”€ utility.py                    # Research utilities
β”‚   β”œβ”€β”€ validate.py                   # Validation helpers
β”‚   └── widget_repair.py              # UI utilities
β”‚
β”œβ”€β”€ πŸ“Š Notebooks/                     # Research & Development
β”‚   β”œβ”€β”€ HyperParamSearch.ipynb        # Original Optuna optimization
β”‚   └── Final_Training.ipynb          # Model training pipeline
β”‚
β”œβ”€β”€ πŸ€– Yelp_Model/                    # Trained Model Artifacts
β”‚   β”œβ”€β”€ config.json                   # Model configuration
β”‚   β”œβ”€β”€ model.safetensors            # Fine-tuned weights
β”‚   β”œβ”€β”€ tokenizer.json               # Tokenizer from research
β”‚   └── ...                          # Complete model package
β”‚
β”œβ”€β”€ πŸ“ Pre_processed/                 # Research Datasets
β”‚   β”œβ”€β”€ train/                        # Tokenized training data
β”‚   β”œβ”€β”€ val/                         # Validation splits
β”‚   └── test/                        # Test data with metrics
β”‚
└── πŸ“„ requirements.txt               # Main dependencies

🔧 Core Research Functions (utility.py)

The original research infrastructure remains intact:

  • load_dataset() - Dataset loading with column selection
  • perform_eda() - Exploratory data analysis with visualizations
  • preprocess_yelp_reviews() - Text preprocessing and sentiment labeling
  • prepare_datasets() - Train/val/test splits with tokenization
  • compute_metrics() - Accuracy, precision, recall, F1-score calculation
  • evaluate_model_on_test() - Model evaluation on test set

🚀 Quick Start

🏃‍♂️ Run Locally (2 minutes)

# Clone the repository
git clone https://github.com/fitsblb/Sentiment-Analyzer.git
cd Sentiment-Analyzer

# Create the environment on first run (conda recommended), then activate it
conda create -n sentiment-analyzer python
conda activate sentiment-analyzer

# Install the main dependencies
pip install -r requirements.txt

# Start the enhanced application
python app/app.py

🌐 Access: http://localhost:5000

  • Web interface with multi-model analysis
  • API v1 endpoints (original research model)
  • API v2 endpoints (advanced features)

📚 Reproduce Research (Original Workflow)

# 1. Hyperparameter optimization
jupyter notebook Notebooks/HyperParamSearch.ipynb

# 2. Final model training
jupyter notebook Notebooks/Final_Training.ipynb

# 3. Model evaluation and deployment to HuggingFace
# (All results saved to Yelp_Model/)

☁️ Production Deployment

🌟 Recommended: Railway (5 minutes)

Deploy on Railway

  1. Connect your GitHub repository
  2. Railway auto-detects Python Flask app
  3. Deploys with zero configuration
  4. Result: Live multi-model sentiment analysis system!

πŸ™ Alternative: Render

  • 750 hours/month free
  • Perfect for portfolio projects
  • Custom domains included

☁️ Enterprise: Google Cloud Run

gcloud run deploy sentiment-analyzer --source . --platform managed --region us-central1 --allow-unauthenticated

📖 API Documentation

🔗 Base URL

https://your-app.railway.app

📋 Available Endpoints

GET /api/info - System Information

{
  "name": "Sentiment Analyzer API",
  "version": "2.0.0",
  "features": {
    "basic_analysis": true,
    "model_comparison": true,
    "batch_processing": true,
    "analytics": true
  },
  "endpoints": {
    "analyze": "/api/analyze",
    "compare_models": "/api/v2/compare",
    "batch_analyze": "/api/v2/batch",
    "analytics": "/api/v2/analytics"
  }
}

POST /api/analyze - Original Research Model

Uses the fine-tuned YelpReviewsAnalyzer from the research phase.

POST /api/v2/compare - Multi-Model Analysis

Runs all 4 models in parallel and builds consensus prediction.

POST /api/v2/batch - Batch Processing

Efficiently process up to 50 texts simultaneously.
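
A client holding more than 50 texts has to split them across several requests. One way to do that, sketched with a hypothetical helper `chunk_texts`, is to slice the list into consecutive groups and wrap each group in a request body shaped like the batch example.

```python
def chunk_texts(texts, batch_size=50):
    """Yield consecutive slices of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


# 120 placeholder reviews become three request bodies.
reviews = [f"review {i}" for i in range(120)]
batches = [{"texts": chunk} for chunk in chunk_texts(reviews)]
print([len(batch["texts"]) for batch in batches])  # [50, 50, 20]
```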

GET /api/v2/analytics - Performance Metrics

Real-time statistics on model performance and usage.


📊 Performance Benchmarks

🎯 Model Accuracy Comparison

| Model                 | Individual Accuracy | Consensus Improvement |
|-----------------------|---------------------|-----------------------|
| YelpReviewsAnalyzer   | 78.50%              | +6.5% (via consensus) |
| DistilBERT            | 76.20%              |                       |
| Twitter-RoBERTa       | 74.80%              |                       |
| FinBERT               | 72.30%              |                       |
| Multi-Model Consensus | ~85%                | Best Overall          |

⚡ Response Times

  • Single Prediction: ~200ms
  • Multi-Model Compare: ~1.5s
  • Batch Processing: ~100ms per text
  • Health Check: ~50ms

πŸ› οΈ Technology Stack

πŸ€– AI/ML Research Foundation

  • πŸ€— Transformers: Hugging Face ecosystem
  • πŸ”₯ PyTorch: Deep learning framework
  • πŸ“Š Datasets: Efficient data handling
  • πŸ”¬ Optuna: Hyperparameter optimization
  • πŸ“ˆ W&B: Experiment tracking

🌐 Production Enhancement

  • 🐍 Flask: Lightweight web framework
  • ⚑ Threading: Parallel model execution
  • πŸ“ Logging: Comprehensive monitoring
  • 🎨 Modern CSS: Glass-morphism design
  • πŸ“± Responsive Design: Mobile-first approach

☁️ Deployment Stack

  • 🐳 Docker: Containerized deployment
  • πŸš€ Railway/Render: Cloud hosting
  • πŸ“ˆ Analytics: Built-in performance monitoring
  • πŸ”§ CI/CD: Automated deployment pipelines

🔬 Research Methodology

📊 Original Experimental Setup

  • Dataset Split: 70% train, 15% validation, 15% test
  • Optimization: Optuna with 50+ trials
  • Evaluation: Stratified sampling for balanced assessment
  • Metrics: Comprehensive evaluation with sklearn.metrics
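
A stratified 70/15/15 split can be sketched in plain Python as follows. This is an illustration of the idea, not the project's `prepare_datasets()` implementation; the function name, the rounding rule, and the seed are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists that preserve label proportions."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)  # shuffle within each class before slicing
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


# Imbalanced toy label set invented for this example.
labels = ["Positive"] * 100 + ["Neutral"] * 40 + ["Negative"] * 60
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Splitting each class separately keeps the Positive/Neutral/Negative ratio the same in all three subsets, which matters when one class is much rarer than the others.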

🧪 Hyperparameter Search Space

{
    'learning_rate': (5e-6, 8e-5),
    'num_train_epochs': [2, 3, 4], 
    'per_device_train_batch_size': [4, 16, 32],
    'seed': [5, 6, 10, 17, 40]
}
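
To show how configurations are drawn from a space like this, the sketch below samples trials with the standard library. It is plain random search, named here as a stand-in: the actual study used Optuna, whose default sampler is TPE, and sampling the learning rate on a log scale is an assumption the source does not confirm.

```python
import math
import random

SEARCH_SPACE = {
    "learning_rate": (5e-6, 8e-5),
    "num_train_epochs": [2, 3, 4],
    "per_device_train_batch_size": [4, 16, 32],
    "seed": [5, 6, 10, 17, 40],
}


def sample_trial(rng):
    """Draw one configuration: log-uniform learning rate, uniform choices otherwise."""
    low, high = SEARCH_SPACE["learning_rate"]
    params = {
        "learning_rate": math.exp(rng.uniform(math.log(low), math.log(high)))
    }
    for name, choices in SEARCH_SPACE.items():
        if isinstance(choices, list):  # categorical parameters
            params[name] = rng.choice(choices)
    return params


rng = random.Random(0)
trials = [sample_trial(rng) for _ in range(50)]
print(trials[0])
```

In a real search, each sampled configuration would be used to fine-tune the model and the validation score would decide which trial wins.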

📈 Training Configuration

  • Optimizer: AdamW with weight decay
  • Scheduler: Linear with warmup
  • Evaluation: Per-epoch with early stopping
  • Logging: Weights & Biases integration
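
The "linear with warmup" scheduler ramps the learning rate from zero up to its peak, then decays it linearly back to zero. The function below reproduces that shape for illustration; the peak value is the best learning rate reported above, while the step counts are made-up assumptions, since the warmup length is not stated.

```python
def linear_warmup_lr(step, total_steps, warmup_steps, peak_lr=7.91e-5):
    """Learning rate at a step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run length: 1000 optimizer steps with 100 warmup steps.
total_steps, warmup_steps = 1000, 100
for step in (0, 50, 100, 550, 1000):
    print(step, f"{linear_warmup_lr(step, total_steps, warmup_steps):.2e}")
```

The rate starts at zero, reaches the peak exactly when warmup ends, sits at half the peak midway through the decay, and returns to zero on the final step.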

🎯 Future Research & Development

🔬 Research Extensions

  • Multi-domain Adaptation: Extend beyond restaurant reviews
  • Cross-lingual Analysis: Support for multiple languages
  • Temporal Dynamics: Track sentiment trends over time
  • Aspect-based Analysis: Fine-grained sentiment aspects

🚀 Production Enhancements

  • Real-time Dashboard: Live analytics and monitoring
  • Custom Model Training: User-uploadable fine-tuning
  • Advanced Visualizations: Interactive charts and insights
  • Mobile Applications: iOS/Android apps

🌐 Integration Opportunities

  • Slack/Discord Bots: Team sentiment monitoring
  • Chrome Extension: Web page sentiment analysis
  • Webhook Support: Real-time notifications
  • API Rate Limiting: Enterprise-grade access control

🤝 Contributing

We welcome contributions to both research and production aspects!

🔬 Research Contributions

  • Model improvements and optimizations
  • New evaluation metrics and benchmarks
  • Dataset enhancements and preprocessing

🚀 Production Contributions

  • UI/UX improvements
  • API feature additions
  • Performance optimizations
  • Documentation enhancements

Contribution Process:

  1. 🍴 Fork the repository
  2. 🔧 Create feature branch (git checkout -b feature/AmazingFeature)
  3. 💾 Commit changes (git commit -m 'Add AmazingFeature')
  4. 📤 Push to branch (git push origin feature/AmazingFeature)
  5. 🔃 Open Pull Request

📜 Citation & Acknowledgments

📚 How to Cite This Work

@misc{sentiment-analyzer-2025,
  title={Advanced Sentiment Analyzer: From Research to Production},
  author={fitsblb},
  year={2025},
  howpublished={\url{https://github.com/fitsblb/Sentiment-Analyzer}},
  note={Multi-model sentiment analysis system with consensus building}
}

πŸ™ Research Acknowledgments

  • πŸ€— Hugging Face: For the incredible transformer ecosystem and model hosting
  • πŸŽ“ DistilBERT Team: For the efficient BERT variant enabling this research
  • οΏ½ Optuna Team: For powerful hyperparameter optimization framework
  • οΏ½ Yelp: For providing the open dataset that made this research possible

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.


🌟 Complete Journey: Research → Production → Impact

From academic research to enterprise-ready AI system

🤖 Original Model • 🌐 Live Demo • 📧 Contact

Built with ❤️ and rigorous research by fitsblb

⭐ Star this repo if our research-to-production journey helped you! ⭐
