Skip to content

gui-uxdoccom/pc-compare

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🏢 PC Compare - Portfolio Company Validation Tool

A sophisticated web application for comparing portfolio companies between your baseline Excel files and live website data with 99%+ accuracy using advanced matching algorithms.

🚀 Quick Start

# Navigate to the webview directory
cd pc-compare/webview

# Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium firefox

# Start the Flask application
python3 app.py

Visit http://127.0.0.1:5000 to use the web interface.

Key Features

🧠 Advanced Matching Engine

  • 99%+ Accuracy with multi-strategy matching algorithms
  • Smart normalization handles business suffixes and name variations
  • Acronym detection matches "SABIC" with "Saudi Basic Industries Corporation SABIC"
  • False positive prevention avoids incorrect matches

📊 Comprehensive Analytics

  • Real-time dashboard with key metrics and insights
  • Historical tracking with quarter-over-quarter comparison
  • Smart recommendations for data quality improvements
  • Executive-ready reports with summary statistics

🎯 Intelligent Validation

  • Multi-field matching (CR Name, Brand Name, Sector)
  • Sector intelligence with synonym mapping
  • Confidence scoring for each match
  • Issue categorization (Missing, Extra, Name/Sector Updates)

📁 Project Structure

webview/
├── app.py                      # Flask web application
├── enhanced_matching.py        # Advanced matching algorithms
├── results_analyzer.py         # Analytics and historical tracking
├── compare.py                  # Core comparison logic with scraping
├── config.py                   # Configuration settings
├── index.html                  # Web interface
├── requirements.txt            # Python dependencies
├── test_scrape_visible.py      # Debug scraper (Chrome)
├── test_scrape_firefox.py      # Debug scraper (Firefox)
├── uploads/                    # File upload directory
└── README.md                   # Complete documentation

🔧 How It Works

  1. Upload your quarterly baseline Excel file
  2. Automatic scraping of live website data using Playwright
  3. Multi-strategy matching using 5 different algorithms:
    • Exact normalized matching
    • Core name extraction
    • Acronym/substring matching
    • Token-based matching
    • Fuzzy matching with validation
  4. Analytics generation with historical comparison
  5. Download enhanced Excel report with insights

📈 Matching Accuracy

  • 99%+ overall accuracy for portfolio company validation
  • False positive prevention with length and context validation
  • Handles complex cases like business name variations and acronyms
  • Sector intelligence with synonym mapping

🎯 Perfect For

  • Portfolio Managers validating company listings quarterly
  • Investment Teams maintaining accurate company databases
  • Compliance Teams ensuring website data accuracy
  • Data Analysts tracking portfolio company changes over time

📊 Sample Results

For a typical 200-company baseline:

  • 190-198 Perfect Matches (99%+ accuracy)
  • 🔄 5-10 Name Updates needed
  • 🏷️ 2-5 Sector Updates needed
  • 0-3 Missing Companies to add to website
  • 0-2 Extra Companies to review

🛠️ Requirements

  • Python 3.8+
  • Modern web browser
  • Internet connection for website scraping

See webview/README.md for detailed technical documentation, including:

  • Complete installation instructions
  • Usage guide for web interface and CLI
  • Troubleshooting Cloudflare issues
  • Understanding matching strategies
  • Configuration options

Built with ❤️ for accurate portfolio company validation

About

Comparing PCs baseline with Website PCs

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors