Skip to content

A focused, AI-powered contract analysis system that extracts 17 key parameters from DOCX and PDF contracts.

Notifications You must be signed in to change notification settings

JNTdev10/Contract-Analyzer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

6 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Contract Analysis Pipeline

A focused, AI-powered contract analysis system that extracts 17 key parameters from DOCX and PDF contracts.

πŸš€ Quick Start Guide

0. Prerequisites & Requirements

Before you start, make sure you have the following installed and configured:

Required Software:

  • Python 3.8+ - Download from python.org
  • Git and
  • PowerShell

Required APIs & Services:

  • Azure OpenAI Service - You need an active Azure subscription with OpenAI access
    • Create an Azure account at azure.microsoft.com
    • Deploy an OpenAI resource in Azure Portal
    • Get your API key and endpoint from the Azure OpenAI service

What You'll Need from Azure OpenAI:

  1. API Key - Your authentication key
  2. Endpoint URL - Your Azure OpenAI service endpoint
  3. Model Name - Usually gpt-4o-mini (recommended for cost-effectiveness)
  4. API Version - 2024-02-15-preview (current version)

Required Python Packages:

All packages are listed in requirements.txt and will be installed automatically:

1. Download & Setup

# Clone the repository
git clone https://github.com/yourusername/contract-analyzer.git
cd contract-analyzer

# Install Python dependencies
pip install -r requirements.txt

2. Configure Azure OpenAI

# Copy the environment template
copy env.example and save it as .env file

# Edit .env file with your Azure OpenAI credentials
# You need to get these from your Azure OpenAI service:
AZURE_OPENAI_API_KEY=your_actual_api_key_here
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_VERSION=2024-02-15-preview
AZURE_OPENAI_MODEL=gpt-4o-mini

3. Run Your First Analysis

**πŸ“ Important:** Always save your contract files in the `upload/` folder before analyzing them.

# Open PowerShell and navigate to the project folder
cd "C:\path\to\your\contract-analyzer"

# analyze the example contract
python enhanced_contract_pipeline.py "upload\Beispielvertrag.pdf"

# analyze any other contract file (save it in the upload folder first)
python enhanced_contract_pipeline.py "upload\your_contract.docx"

4. View Results

# View the database
python view_database.py view

# Export to CSV
python export_database.py

🎯 What This System Does

  1. Extracts text from DOCX and PDF contract files
  2. Analyzes with AI using Azure OpenAI to extract 17 specific parameters
  3. Stores results in an Excel database for easy management
  4. Exports data to CSV for analysis and reporting

Solution Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Contract      β”‚    β”‚   Python ETL     β”‚    β”‚   Excel         β”‚    β”‚   Business      β”‚
β”‚   Files          ──▢    Pipeline         ──▢    Database       ──▢    Intelligence   
β”‚                 β”‚    β”‚                  β”‚    β”‚                 β”‚    β”‚                 β”‚
β”‚ β€’ DOCX/PDF      β”‚    β”‚ β€’ Azure OpenAI   β”‚    β”‚ β€’ 17 Parameters β”‚    β”‚ β€’ CSV Export    β”‚
β”‚ β€’ German/EN     β”‚    β”‚ β€’ Text Extract   β”‚    β”‚ β€’ Audit Trail   β”‚    β”‚ β€’ Analysis      β”‚
β”‚ β€’ Multi-format  β”‚    β”‚ β€’ AI Analysis    β”‚    β”‚ β€’ Structured    β”‚    β”‚ β€’ Reporting     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                β”‚                        β”‚
                                β–Ό                        β–Ό
                       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                       β”‚   Text Storage   β”‚    β”‚   Parameter         β”‚
                       β”‚   (Extracted)    β”‚    β”‚   Tracking          β”‚
                       β”‚                  β”‚    β”‚                     β”‚
                       β”‚ β€’ Raw text       β”‚    β”‚ β€’ Confidence scores β”‚
                       β”‚ β€’ Structured     β”‚    β”‚ β€’ Extraction logs   β”‚
                       β”‚ β€’ Cleaned        β”‚    β”‚ β€’ Error handling    β”‚
                       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ“‹ The 17 Parameters Extracted

  1. Contract ID - Auto-generated unique identifier
  2. File Name - Original filename
  3. File Path - Full file path
  4. Contract Party 1 - First contracting party
  5. Contract Party 2 - Second contracting party
  6. Contract Type - Type of contract (e.g., Framework Contract)
  7. Contract Name - Official contract name/title
  8. Short Description - Brief description of contract purpose
  9. Contract Number - Contract reference number
  10. Contact Person 1 - Contact for first party
  11. Contact Person 2 - Contact for second party
  12. Start Date - Contract start date
  13. End Date - Contract end date
  14. Duration in Months - Calculated contract duration
  15. Services and Prices - Detailed services and pricing
  16. Potential Revenue - Estimated contract value
  17. Termination - Termination clauses and notice periods
  18. Liabilities - Liability limitations and caps
  19. Extract Date - Date of analysis

πŸš€ Quick Start

1. Setup

# Install dependencies
pip install -r requirements.txt

# Set up Azure OpenAI credentials in .env file
AZURE_OPENAI_API_KEY=your_key_here
AZURE_OPENAI_ENDPOINT=your_endpoint_here
AZURE_OPENAI_API_VERSION=2024-02-15-preview
AZURE_OPENAI_MODEL=gpt-4o-mini

2. Process a Contract

# Process any contract file
python contract_pipeline.py "path/to/your/contract.docx"
python contract_pipeline.py "path/to/your/contract.pdf"

3. View Results

# View database contents
python view_database.py view

# Export to CSV
python export_database.py

πŸ“ File Structure

Contract Analyzer/
β”œβ”€β”€ contract_pipeline.py          # Main pipeline script
β”œβ”€β”€ view_database.py              # Database viewer
β”œβ”€β”€ export_database.py            # CSV export utility
β”œβ”€β”€ requirements.txt              # Python dependencies
β”œβ”€β”€ contract_database.xlsx        # Excel database
β”œβ”€β”€ upload/                       # Place contracts here
β”‚   β”œβ”€β”€ contract1.docx
β”‚   └── contract2.pdf
└── src/
    β”œβ”€β”€ agents/
    β”‚   └── contract_parameter_analyzer.py  # AI analysis agent
    β”œβ”€β”€ utils/
    β”‚   β”œβ”€β”€ azure_client.py                 # Azure OpenAI client
    β”‚   β”œβ”€β”€ contract_database.py            # Excel database manager
    β”‚   β”œβ”€β”€ ingestion.py                    # DOCX/PDF text extraction
    β”‚   └── structured_extractor.py         # Initial data extraction
    └── models/
        └── base_models.py                  # Basic data models

πŸ”§ How It Works

  1. Text Extraction: Reads text from DOCX/PDF files
  2. Structured Extraction: Uses regex patterns to identify key sections
  3. AI Analysis: Azure OpenAI analyzes the contract and extracts 17 parameters
  4. Database Storage: Results stored in Excel with full audit trail
  5. Export: Data can be exported to CSV for analysis

πŸ€– AI Analysis Agent Process

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        AI CONTRACT ANALYSIS AGENT                           β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚   INPUT: Contract Text + Structured Data                                    β”‚
β”‚     β”œβ”€ Raw contract text (DOCX/PDF extracted)                               β”‚
β”‚     └─ Pre-extracted structured data (regex patterns)                       β”‚
β”‚                                                                             β”‚
β”‚   AI PROCESSING (Azure OpenAI GPT-4o-mini):                                 β”‚
β”‚     β”œβ”€ System Prompt: "Legal contract analysis expert"                      β”‚
β”‚     β”œβ”€ Temperature: 0.1 (low for consistency)                               β”‚
β”‚     β”œβ”€ Max Tokens: 2000                                                     β”‚
β”‚     └─ JSON Response Format Required                                        β”‚
β”‚                                                                             β”‚
β”‚   PARAMETER EXTRACTION (17 Key Parameters):                                 β”‚
β”‚     β”œβ”€ Contract Parties (2)                                                 β”‚
β”‚     β”œβ”€ Contract Details (Type, Name, Number, Description)                   β”‚
β”‚     β”œβ”€ Contact Persons (2)                                                  β”‚
β”‚     β”œβ”€ Dates (Start, End, Duration)                                         β”‚
β”‚     β”œβ”€ Financial (Services, Prices, Revenue)                                β”‚
β”‚     β”œβ”€ Legal (Termination, Liabilities)                                     β”‚
β”‚     └─ Metadata (Extract Date, Confidence Scores)                           β”‚
β”‚                                                                             β”‚
β”‚   VALIDATION & FALLBACK:                                                    β”‚
β”‚     β”œβ”€ Primary: AI-extracted values                                         β”‚
β”‚     β”œβ”€ Fallback: Structured data extraction                                 β”‚
β”‚     β”œβ”€ Confidence scoring per parameter                                     β”‚
β”‚     └─ Error handling with graceful degradation                             β”‚
β”‚                                                                             β”‚
β”‚   OUTPUT: Structured JSON with 17 parameters + confidence scores            β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ“Š Example Output

Contract Summary:
- Contract ID: CONTRACT_Rahmenvertrag_SGS_Sa_20251016_123548
- Contract Type: Framework Contract
- Contract Name: Rahmenvertrag
- Party 1: SGS Germany GmbH
- Party 2: Samsung Electronics GmbH
- Start Date: 2025-01-01
- End Date: 2028-01-01
- Duration: 36 months
- Potential Revenue: Variable based on individual orders
- Confidence Score: 0.63
- Parameters Extracted: 12/19

πŸ› οΈ Requirements

  • Python 3.8+
  • Azure OpenAI API access
  • Required packages listed in requirements.txt

πŸ“ˆ Performance

  • Processing Time: ~5-6 seconds per contract
  • Success Rate: High accuracy with AI analysis
  • File Support: DOCX, PDF, TXT
  • Languages: Works with German and English contracts

πŸ” Database Management

The system creates an Excel database (contract_database.xlsx) with:

  • Contracts Sheet: All contract data with 17 parameters
  • Parameters Sheet: Detailed parameter tracking with confidence scores

Use view_database.py to browse contracts and export_database.py to create CSV exports for analysis.

πŸ”’ Security & Privacy

βœ… Safe to Upload Publicly

  • No hardcoded API keys or secrets in the code
  • All credentials are stored in .env file (excluded from git)
  • Only placeholder values in example files

⚠️ Important Security Notes

  • Never commit your .env file - it contains your Azure OpenAI API key
  • Keep your API keys secure - they provide access to your Azure OpenAI service
  • Monitor your API usage - Azure OpenAI charges per token usage
  • Review extracted data - contracts may contain sensitive information

πŸ›‘οΈ Best Practices

  • Use environment variables for all sensitive data
  • Regularly rotate your API keys
  • Monitor your Azure OpenAI usage and costs
  • Consider data privacy regulations (GDPR, etc.) when processing contracts

About

A focused, AI-powered contract analysis system that extracts 17 key parameters from DOCX and PDF contracts.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages