A focused, AI-powered contract analysis system that extracts 17 key parameters from DOCX and PDF contracts.
Before you start, make sure you have the following installed and configured:
- Python 3.8+ - Download from python.org
- Git and
- PowerShell
- Azure OpenAI Service - You need an active Azure subscription with OpenAI access
- Create an Azure account at azure.microsoft.com
- Deploy an OpenAI resource in Azure Portal
- Get your API key and endpoint from the Azure OpenAI service
- API Key - Your authentication key
- Endpoint URL - Your Azure OpenAI service endpoint
- Model Name - Usually
gpt-4o-mini(recommended for cost-effectiveness) - API Version -
2024-02-15-preview(current version)
All packages are listed in requirements.txt and will be installed automatically:
# Clone the repository
git clone https://github.com/yourusername/contract-analyzer.git
cd contract-analyzer
# Install Python dependencies
pip install -r requirements.txt# Copy the environment template
copy env.example and save it as .env file
# Edit .env file with your Azure OpenAI credentials
# You need to get these from your Azure OpenAI service:
AZURE_OPENAI_API_KEY=your_actual_api_key_here
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_VERSION=2024-02-15-preview
AZURE_OPENAI_MODEL=gpt-4o-mini**π Important:** Always save your contract files in the `upload/` folder before analyzing them.
# Open PowerShell and navigate to the project folder
cd "C:\path\to\your\contract-analyzer"
# analyze the example contract
python enhanced_contract_pipeline.py "upload\Beispielvertrag.pdf"
# analyze any other contract file (save it in the upload folder first)
python enhanced_contract_pipeline.py "upload\your_contract.docx"# View the database
python view_database.py view
# Export to CSV
python export_database.py- Extracts text from DOCX and PDF contract files
- Analyzes with AI using Azure OpenAI to extract 17 specific parameters
- Stores results in an Excel database for easy management
- Exports data to CSV for analysis and reporting
Solution Architecture
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββ
β Contract β β Python ETL β β Excel β β Business β
β Files βββΆ Pipeline βββΆ Database βββΆ Intelligence
β β β β β β β β
β β’ DOCX/PDF β β β’ Azure OpenAI β β β’ 17 Parameters β β β’ CSV Export β
β β’ German/EN β β β’ Text Extract β β β’ Audit Trail β β β’ Analysis β
β β’ Multi-format β β β’ AI Analysis β β β’ Structured β β β’ Reporting β
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββ
β β
βΌ βΌ
ββββββββββββββββββββ βββββββββββββββββββββββ
β Text Storage β β Parameter β
β (Extracted) β β Tracking β
β β β β
β β’ Raw text β β β’ Confidence scores β
β β’ Structured β β β’ Extraction logs β
β β’ Cleaned β β β’ Error handling β
ββββββββββββββββββββ βββββββββββββββββββββββ
- Contract ID - Auto-generated unique identifier
- File Name - Original filename
- File Path - Full file path
- Contract Party 1 - First contracting party
- Contract Party 2 - Second contracting party
- Contract Type - Type of contract (e.g., Framework Contract)
- Contract Name - Official contract name/title
- Short Description - Brief description of contract purpose
- Contract Number - Contract reference number
- Contact Person 1 - Contact for first party
- Contact Person 2 - Contact for second party
- Start Date - Contract start date
- End Date - Contract end date
- Duration in Months - Calculated contract duration
- Services and Prices - Detailed services and pricing
- Potential Revenue - Estimated contract value
- Termination - Termination clauses and notice periods
- Liabilities - Liability limitations and caps
- Extract Date - Date of analysis
# Install dependencies
pip install -r requirements.txt
# Set up Azure OpenAI credentials in .env file
AZURE_OPENAI_API_KEY=your_key_here
AZURE_OPENAI_ENDPOINT=your_endpoint_here
AZURE_OPENAI_API_VERSION=2024-02-15-preview
AZURE_OPENAI_MODEL=gpt-4o-mini# Process any contract file
python contract_pipeline.py "path/to/your/contract.docx"
python contract_pipeline.py "path/to/your/contract.pdf"# View database contents
python view_database.py view
# Export to CSV
python export_database.pyContract Analyzer/
βββ contract_pipeline.py # Main pipeline script
βββ view_database.py # Database viewer
βββ export_database.py # CSV export utility
βββ requirements.txt # Python dependencies
βββ contract_database.xlsx # Excel database
βββ upload/ # Place contracts here
β βββ contract1.docx
β βββ contract2.pdf
βββ src/
βββ agents/
β βββ contract_parameter_analyzer.py # AI analysis agent
βββ utils/
β βββ azure_client.py # Azure OpenAI client
β βββ contract_database.py # Excel database manager
β βββ ingestion.py # DOCX/PDF text extraction
β βββ structured_extractor.py # Initial data extraction
βββ models/
βββ base_models.py # Basic data models
- Text Extraction: Reads text from DOCX/PDF files
- Structured Extraction: Uses regex patterns to identify key sections
- AI Analysis: Azure OpenAI analyzes the contract and extracts 17 parameters
- Database Storage: Results stored in Excel with full audit trail
- Export: Data can be exported to CSV for analysis
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β AI CONTRACT ANALYSIS AGENT β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β INPUT: Contract Text + Structured Data β
β ββ Raw contract text (DOCX/PDF extracted) β
β ββ Pre-extracted structured data (regex patterns) β
β β
β AI PROCESSING (Azure OpenAI GPT-4o-mini): β
β ββ System Prompt: "Legal contract analysis expert" β
β ββ Temperature: 0.1 (low for consistency) β
β ββ Max Tokens: 2000 β
β ββ JSON Response Format Required β
β β
β PARAMETER EXTRACTION (17 Key Parameters): β
β ββ Contract Parties (2) β
β ββ Contract Details (Type, Name, Number, Description) β
β ββ Contact Persons (2) β
β ββ Dates (Start, End, Duration) β
β ββ Financial (Services, Prices, Revenue) β
β ββ Legal (Termination, Liabilities) β
β ββ Metadata (Extract Date, Confidence Scores) β
β β
β VALIDATION & FALLBACK: β
β ββ Primary: AI-extracted values β
β ββ Fallback: Structured data extraction β
β ββ Confidence scoring per parameter β
β ββ Error handling with graceful degradation β
β β
β OUTPUT: Structured JSON with 17 parameters + confidence scores β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Contract Summary:
- Contract ID: CONTRACT_Rahmenvertrag_SGS_Sa_20251016_123548
- Contract Type: Framework Contract
- Contract Name: Rahmenvertrag
- Party 1: SGS Germany GmbH
- Party 2: Samsung Electronics GmbH
- Start Date: 2025-01-01
- End Date: 2028-01-01
- Duration: 36 months
- Potential Revenue: Variable based on individual orders
- Confidence Score: 0.63
- Parameters Extracted: 12/19
- Python 3.8+
- Azure OpenAI API access
- Required packages listed in
requirements.txt
- Processing Time: ~5-6 seconds per contract
- Success Rate: High accuracy with AI analysis
- File Support: DOCX, PDF, TXT
- Languages: Works with German and English contracts
The system creates an Excel database (contract_database.xlsx) with:
- Contracts Sheet: All contract data with 17 parameters
- Parameters Sheet: Detailed parameter tracking with confidence scores
Use view_database.py to browse contracts and export_database.py to create CSV exports for analysis.
- No hardcoded API keys or secrets in the code
- All credentials are stored in
.envfile (excluded from git) - Only placeholder values in example files
- Never commit your
.envfile - it contains your Azure OpenAI API key - Keep your API keys secure - they provide access to your Azure OpenAI service
- Monitor your API usage - Azure OpenAI charges per token usage
- Review extracted data - contracts may contain sensitive information
- Use environment variables for all sensitive data
- Regularly rotate your API keys
- Monitor your Azure OpenAI usage and costs
- Consider data privacy regulations (GDPR, etc.) when processing contracts