Project AETHER


Coordinator-driven multi-agent AI system for structured debate, opposition, and synthesis over a normalized reasoning context.

The system extracts debatable factors, argues for and against them using independent agents, and synthesizes a transparent final report, all orchestrated deterministically.


Features

Backend (Python/FastAPI)

  • Multi-Agent Orchestration: FactorExtractor, Support, Opposition, and Synthesizer agents working in sequence
  • PDF Processing:
    • Text extraction from PDFs
    • Table extraction and parsing (numeric values → metrics)
    • Metadata extraction
  • Structured Debate System: Automatic pro/con analysis for each identified factor
  • Reasoning Context Management: Unified data model for facts, metrics, assumptions, and limitations
  • JSON Logging: All analysis sessions logged with full reasoning trace
  • PDF Report Generation: Beautifully formatted PDF reports with embedded analysis

Frontend (React/Vite)

  • Interactive UI: Components for uploading PDFs, entering factors, and viewing results
  • Real-time Analysis: Direct integration with backend API
  • Responsive Design: Mobile-friendly interface
  • Factor Management: Input custom factors with domain tagging
  • Results Display: Visualized debate logs and synthesis

Tech Stack

  • Backend: Python 3.10+, FastAPI, Pydantic v2, Gemini via Vertex AI (google-genai)
  • Frontend: React 19+, Vite, CSS
  • Data Processing: PyPDF2, Camelot (table extraction)
  • Async: async/await architecture
  • Logging: Structured JSON logging
  • PDF Generation: ReportLab

Architecture Overview

Request (via API or PDF Upload)
↓
ReasoningContext (validated)
↓
FactorExtractorAgent → Extract debatable factors + domain
↓
SupportAgent → Generate pro arguments for each factor
↓
OppositionAgent → Generate counter arguments
↓
SynthesizerAgent → Combine and synthesize findings
↓
Final Structured Report + Debate Logs
↓
Optional: Generate PDF Report

Key Properties:

  • Agents never call each other directly
  • Orchestrator enforces sequence deterministically
  • No agent invents facts beyond the provided context
  • All outputs are strict JSON schemas
  • Table parsing is optional and never crashes the pipeline
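The sequencing above can be sketched as a small deterministic pipeline. This is a hypothetical illustration, not the project's actual orchestrator (which lives in backend/app/orchestrator.py); the agent callables here are stand-in lambdas.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Pipeline:
    """Sketch of coordinator-driven orchestration: agents never call each
    other; the orchestrator runs each phase in a fixed order and threads
    the accumulated state through, recording a transparent trace."""
    phases: List[Tuple[str, Callable[[dict], dict]]] = field(default_factory=list)
    trace: List[str] = field(default_factory=list)

    def register(self, name: str, agent_fn: Callable[[dict], dict]) -> None:
        self.phases.append((name, agent_fn))

    def run(self, context: dict) -> dict:
        state = dict(context)
        for name, agent_fn in self.phases:   # fixed, deterministic order
            self.trace.append(name)          # full reasoning trace
            state.update(agent_fn(state))    # agent sees only provided state
        return state

pipeline = Pipeline()
pipeline.register("factor_extractor", lambda s: {"factors": ["F1"]})
pipeline.register("support", lambda s: {"support": {f: ["pro"] for f in s["factors"]}})
pipeline.register("opposition", lambda s: {"opposition": {f: ["con"] for f in s["factors"]}})
pipeline.register("synthesizer", lambda s: {"report": "synthesis"})

result = pipeline.run({"narrative": "..."})
```

Because the orchestrator owns the sequence, swapping or instrumenting a phase never changes the order in which the others run.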

Setup

1) Create and activate a virtual environment (Windows PowerShell)

python -m venv .venv
.\.venv\Scripts\Activate.ps1

2) Install dependencies

pip install -r requirements.txt

Note: Camelot table extraction relies on optional system dependencies:

  • Camelot's lattice mode requires Ghostscript; install it if table extraction fails
  • Stream-based extraction generally works without additional system libraries

3) Configure environment variables

Create a .env file in the project root or backend/ (Vertex AI via ADC):

GCP_PROJECT=YOUR_GCP_PROJECT_ID
GCP_LOCATION=us-central1
GEMINI_MODEL=gemini-2.5-pro

⚠️ .env is git-ignored and must not be committed.

Environment variables are loaded automatically using python-dotenv.

Make sure ADC is configured (for example, gcloud auth application-default login or a service account).
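As a rough sketch of how those variables might be consumed, here is a hypothetical settings helper (the real project loads them via python-dotenv; `load_settings` is not a function in this codebase). Defaults mirror the example .env above.

```python
import os

def load_settings() -> dict:
    """Read Vertex AI configuration from the environment.
    GCP_PROJECT has no safe default and must be set explicitly."""
    return {
        "project": os.environ.get("GCP_PROJECT"),
        "location": os.environ.get("GCP_LOCATION", "us-central1"),
        "model": os.environ.get("GEMINI_MODEL", "gemini-2.5-pro"),
    }
```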


Run the Backend API

cd backend
uvicorn app.main:app --host 0.0.0.0 --port 8000

API root: 👉 http://localhost:8000/


Run the Frontend

cd frontend
npm install
npm run dev

Frontend: 👉 http://localhost:5173/

Optional frontend API base override (create frontend/.env):

VITE_API_BASE=http://localhost:8000

API Endpoints

POST /analyze

Analyze structured reasoning context with debate and synthesis.

Request Body (JSON)

{
  "narrative": "Main report text",
  "extracted_facts": [
    "Customer engagement increased in metro cities during Q3",
    "Tier-2 cities experienced higher churn rates"
  ],
  "metrics": [
    {
      "name": "conversion_rate",
      "region": "metro",
      "value": 3.4
    }
  ],
  "assumptions": ["Higher engagement generally leads to higher revenue"],
  "limitations": ["Customer demographics were not segmented"]
}
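A minimal client sketch for this endpoint, assuming the backend is running locally on port 8000 (the `analyze` helper is illustrative, not part of the project):

```python
import json
import urllib.request

# Request body mirroring the ReasoningContext schema shown above.
payload = {
    "narrative": "Main report text",
    "extracted_facts": [
        "Customer engagement increased in metro cities during Q3",
        "Tier-2 cities experienced higher churn rates",
    ],
    "metrics": [{"name": "conversion_rate", "region": "metro", "value": 3.4}],
    "assumptions": ["Higher engagement generally leads to higher revenue"],
    "limitations": ["Customer demographics were not segmented"],
}

def analyze(base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to /analyze and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```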

Response Body (JSON)

{
  "final_report": {
    "what_worked": "...",
    "what_failed": "...",
    "why_it_happened": "...",
    "how_to_improve": "...",
    "synthesis": "...",
    "recommendation": "...",
    "confidence_score": 85
  },
  "factors": [
    {
      "factor_id": "F1",
      "description": "...",
      "domain": "sales"
    }
  ],
  "debate_logs": [
    {
      "factor_id": "F1",
      "factor": {
        "factor_id": "F1",
        "description": "...",
        "domain": "sales"
      },
      "support": {
        "support_arguments": [
          {
            "claim": "...",
            "evidence": "...",
            "assumption": "..."
          }
        ]
      },
      "opposition": {
        "counter_arguments": [
          {
            "target_claim": "...",
            "challenge": "...",
            "risk": "..."
          }
        ]
      }
    }
  ]
}
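To give a feel for consuming this shape, here is a hypothetical helper (not part of the project) that tallies pro and con arguments per factor from `debate_logs`:

```python
def summarize_debate(response: dict) -> dict:
    """Count support and counter arguments for each factor_id."""
    summary = {}
    for log in response.get("debate_logs", []):
        summary[log["factor_id"]] = {
            "pro": len(log["support"]["support_arguments"]),
            "con": len(log["opposition"]["counter_arguments"]),
        }
    return summary

# Trimmed sample matching the response schema above.
sample = {
    "debate_logs": [
        {
            "factor_id": "F1",
            "support": {"support_arguments": [{"claim": "..."}]},
            "opposition": {
                "counter_arguments": [
                    {"target_claim": "..."},
                    {"target_claim": "..."},
                ]
            },
        }
    ]
}
```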

POST /analyze-pdf

Upload and analyze a PDF document.

  • Extracts text from all pages
  • Extracts tables (converts numeric values to metrics)
  • Returns analysis results in the same format as /analyze

POST /analyze-report

Analyze structured context and return PDF report.

Returns a beautifully formatted PDF with:

  • Executive summary from synthesis
  • Extracted factors with domain labels
  • Full debate logs with support/opposition arguments
  • Timestamps and confidence scores

POST /analyze-pdf-report

Upload PDF, analyze it, and return formatted PDF report.

Combines PDF extraction and report generation in one request.


GET /status

Returns the current orchestration phase and status metadata.


GET /download-report

Returns a PDF report for the most recent analysis (without re-running).


Data Models

ReasoningContext

from typing import List, Optional

from pydantic import BaseModel

class Metric(BaseModel):
    name: str
    region: Optional[str] = None
    value: float

class ReasoningContext(BaseModel):
    narrative: str
    extracted_facts: List[str] = []
    metrics: List[Metric] = []
    assumptions: List[str] = []
    limitations: List[str] = []

Domain Labels

Supported domains for factors:

  • sales
  • organization
  • policy
  • statistics

PDF Processing

Table Extraction

  • Uses Camelot library to extract tables from PDFs
  • Processes all pages automatically
  • First row assumed to be headers
  • First column (if present) becomes region label
  • Numeric cells converted to metrics
  • Non-numeric cells skipped
  • Errors logged but never crash the pipeline

If No Tables Found

  • Processing continues normally with text extraction only
  • Returns empty metrics list
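The conversion rules above can be sketched in plain Python (a hypothetical `table_to_metrics`; the real logic lives in backend/app/utils/pdf_parser.py):

```python
def table_to_metrics(rows):
    """Convert an extracted table (list of rows) into metric dicts.
    First row = headers, first column = region label, numeric cells
    become metrics, non-numeric cells are skipped."""
    if len(rows) < 2:
        return []  # no data rows: continue with an empty metrics list
    headers, metrics = rows[0], []
    for row in rows[1:]:
        region = row[0]
        for header, cell in zip(headers[1:], row[1:]):
            try:
                value = float(cell)
            except (TypeError, ValueError):
                continue  # skip non-numeric cells; never crash the pipeline
            metrics.append({"name": header, "region": region, "value": value})
    return metrics
```

For example, a table with headers ["city", "conversion_rate"] and row ["metro", "3.4"] yields one metric named conversion_rate for the metro region, while a cell like "n/a" is silently skipped.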

Logging

  • All reasoning sessions are logged as structured JSON
  • Location: logs/reasoning_logs.json
  • The logs/ directory is ignored by Git
  • Includes full trace of all agent outputs and decisions
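An append-style session log of this kind could look roughly like the following sketch (hypothetical `log_session`; the real logger lives in backend/app/utils/logger.py). Timestamps are UTC, matching the note later in this README.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def log_session(entry: dict, path: str = "logs/reasoning_logs.json") -> None:
    """Append one analysis session, stamped with a UTC timestamp,
    to a JSON array on disk, creating the logs/ directory if needed."""
    log_file = Path(path)
    log_file.parent.mkdir(parents=True, exist_ok=True)
    sessions = json.loads(log_file.read_text()) if log_file.exists() else []
    sessions.append({**entry, "timestamp": datetime.now(timezone.utc).isoformat()})
    log_file.write_text(json.dumps(sessions, indent=2))
```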

Key Design Principles

  • No hallucination: agents rely strictly on provided context
  • Debate-first reasoning: every claim is challenged
  • Deterministic flow: orchestrator controls execution
  • Schema-validated outputs: every agent returns strict JSON
  • Graceful degradation: optional features (table parsing) never crash
  • Transparent reasoning: all intermediate steps logged
  • Domain-aware: factors categorized by domain for better analysis

Directory Structure

project-aether/
├── README.md
├── frontend/
│   ├── package.json
│   ├── vite.config.js
│   ├── index.html
│   └── src/
│       ├── main.jsx
│       ├── App.jsx
│       ├── App.css
│       ├── index.css
│       ├── components/
│       │   ├── PdfUpload.jsx
│       │   ├── FactorsList.jsx
│       │   ├── JsonInput.jsx
│       │   └── ResultsDisplay.jsx
│       ├── pages/
│       │   ├── Home.jsx
│       │   └── Results.jsx
│       └── services/
│           └── api.js
└── backend/
    ├── requirements.txt
    ├── app/
    │   ├── main.py
    │   ├── orchestrator.py
    │   ├── agents/
    │   │   ├── base_agent.py
    │   │   ├── factor_extractor.py
    │   │   ├── support_agent.py
    │   │   ├── opposition_agent.py
    │   │   └── synthesizer_agent.py
    │   ├── schemas/
    │   │   ├── context.py
    │   │   ├── factor.py
    │   │   ├── debate.py
    │   │   └── final_report.py
    │   ├── utils/
    │   │   ├── pdf_parser.py
    │   │   ├── pdf_generator.py
    │   │   ├── logger.py
    │   │   └── llm_client.py
    │   └── prompts/
    │       ├── factor_prompt.txt
    │       ├── support_prompt.txt
    │       ├── opposition_prompt.txt
    │       └── synthesis_prompt.txt
    └── logs/
        └── reasoning_logs.json

Notes

  • The system uses Gemini via Vertex AI (google-genai SDK)
  • Billing or available quota is required for sustained usage
  • Free-tier quotas may be limited depending on project settings
  • Agents are isolated and stateless per request
  • Table parsing works with standard PDFs; complex/scanned PDFs may require OCR (not currently supported)
  • All timestamps are UTC
  • .env files must never be committed (git-ignored by default)

Future Enhancements

  • OCR support for scanned PDFs
  • Chart extraction and analysis
  • Multi-language support
  • Custom domain definitions
  • Result caching and history
  • Advanced report formatting options
  • Real-time collaborative analysis
  • Integration with more LLM providers

About

AI-powered multi-agent system for structured debate and synthesis of complex documents. Eliminates bias through adversarial reasoning - independent agents argue for/against extracted factors to generate balanced, transparent reports. Built with FastAPI, React, and Google Gemini.
