coffeedrunkpanda/alpha-crunch

πŸ¦πŸ€– AlphaCrunch: Automated Investment Analyst Agent

AlphaCrunch is a LangGraph-powered conversational AI agent built on a Finance LLM: Mistral-7B-Instruct-v0.2 fine-tuned with QLoRA on the virattt/financial-qa-10K SEC Q&A dataset. It runs RAG (ChromaDB) analysis over S&P 500 10-K filings from the jlohding/sp500-edgar-10k dataset, with multi-turn memory and a Gradio UI.

AlphaCrunch Demo

πŸ“ Repository Structure

├── .env.sample              # Env vars (Modal token, paths)
├── .gitignore
├── LICENSE
├── pyproject.toml           # UV deps
├── uv.lock
├── README.md                # This file
├── requirements.txt         # Legacy pip
├── data/                    # Chroma_db, fiqa, etc
├── notebooks/               # EDA, experiments
├── outputs/                 # LoRA adapter, checkpoints
│   └── finance-llm-adapter/
├── scripts/                 # ingest_data.py, test_agent.py
├── src/alpha_crunch/        # Core package (see sub-structure)
├── .venv/
└── wandb/                   # W&B runs (LoRA eval)

Core Features

Finance LLM (QLoRA Fine-Tuning)

Fine-tuned on a mix of 70% RAG-simulated examples (context + question → answer) and 30% pure-knowledge examples, with an 80/10/10 train/val/test split.
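The data mix above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual preprocessing: the `question`/`answer`/`context` field names, the `Context:`/`Question:` prompt wording, the seed, and the helper names `format_example` and `build_splits` are all assumptions. The literal `<s>`/`</s>` markers follow the published Mistral instruct template; in practice the tokenizer may add them for you.

```python
import random

def format_example(question, answer, context=None):
    """Render one Q&A pair in Mistral's [INST] chat format."""
    if context is not None:
        # RAG-simulated format: the model sees a context passage plus the question.
        prompt = f"Context:\n{context}\n\nQuestion: {question}"
    else:
        # Pure-knowledge format: the question alone.
        prompt = question
    return f"<s>[INST] {prompt} [/INST] {answer}</s>"

def build_splits(rows, rag_share=0.7, seed=42):
    """Mix the two formats (assumed 70/30), then split 80/10/10."""
    rng = random.Random(seed)
    rows = list(rows)
    rng.shuffle(rows)
    n_rag = round(len(rows) * rag_share)
    texts = [
        format_example(r["question"], r["answer"], r["context"] if i < n_rag else None)
        for i, r in enumerate(rows)
    ]
    rng.shuffle(texts)  # interleave both formats across all three splits
    n_train = len(texts) * 8 // 10  # integer math avoids float rounding surprises
    n_val = len(texts) // 10
    return {
        "train": texts[:n_train],
        "val": texts[n_train:n_train + n_val],
        "test": texts[n_train + n_val:],
    }

# Toy rows standing in for the SEC Q&A dataset.
rows = [{"question": f"Q{i}?", "answer": f"A{i}", "context": f"C{i}"} for i in range(100)]
splits = build_splits(rows)
```

Shuffling a second time after formatting keeps the RAG-simulated and pure-knowledge examples from clustering in one split.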

QLoRA fine-tuning was run locally on the Mistral-7B-Instruct-v0.2 model. The fine-tuned model is deployed to Modal and accessed through an API (FastAPI).

| Key Params | Value | Reason |
| --- | --- | --- |
| LoRA rank | r=16, alpha=32 | Capacity/VRAM balance (fits RTX 5070 Ti) |
| Quant | 4-bit NF4 + double quant | 16GB VRAM |
| Targets | q,k,v,o,gate,up,down | Near full fine-tune quality |
| Template | Mistral [INST] chat | Instruction-tuned base |

The adapter (~30MB) is saved to W&B. Evaluation uses BERTScore and LLM-as-Judge, with results also logged to W&B.

To access the dashboard of the fine-tuning and eval:

RAG

  • Data pipeline: Top 50 S&P by weight (Mar 2026 Slickcharts), CIK-filtered from jlohding/sp500-edgar-10k, 47 unique names post-deduplication.
  • Deterministic NER with regex on immutable COMPANY_REGISTRY tuple (500+ S&P names), longest-first matching, and alias resolution (e.g., "Google" → "ALPHABET").
  • Hybrid RAG: ChromaDB filters by exact company metadata before semantic search with all-mpnet-base-v2 embeddings; @lru_cache singleton optimizations.
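The entity-matching step described above might look roughly like the sketch below. It is an illustration under assumptions, not the project's code: the four-entry registry, the `ALIASES` table, the `company` metadata key, and the function names are hypothetical stand-ins. The returned dictionary has the shape of a metadata filter that would be applied before semantic search.

```python
import re
from functools import lru_cache

# Hypothetical miniature registry; the real tuple is described as holding 500+ names.
COMPANY_REGISTRY = ("ALPHABET", "APPLE", "APPLE HOSPITALITY REIT", "TESLA")
# Hypothetical alias table mapping colloquial names to registry names.
ALIASES = {"GOOGLE": "ALPHABET"}

@lru_cache(maxsize=1)
def company_pattern():
    """Compile the matcher once; longer names come first in the alternation."""
    names = sorted(set(COMPANY_REGISTRY) | set(ALIASES), key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in names)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)

def extract_company(query):
    """Return the canonical registry name mentioned in the query, or None."""
    match = company_pattern().search(query)
    if match is None:
        return None
    name = match.group(0).upper()
    return ALIASES.get(name, name)  # alias resolution, e.g. GOOGLE -> ALPHABET

def metadata_filter(query):
    """Build an exact-match filter (assumed key: 'company') for the vector store."""
    company = extract_company(query)
    return {"company": company} if company else None
```

Sorting longest-first matters because regex alternation takes the first alternative that matches: without it, "Apple Hospitality REIT" would resolve to "APPLE". The word boundaries keep "pineapple" from matching at all.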

Agent

  • LangGraph flow: intent_node → conditional (rag/analyst/help) → Mistral Finance LLM; MemorySaver for sessions.
  • Multi-turn memory via LangGraph add_messages reducer and MemorySaver checkpointer with thread_id sessions.
  • Intent routing: "rag", "analyst", "help" categories; state hygiene clears retrieved_context to prevent leakage.
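The routing and state-hygiene ideas above can be sketched in plain Python, independent of LangGraph. Everything here is illustrative: the `TypedDict` stands in for the project's Pydantic `AgentState`, the `answer` reset and the fallback to "help" for unknown intents are assumptions, and `route_by_intent` and `reset_turn_state` are hypothetical names. A function like `route_by_intent` is the kind of callable a conditional edge would use to pick the next node.

```python
from typing import Optional, TypedDict

class AgentState(TypedDict):
    # Stand-in for the project's Pydantic AgentState (an assumption).
    messages: list
    intent: str
    retrieved_context: Optional[str]
    answer: Optional[str]

VALID_INTENTS = ("rag", "analyst", "help")

def route_by_intent(state):
    """Return the next node's name; unknown intents fall back to 'help' (assumed)."""
    intent = state["intent"]
    return intent if intent in VALID_INTENTS else "help"

def reset_turn_state(state):
    """State hygiene: drop context and answer left over from the previous turn."""
    return {**state, "retrieved_context": None, "answer": None}

# A new turn arrives while the state still holds context from an earlier question.
state = {
    "messages": ["Describe Tesla's business."],
    "intent": "rag",
    "retrieved_context": "Apple 10-K excerpt from the previous turn",
    "answer": "previous answer",
}
state = reset_turn_state(state)
next_node = route_by_intent(state)
```

Clearing `retrieved_context` before routing means a follow-up "analyst" or "help" turn cannot accidentally be answered from a stale filing excerpt.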

UI

  • Gradio 6.0 UI: Streaming, custom Anta/Courier fonts, mesh gradient CSS, glassmorphism.

Architecture

Gradio UI → LangGraph (AgentState: messages, intent, context, answer)
           ↓ intent_node (Finance LLM)
      ┌──────┼──────┐
  RAG    Analyst   Help
(Chroma) (Reason) (Info)
         ↓ Answer
src/alpha_crunch/
  agent/
    config.py     # Immutable registry, paths, aliases
    vector_store.py # ChromaDB singleton
    rag_node.py   # LangGraph RAG execution
  state.py       # Pydantic AgentState

User input flows: Gradio ChatInterface → LangGraph graph (intent_node → conditional → llm_node) → Mistral 7B on Modal.

Quickstart

  1. Clone and install with uv:
    git clone git@github.com:coffeedrunkpanda/alpha-crunch.git
    cd alpha-crunch
    uv sync

  2. Copy .env.sample to .env and add your Modal API key.

  3. Ingest the top 50 S&P 500 filings (CIK-filtered, name-standardized):

    uv run scripts/rag/ingest_data.py

  4. Launch the Gradio UI:

    uv run python src/alpha_crunch/app.py

    The UI features custom CSS (loomy mesh gradient, Anta/Courier Prime fonts, glassmorphism).

Example Usage

Query: "What are Apple's main supply chain risks?"
Answer: Apple's supply chain risks include disruptions in manufacturing or logistics, sole-sourcing reliance on certain vendors for critical components, and foreign currency exchange rate fluctuations. These risks could materially affect the Company's financial condition and operating results.

Sample questions:

  • what is asset allocation?

  • what is the sec filings 10k? why does it matter?

  • what are the most important concepts in investment?

  • What are Apple's main supply chain risks?

  • Describe tesla's business.

Limitations & Next

  • Single company per query only; no multi-company comparison yet.
  • Add more metadata filtering to improve the accuracy of the agent.
  • No information on stock prices. Next: yfinance integration.

Tech Stack

  • LangGraph for stateful agents with custom nodes and routing.

  • ChromaDB local vector store; uv package management; python-dotenv.

  • QLoRA-tuned Mistral-7B-Instruct (finance-specialized on SEC Q&A); Gradio ChatInterface.

  • Project management with Linear.

Interesting Links

LoRA vs. QLoRA - Pranav Patel

LoRA vs. QLoRA - Red Hat

Instruction Tuning
