🏦🤖 AlphaCrunch: Automated Investment Analyst Agent

AlphaCrunch is a LangGraph-powered conversational AI agent built on a Finance LLM: Mistral-7B-Instruct-v0.2 fine-tuned with QLoRA on the virattt/financial-qa-10K SEC Q&A dataset. It performs RAG (ChromaDB) analysis of S&P 500 10-K filings from the jlohding/sp500-edgar-10k dataset, with multi-turn memory and a Gradio UI.

📁 Repository Structure

├── .env.sample              # Env vars (Modal token, paths)
├── .gitignore
├── LICENSE
├── pyproject.toml           # UV deps
├── uv.lock
├── README.md                # This file
├── requirements.txt         # Legacy pip
├── data/                    # Chroma_db, fiqa, etc
├── notebooks/               # EDA, experiments
├── outputs/                 # LoRA adapter, checkpoints
│   └── finance-llm-adapter/
├── scripts/                 # ingest_data.py, test_agent.py
├── src/alpha_crunch/        # Core package (see sub-structure)
├── .venv/
└── wandb/                   # W&B runs (LoRA eval)

Core Features

Finance LLM (QLoRA Fine-Tuning)

Fine-tuned on 70% RAG-simulated (context + question → answer) and 30% pure knowledge format; 80/10/10 train/val/test split.
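
A minimal sketch of how the two training formats and the split could be produced. Only the 70/30 mix, the 80/10/10 split, and the Mistral `[INST]` template come from this README; the prompt wording, the row keys, the seed, and the helper names `format_example`, `build_mixed_dataset`, and `split_dataset` are assumptions for illustration, not the project's actual preprocessing code.

```python
import random


def format_example(question, answer, context=None):
    """Render one Q&A pair in the Mistral [INST] chat format.

    With `context`, the prompt imitates a RAG call (context + question);
    without it, the model has to answer from its own knowledge.
    """
    if context is not None:
        prompt = f"Context:\n{context}\n\nQuestion: {question}"
    else:
        prompt = question
    # If the tokenizer already adds BOS/EOS tokens, drop the literal <s> and </s>.
    return f"<s>[INST] {prompt} [/INST] {answer}</s>"


def build_mixed_dataset(rows, rag_fraction=0.7, seed=42):
    """Format rows, using the RAG-style prompt for about rag_fraction of them."""
    rng = random.Random(seed)
    formatted = []
    for row in rows:
        use_context = rng.random() < rag_fraction
        context = row["context"] if use_context else None
        formatted.append(format_example(row["question"], row["answer"], context))
    return formatted


def split_dataset(items, train=0.8, val=0.1, seed=42):
    """Shuffle once, then cut into train/val/test (test takes the remainder)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],
    )


if __name__ == "__main__":
    rows = [
        {"question": f"q{i}", "answer": f"a{i}", "context": f"c{i}"}
        for i in range(100)
    ]
    train_set, val_set, test_set = split_dataset(build_mixed_dataset(rows))
    print(len(train_set), len(val_set), len(test_set))
```

Sampling the format per row gives a mix that is only approximately 70/30 on small datasets; an exact ratio would need the rows to be partitioned up front instead.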

QLoRA fine-tuning of Mistral-7B-Instruct-v0.2 was performed locally; the model is deployed to Modal and accessed through an API (FastAPI).

| Key Params | Value | Reason |
| --- | --- | --- |
| LoRA rank | r=16, alpha=32 | Capacity/VRAM balance (fits RTX 5070 Ti) |
| Quant | 4-bit NF4 + double quant | 16GB VRAM |
| Targets | q,k,v,o,gate,up,down | Near full fine-tune quality |
| Template | Mistral [INST] chat | Instruction-tuned base |
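
As a rough back-of-the-envelope check on the quantization row, the sketch below estimates how much memory the 4-bit base weights occupy. It rests on stated assumptions rather than measurements from this project: a parameter count of about 7.24 billion, the block sizes used in the QLoRA paper (64 for weights, 256 for the second-level constants), and every parameter being quantized, which overstates coverage because embeddings and norm layers usually stay in higher precision.

```python
def nf4_bits_per_param(block_size=64, double_quant=True, dq_block_size=256):
    """Effective storage bits per weight for 4-bit NF4 quantization."""
    if double_quant:
        # One 8-bit constant per block of weights, plus one 32-bit constant
        # per group of dq_block_size first-level constants.
        overhead = 8 / block_size + 32 / (block_size * dq_block_size)
    else:
        # One 32-bit constant per block of weights.
        overhead = 32 / block_size
    return 4 + overhead


def weight_memory_gb(n_params, bits_per_param):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 7.24e9  # assumption: approximate size of Mistral-7B

if __name__ == "__main__":
    with_dq = weight_memory_gb(N_PARAMS, nf4_bits_per_param())
    without_dq = weight_memory_gb(N_PARAMS, nf4_bits_per_param(double_quant=False))
    print(f"NF4 + double quant: {with_dq:.2f} GB")
    print(f"NF4 only:           {without_dq:.2f} GB")
```

Under these assumptions the quantized weights take roughly 3.7 GB with double quantization and about 4.1 GB without it. The remainder of a 16GB card is then available for activations, adapter weights, and optimizer state, whose size depends on batch size and sequence length and is not estimated here.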

The adapter (~30MB) is saved in W&B. Evaluation uses BERTScore and LLM-as-Judge, with results also logged to W&B.

To access the fine-tuning and evaluation dashboards:

WandB LoRA fine-tuning dashboard

WandB Evals dashboard

RAG

Data pipeline: Top 50 S&P 500 companies by index weight (Slickcharts, Mar 2026), CIK-filtered from jlohding/sp500-edgar-10k, leaving 47 unique names after deduplication.
Deterministic NER with regex on immutable COMPANY_REGISTRY tuple (500+ S&P names), longest-first matching, and alias resolution (e.g., "Google" → "ALPHABET").
Hybrid RAG: ChromaDB filters by exact company metadata before semantic search with all-mpnet-base-v2 embeddings; @lru_cache singleton optimizations.
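
The deterministic NER step described above can be sketched with the standard `re` module. This is an illustration under assumptions, not the project's code: the registry below is a tiny hypothetical excerpt (the real `COMPANY_REGISTRY` is described as holding 500+ names), the `"META"` alias is invented for the example, and the names `ALIASES` and `extract_company` are placeholders.

```python
import re
from typing import Optional

# Hypothetical excerpt of the immutable registry of canonical names.
COMPANY_REGISTRY = ("ALPHABET", "APPLE", "META PLATFORMS", "TESLA")

# Alias -> canonical name. "GOOGLE" comes from the README; "META" is made up.
ALIASES = {"GOOGLE": "ALPHABET", "META": "META PLATFORMS"}


def _build_pattern(names):
    """Compile one alternation, longest names first.

    Regex alternation takes the first alternative that matches, so sorting by
    length makes the engine try "META PLATFORMS" before the shorter "META".
    This matters whenever one entry is a prefix of another.
    """
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    return re.compile(rf"\b({alternatives})\b", re.IGNORECASE)


_PATTERN = _build_pattern(COMPANY_REGISTRY + tuple(ALIASES))


def extract_company(query: str) -> Optional[str]:
    """Return the canonical company name mentioned in the query, if any."""
    match = _PATTERN.search(query)
    if match is None:
        return None
    name = match.group(1).upper()
    return ALIASES.get(name, name)


if __name__ == "__main__":
    for question in (
        "What are Google's main risks?",
        "Describe tesla's business.",
        "what is asset allocation?",
    ):
        print(question, "->", extract_company(question))
```

Because the result is one exact canonical string, it can be passed directly as the metadata filter value for the vector store before any semantic search runs.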

Agent

LangGraph flow: intent_node → conditional (rag/analyst/help) → Mistral Finance LLM; MemorySaver for sessions.
Multi-turn memory via LangGraph add_messages reducer and MemorySaver checkpointer with thread_id sessions.
Intent routing: "rag", "analyst", "help" categories; state hygiene clears retrieved_context to prevent leakage.
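
The routing and state-hygiene ideas above can be illustrated without LangGraph. The sketch below is a plain-Python stand-in, not the project's graph: the `TypedDict` replaces the Pydantic `AgentState`, a keyword rule replaces the LLM-based intent classification, and `run_turn`, `classify_intent`, the keyword lists, and the placeholder answers are all hypothetical.

```python
from typing import Dict, List, Optional, TypedDict


class AgentState(TypedDict):
    """Plain-dict stand-in for the project's Pydantic AgentState."""
    messages: List[Dict[str, str]]
    intent: str
    retrieved_context: Optional[str]
    answer: str


HELP_WORDS = ("help", "what can you do")   # hypothetical keywords
COMPANY_WORDS = ("apple", "tesla", "10-k")  # hypothetical keywords


def classify_intent(text: str) -> str:
    """Keyword stand-in for the LLM-based intent classification."""
    lowered = text.lower()
    if any(word in lowered for word in HELP_WORDS):
        return "help"
    if any(word in lowered for word in COMPANY_WORDS):
        return "rag"
    return "analyst"


def intent_node(state: AgentState) -> dict:
    last_message = state["messages"][-1]["content"]
    # State hygiene: reset retrieved_context on every turn so that context
    # fetched for an earlier question cannot leak into this one.
    return {"intent": classify_intent(last_message), "retrieved_context": None}


def rag_node(state: AgentState) -> dict:
    query = state["messages"][-1]["content"]
    return {"retrieved_context": f"[filing excerpts for: {query}]"}


def analyst_node(state: AgentState) -> dict:
    return {}


def help_node(state: AgentState) -> dict:
    return {}


ROUTES = {"rag": rag_node, "analyst": analyst_node, "help": help_node}


def run_turn(state: AgentState, user_text: str) -> AgentState:
    """One turn: append the message, classify, route, then answer."""
    messages = state["messages"] + [{"role": "user", "content": user_text}]
    state = {**state, "messages": messages}
    state = {**state, **intent_node(state)}
    state = {**state, **ROUTES[state["intent"]](state)}
    answer = f"({state['intent']}) placeholder answer"
    messages = state["messages"] + [{"role": "assistant", "content": answer}]
    return {**state, "answer": answer, "messages": messages}
```

Appending to `messages` instead of overwriting it mirrors what an append-style reducer does, while every other field is simply replaced by the latest node output.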

UI

Gradio 6.0 UI: Streaming, custom Anta/Courier fonts, mesh gradient CSS, glassmorphism.

Architecture

Gradio UI → LangGraph (AgentState: messages, intent, context, answer)
           ↓ intent_node (Finance LLM)
      ┌──────┼──────┐
  RAG    Analyst   Help
(Chroma) (Reason) (Info)
         ↓ Answer

src/alpha_crunch/
  agent/
    config.py     # Immutable registry, paths, aliases
    vector_store.py # ChromaDB singleton
    rag_node.py   # LangGraph RAG execution
  state.py       # Pydantic AgentState
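
To make the roles of `vector_store.py` and `rag_node.py` more concrete, here is a dependency-free sketch of a cached singleton store that filters on exact company metadata before ranking by similarity. It deliberately swaps in a toy in-memory list and hand-written 2-D vectors for ChromaDB and the all-mpnet-base-v2 embeddings; `ToyVectorStore`, `get_vector_store`, and the sample documents are hypothetical.

```python
import math
from functools import lru_cache


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for a persistent vector collection."""

    def __init__(self):
        self._docs = [
            {"company": "APPLE", "text": "Apple supply chain risk", "vector": (1.0, 0.0)},
            {"company": "APPLE", "text": "Apple currency exposure", "vector": (0.0, 1.0)},
            {"company": "TESLA", "text": "Tesla supply chain risk", "vector": (1.0, 0.0)},
        ]

    def query(self, query_vector, company, k=2):
        # Step 1: exact metadata filter, so other companies never compete.
        candidates = [doc for doc in self._docs if doc["company"] == company]
        # Step 2: semantic ranking over the filtered candidates only.
        ranked = sorted(
            candidates,
            key=lambda doc: cosine(query_vector, doc["vector"]),
            reverse=True,
        )
        return [doc["text"] for doc in ranked[:k]]


@lru_cache(maxsize=1)
def get_vector_store():
    """Build the store once; later calls return the same cached instance."""
    return ToyVectorStore()


if __name__ == "__main__":
    store = get_vector_store()
    print(store.query((0.9, 0.1), company="APPLE", k=1))
```

Without the filter, the Tesla chunk would tie with the Apple chunk on similarity alone, which is the kind of cross-company mix-up the metadata filter is meant to rule out.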

User input flows: Gradio ChatInterface → LangGraph graph (intent_node → conditional → llm_node) → Mistral 7B on Modal.

Quickstart

Clone and install with uv:

git clone git@github.com:coffeedrunkpanda/alpha-crunch.git
cd alpha-crunch
uv sync

Copy .env.sample → .env (add Modal API key).

Ingest top 50 S&P 500 filings (CIK-filtered, name-standardized):
```
uv run scripts/rag/ingest_data.py
```
Launch Gradio UI:
```
uv run python src/alpha_crunch/app.py
```
The UI features custom CSS (loomy mesh gradient, Anta/Courier Prime fonts, glassmorphism).

Example Usage

Query: "What are Apple's main supply chain risks?"
Answer: Apple's supply chain risks include disruptions in manufacturing or logistics, sole-sourcing reliance on certain vendors for critical components, and foreign currency exchange rate fluctuations. These risks could materially affect the Company’s financial condition and operating results.

Sample questions:

What is asset allocation?
What is an SEC 10-K filing? Why does it matter?
What are the most important concepts in investing?
What are Apple's main supply chain risks?
Describe Tesla's business.

Limitations & Next

Single company per query only; no multi-company comparison yet.
Add more metadata filtering to improve the accuracy of the agent.
No information on stock prices. Next: yfinance integration.

Tech Stack

LangGraph for stateful agents with custom nodes and routing.
ChromaDB local vector store; uv package management; python-dotenv.
QLoRA-tuned Mistral-7B-Instruct (finance-specialized on SEC Q&A); Gradio ChatInterface.
Project management with Linear.

Interesting Links

LoRA vs. QLoRA - Pranav Patel

LoRA vs. QLoRA - Redhat

Instruction Tuning
