fintihlupik/LLM_project


LLM project

Financial agent for content generators • Dockerized • FastAPI + Gradio + Chroma DB


📌 Project Overview

FinancIA is a modular financial AI assistant with a FastAPI backend and a Gradio frontend. It uses a Chroma vector database for semantic search and integrates external APIs, including Groq for LLM inference and Stability AI for image generation. Specialized agents handle stock market data retrieval, academic research (arXiv search and PDF processing), and general knowledge queries.

Content generation adapts to different languages and cultural nuances using libraries such as spaCy, KeyBERT, RAKE, DeepTranslator, and langdetect. The architecture is designed for extensibility and supports financial education content tailored by platform, language, and user demographics. Containerized with Docker, it deploys the frontend and backend separately for scalability and maintainability.

The platform supports:

  • Dynamic Content Generation

    Generate tailored educational content focused on finance for multiple social media platforms such as Instagram, Twitter, and LinkedIn. Content adapts by age group, language, regional dialects, and platform behavior to maximize relevance and engagement.

  • πŸ’Ό Specialized Financial Agent

    An agent with four distinct strategies (Yahoo Finance, standard queries, simple RAG, and a combined approach) that provides financial insights with contextual reasoning.

  • πŸ“š Retrieval-Augmented Generation (RAG) with Academic Focus

    Produces context-aware answers by retrieving economic research papers (PDFs from arXiv) and indexing them in Chroma. Keyword extraction (KeyBERT, RAKE, spaCy), translation (DeepTranslator), and LLM prompting ground the answers in the retrieved sources.

  • πŸ“ˆ Real-Time Financial Data Analysis

    Integrates the Yahoo Finance API to extract technical data, company summaries, and links for stocks and indices, enabling timely financial analysis.

  • 🎨 AI-Powered Image Creation

    Converts text prompts into photorealistic images using Stability AI, enhancing visual storytelling for financial content.

  • 🎯 Audience Segmentation

    Tailor content to specific demographics (age group, language, and geographical region) to maximize relevance and engagement.

  • 🌎 Multilingual and Culturally Aware Content Segmentation

    Supports dynamic content adaptation for multiple languages and cultural nuances, improving user engagement across demographics.
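
The RAG pipeline relies on RAKE (through rake-nltk), KeyBERT, and spaCy to pull keywords out of a query before searching. For readers unfamiliar with RAKE, the following is a minimal, stdlib-only sketch of the algorithm itself: candidate phrases are runs of words between stopwords and punctuation, each word is scored as degree / frequency, and a phrase scores the sum of its word scores. It is not the rake-nltk API or the project's code, and the short stopword list is an assumption made to keep the example small.

```python
import re
from collections import defaultdict

# Assumption: a tiny English stopword list; rake-nltk uses NLTK's much longer list.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "by", "do", "does", "for", "from",
    "how", "in", "is", "it", "of", "on", "or", "the", "to", "what", "with",
}

def candidate_phrases(text: str) -> list[list[str]]:
    """Split text into runs of non-stopwords, breaking at punctuation."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9']+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases

def rake(text: str, top_n: int = 5) -> list[tuple[str, float]]:
    """Rank phrases by the sum of deg(w) / freq(w) over their words."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree = co-occurrences inside the phrase, counting the word itself.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

For example, `rake("Interest rates and inflation: how central banks set interest rates.")` ranks "central banks set interest rates" first. With a realistic stopword list the phrases come out shorter and cleaner.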


📎 Useful links

  • Medium Article

https://medium.com/@yaelparrac/engineering-an-ai-that-thinks-in-economics-narrates-in-markets-and-communicates-in-human-context-4ca0180ca8f0

  • Documentation

https://deepwiki.com/Yael-Parra/Finance_LLM_Project

  • Presentation

🎯 Target Audience

  • Content creators
  • Community managers
  • Social media managers

🔧 General Project Overview

Features

  • FastAPI-based backend with modular structure
  • Gradio frontend for demo/testing
  • Multi-agent design for finance-related tasks
  • ChromaDB for local vector search
  • Prompt engineering tailored to platform, age, language, and cultural context
  • Fully containerized (backend/frontend split)
  • NLP pipeline using KeyBERT, RAKE, spaCy, and langdetect
  • Multilingual translation via DeepTranslator (Google)
  • Stable Diffusion API integration for realistic image generation
| ✅ Pros | ❌ Cons |
|---|---|
| Modular and clean backend design | No persistent Chroma store (in-memory or local, not shared) |
| Covers diverse use cases: stock data, research, content, images | No UI interactivity beyond Gradio input box |
| Agents are task-specific and clearly isolated | No authentication, session tracking, or user history |
| Uses lightweight and fast vector store (Chroma) | Manual setup for spaCy/NLTK models (not auto-downloaded on build) |
| Multilingual and culturally adaptive content generation | No monitoring, logging, or rate limiting |
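
Prompt engineering tailored to platform, age, language, and cultural context can be pictured as composing a prompt from small, swappable pieces. The sketch below assumes hypothetical style tables (`PLATFORM_STYLE`, `AGE_STYLE`) and a hypothetical `Audience` dataclass; the project's actual templates live under server/prompts/ and are not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical style hints; the real templates are organized under server/prompts/.
PLATFORM_STYLE = {
    "instagram": "Short, visual captions with emojis and 3-5 hashtags.",
    "twitter": "At most 280 characters, one key idea, punchy tone.",
    "linkedin": "Professional tone, 2-3 short paragraphs, a clear takeaway.",
}
AGE_STYLE = {
    "18-25": "Use everyday examples (first salary, student loans) and informal language.",
    "26-40": "Focus on saving, investing, and household budgeting.",
    "41+": "Emphasize retirement planning and risk management.",
}

@dataclass
class Audience:
    platform: str
    age_group: str
    language: str
    region: str

def build_prompt(topic: str, audience: Audience) -> str:
    """Compose a prompt from modular pieces (a sketch, not the project's builder)."""
    platform = audience.platform.lower()
    if platform not in PLATFORM_STYLE:
        raise ValueError(f"unsupported platform: {audience.platform}")
    parts = [
        f"You are a financial educator writing for {audience.platform}.",
        PLATFORM_STYLE[platform],
        # Unknown age groups fall back to a neutral register.
        AGE_STYLE.get(audience.age_group, "Use clear, neutral language."),
        f"Write in {audience.language}, using expressions natural to {audience.region}.",
        f"Topic: {topic}",
    ]
    return "\n".join(parts)
```

Keeping each dimension in its own table means a new platform or age group is one dictionary entry rather than a new template.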

🌐 Routes & Agents

Features

  • /query: General-purpose agent backed by a Groq-hosted LLM
  • /yahoo: Market data with historical and technical info from Yahoo Finance
  • /agent: RAG search from Chroma (local knowledge base)
  • /content: Academic agent using arXiv and document-based RAG
  • /image: Realistic image generation via Stability API
| ✅ Pros | ❌ Cons |
|---|---|
| Well-separated routes and logic per use case | The Groq agent is single-shot and lacks conversation memory |
| Custom agent strategies: Yahoo, RAG, RAG+arXiv, Groq | No centralized prompt logging or feedback loop |
| Combined keyphrase extraction and translation pipeline for better recall | Agents are selected manually, not dynamically |
| arXiv-based academic search supports real PDF embedding and indexing | Image generation has no style/theme configuration |
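
Once the containers are up, each route can be called over HTTP. The following sketch only builds the request with the standard library. The port (8000, Uvicorn's default), the POST method, and the `{"query": ...}` JSON body are assumptions; check docker-compose.yml and server/routes/ for the real values.

```python
import json
from urllib.request import Request

# Assumptions: backend on localhost:8000 and a JSON body with a single "query" field.
BASE_URL = "http://localhost:8000"
ROUTES = {"query", "yahoo", "agent", "content", "image"}

def build_request(route: str, query: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a JSON POST request for one of the backend routes."""
    if route not in ROUTES:
        raise ValueError(f"unknown route: {route}")
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        f"{base_url}/{route}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the response with `json.load`.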

🧠 Architecture & Services

Features

  • services/: Core business logic, prompt construction, LLM connectors
  • tools/: Scripts and helpers for PDF fetching, Yahoo parsing, post generation
  • utils/query_depth: Automatically switches between shallow and deep agents
  • prompts/: Modular prompt base by platform, age, language, region
  • agents/finance_agent.py: Unified agent with strategy dispatching
| ✅ Pros | ❌ Cons |
|---|---|
| Clear separation between tools, services, and routes | Prompt design is heuristic; not optimized via real user feedback |
| Prompt builder is extensible and adaptable | No lifecycle management of vector store data (aging, deduplication, tagging) |
| Query depth analysis introduces contextual strategy control | Static agent strategy selection; lacks metrics or LLM guidance |
| Prompt modularity enables wide reuse across agents | |
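
utils/query_depth and the strategy dispatch in agents/finance_agent.py are described here only at a high level. As an illustration of rule-based routing of this kind, the sketch below classifies a query as shallow or deep and maps it to one of the four strategies named in the overview. The cue words, the ticker regex, and the placeholder handlers are all hypothetical.

```python
import re
from typing import Callable

# Hypothetical heuristics; the project's real rules may differ entirely.
DEEP_CUES = {"why", "explain", "compare", "impact", "research", "theory", "analysis"}
TICKER = re.compile(r"\b[A-Z]{2,5}\b")  # naive: any 2-5 letter all-caps token

def query_depth(query: str) -> str:
    """Classify a query as 'shallow' or 'deep' from its length and cue words."""
    words = re.findall(r"[a-z']+", query.lower())
    if len(words) > 15 or DEEP_CUES.intersection(words):
        return "deep"
    return "shallow"

def choose_strategy(query: str) -> str:
    """Map a query to one of the four strategies: yahoo, standard, rag, combined."""
    has_ticker = bool(TICKER.search(query))
    deep = query_depth(query) == "deep"
    if has_ticker and deep:
        return "combined"  # market data plus retrieved context
    if has_ticker:
        return "yahoo"     # plain market-data lookup
    if deep:
        return "rag"       # retrieve from the Chroma knowledge base
    return "standard"      # single LLM call

# Placeholder handlers that just tag the query with the chosen strategy.
STRATEGIES: dict[str, Callable[[str], str]] = {
    name: (lambda q, n=name: f"[{n}] {q}")
    for name in ("yahoo", "standard", "rag", "combined")
}

def run(query: str) -> str:
    return STRATEGIES[choose_strategy(query)](query)
```

Keeping handlers in a dictionary also points toward the planned agent registry: adding a strategy means registering a function rather than editing the dispatch logic.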

🚀 Future Features & Implementations

Planned Improvements

  • Add user-facing frontend with search history, filters, and summaries
  • Implement agent registry system for pluggable strategy injection
  • Add monitoring, logging, and error tracing for observability
  • Switch to persistent Chroma (Postgres or S3-backed)
  • Integrate memory or context chaining for conversational agents
  • Use semantic scoring or retrieval confidence to refine RAG
  • Automate download and build of required NLP models (spaCy/NLTK)
  • Enable image generation prompts with template presets or styles
  • Add multilingual UI for internationalization support
  • Create scheduled background tasks (e.g. RSS feeds, economic updates)

🛠️ Tools & Technologies

⚙️ Backend

FastAPI Pydantic Uvicorn Starlette

  • Python async stack: aiohttp, httpx, anyio, async-timeout
  • API routing & services: FastAPI, Starlette, Uvicorn
  • Data validation & configuration: pydantic, pydantic-settings, dataclasses-json

🤖 LLM & NLP Stack

LangChain SpaCy Transformers HuggingFace

  • LangChain ecosystem: langchain, langchain-core, langchain-community, langchain-chroma, langchain-text-splitters, langchain-ollama, langchain-groq
  • Multilingual models: spaCy, en-core-web-sm, es-core-news-sm, fr-core-news-sm
  • Embedding & tokenization: transformers, tiktoken, tokenizers, sentence-transformers, sentencepiece
  • NLP utilities: nltk, deep-translator, langdetect, keybert, rake-nltk

📦 Vector Database & Data Processing

ChromaDB Pandas SQLite

  • Vector storage: chromadb, pypika, peewee
  • Data manipulation: pandas, numpy, scikit-learn, scipy
  • PDF and document processing: pypdf, rake-nltk, pdf_fetcher
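
Chroma stores embeddings of document chunks and returns the chunks nearest to a query embedding. The sketch below mimics that chunk-then-retrieve flow with the standard library only: overlapping character windows, and bag-of-words cosine similarity substituted for learned embeddings. It illustrates the idea; it is not the project's indexing code or Chroma's API.

```python
import math
import re
from collections import Counter

def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(c)), c) for c in chunks]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:k]
```

The overlap keeps sentences that straddle a window boundary retrievable from at least one chunk.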

🌐 Web UI & Interfaces

Gradio

  • Frontend interface: gradio, gradio-client
  • Async communication: httpx-sse, safehttpx

🐳 DevOps & Utilities

Docker GitHub Actions Dotenv

  • Containers & orchestration: docker-compose, Dockerfile.backend, Dockerfile.frontend
  • Environment management: .env, python-dotenv
  • Logging & monitoring: loguru, coloredlogs, opentelemetry-*, posthog

🧪 Model Architecture

System Architecture Diagram

📁 Project Structure

📦 LLM_PROJECT
├── 📁 client                 # Gradio frontend interface
├── 📁 data
│   └── 📁 chroma_db          # Vector database storage
├── 📁 server                 # Backend FastAPI application
│   ├── 📁 agents             # Agent implementations (e.g., financial)
│   ├── 📁 database           # DB management scripts and raw data
│   ├── 📁 prompts            # Prompt templates organized by domain
│   ├── 📁 routes             # API route definitions
│   ├── 📁 services           # Core LLM and prompt handling logic
│   ├── 📁 tools              # PDF/image/Yahoo tools
│   ├── 📁 utils              # Helper utilities
│   └── main.py               # Entry point of the FastAPI app
│
├── README.md                 # Project overview
├── requirements.txt          # Python dependencies
├── .env                      # Environment variables (not tracked by Git)
├── .gitignore                # Git ignore rules
├── .dockerignore             # Docker ignore rules
├── docker-compose.yml        # Docker orchestration
├── Dockerfile.backend        # Docker build config for backend
└── Dockerfile.frontend       # Docker build config for frontend


✍ Deployment Instructions

📋 Prerequisites

Before you begin, make sure you have:

  • Python 3.10
  • Docker Desktop

🧪 1. Clone the repository

git clone https://github.com/your-username/your-repo.git
cd your-repo

2. Configure environment variables

Create a .env file in the project root and add your API keys, for example:

# Groq API Key for LLM access
GROQ_API_KEY="YOUR_GROQ_API_KEY"

# Stability AI API Key for image generation
STABILITY_API_KEY="YOUR_STABILITY_API_KEY"
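
The backend reads these keys to call Groq and Stability AI. python-dotenv (listed under DevOps & Utilities) normally handles loading them; the stdlib-only sketch below shows the same idea and fails fast when a key is missing. The `Settings` class and the function names are hypothetical, and the parser only handles simple `KEY=value` lines.

```python
import os
from dataclasses import dataclass
from pathlib import Path

def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    file = Path(path)
    if not file.exists():
        return values
    for raw in file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding double or single quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    stability_api_key: str

def load_settings(path: str = ".env") -> Settings:
    """Prefer real environment variables, fall back to the .env file, fail if missing."""
    file_values = load_env_file(path)

    def get(name: str) -> str:
        value = os.environ.get(name) or file_values.get(name)
        if not value:
            raise RuntimeError(f"{name} is not set; add it to .env")
        return value

    return Settings(get("GROQ_API_KEY"), get("STABILITY_API_KEY"))
```

Failing at startup with a clear message is easier to debug than an authentication error from the first API call.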

📦 3. Docker

# Start Docker Desktop first, then build and start the containers
docker compose build
docker compose up

# Check that the containers are running
docker ps

# Open the frontend and backend using the URLs printed in the terminal

👩‍💻 Contributors

We are AI students with a heart and passion for building better solutions for real problems. Feel free to explore, fork, or connect with us for ideas, feedback, or collaborations.

| Name | GitHub | LinkedIn |
|---|---|---|
| Yael Parra | GitHub | LinkedIn |
| Polina Terekhova Pavlova | GitHub | LinkedIn |
| Mariela Adimari | GitHub | LinkedIn |
| Abigail Masapanta Romero | GitHub | LinkedIn |
