gustavodemari/cvm-rag

Project

This project combines a Large Language Model (LLM) with Retrieval-Augmented Generation (RAG) to answer queries about publicly traded Brazilian companies. PostgreSQL stores the structured company data, and the pgvector extension provides vector similarity search for document retrieval.
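
As a rough illustration of the retrieval step, the sketch below builds a pgvector nearest-neighbour query. The table name `documents`, its `content` and `embedding` columns, and the helper names are assumptions made for this example, not the project's actual schema. `<=>` is pgvector's cosine-distance operator.

```python
from typing import Sequence, Tuple


def to_pgvector_literal(vector: Sequence[float]) -> str:
    """Format a vector in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in vector) + "]"


def build_search_query(
    query_vector: Sequence[float], top_k: int = 5
) -> Tuple[str, Tuple[str, int]]:
    """Return (sql, params) for a cosine-distance search.

    Hypothetical schema: documents(content text, embedding vector).
    """
    sql = (
        "SELECT content, embedding <=> %s::vector AS distance "
        "FROM documents "
        "ORDER BY distance "
        "LIMIT %s"
    )
    return sql, (to_pgvector_literal(query_vector), top_k)


# Example: ask for the 3 stored documents closest to a toy query vector.
sql, params = build_search_query([0.1, 0.2, 0.3], top_k=3)
```

The resulting pair could then be passed to a PostgreSQL driver, for example `cursor.execute(sql, params)`; the `%s` placeholders follow the psycopg convention.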

For natural language processing, the project uses the Google Gemini Flash model and Hugging Face models to generate embeddings. User queries are handled through an interactive Gradio interface, and Polars is used for data manipulation and analysis.
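
A minimal sketch of how a query handler might be wired to Gradio is shown below. The `answer_query` function is a hypothetical placeholder standing in for the retrieval and generation step; the project's real handler in `app.py` may look quite different.

```python
def answer_query(question: str) -> str:
    """Hypothetical placeholder for the retrieval + generation step."""
    question = question.strip()
    if not question:
        return "Please enter a question."
    return f"(answer for: {question})"


def build_interface():
    """Wrap answer_query in a simple text-in, text-out Gradio app."""
    # Imported lazily so answer_query can be used without Gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=answer_query,
        inputs="text",
        outputs="text",
        title="CVM RAG chatbot (sketch)",
    )
```

Calling `build_interface().launch()` would start a local web server with a single text box for the question and a text area for the answer.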

By integrating these technologies, the system retrieves relevant documents published by the Securities and Exchange Commission of Brazil (CVM, Comissão de Valores Mobiliários) and generates responses grounded in them, making it easier to access corporate data in Brazil.
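
The retrieve-then-generate flow can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's implementation: the two-dimensional vectors and the sample passages are invented toy data, the function names are hypothetical, and in the real system the ranking would be done by pgvector and the prompt sent to the LLM.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: Sequence[float],
    corpus: List[Tuple[str, Sequence[float]]],
    top_k: int = 2,
) -> List[str]:
    """Return the top_k passages most similar to the query vector."""
    ranked = sorted(
        corpus,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question: str, passages: List[str]) -> str:
    """Assemble a prompt that asks the model to answer from the retrieved context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Toy corpus: (passage, made-up embedding).
corpus = [
    ("Company A filed its annual report.", [1.0, 0.0]),
    ("Company B changed its board.", [0.0, 1.0]),
    ("Company A announced dividends.", [0.9, 0.1]),
]
passages = retrieve([1.0, 0.0], corpus, top_k=2)
prompt = build_prompt("What did Company A report?", passages)
```

Keeping retrieval and prompt assembly as separate functions makes it straightforward to swap the in-memory ranking for a database query without touching the prompt format.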

Diagrams

Preprocessing

Chatbot

Models

Install

poetry install

Running

poetry shell
python app.py

Docs

To build the docs using Sphinx, use the following commands:

cd docs/
make clean html
cd build/html
python -m http.server

Tests

poetry run pytest
