Project

This project combines a Large Language Model (LLM) with Retrieval-Augmented Generation (RAG) to answer queries about publicly traded Brazilian companies. PostgreSQL stores the structured company data, and the pgvector extension provides vector similarity search for document retrieval.
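
As an illustration of how a pgvector lookup can be expressed, here is a minimal sketch. The table name `document_chunks` and its columns are hypothetical, not taken from this repository. The `<=>` operator is pgvector's cosine distance operator. The helpers only prepare the SQL text and parameters, so they run without a database.

```python
from typing import Sequence

# Hypothetical schema (an assumption for illustration): a table
# "document_chunks" with columns "company_name", "content", and
# "embedding" (a pgvector `vector` column).
SEARCH_SQL = """
SELECT company_name, content, embedding <=> %(query)s::vector AS distance
FROM document_chunks
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s
"""


def to_pgvector_literal(values: Sequence[float]) -> str:
    """Format floats in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def build_search_params(query_embedding: Sequence[float], k: int = 5) -> dict:
    """Return the named parameters to pass alongside SEARCH_SQL."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return {"query": to_pgvector_literal(query_embedding), "k": k}
```

With a driver that supports named `%(name)s` placeholders (psycopg, for example), the query would be run as `cursor.execute(SEARCH_SQL, build_search_params(query_embedding))`.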

For natural language processing, the Google Gemini Flash model generates the answers and a Hugging Face model generates the embeddings. User queries are handled through an interactive Gradio interface, and Polars is used for data manipulation and analysis.
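
To make the role of the embeddings concrete, the sketch below ranks documents by cosine similarity to a query vector. It assumes the embedding model is loaded through the `sentence-transformers` package, which this README does not confirm. The function names are hypothetical. The ranking helpers use only the standard library, and `embed_texts` imports the package lazily, so it is needed only when that function is called.

```python
import math
from typing import List, Sequence, Tuple

# Model name as listed in the Models section of this README.
MODEL_NAME = "paraphrase-multilingual-MiniLM-L12-v2"


def embed_texts(texts: List[str]) -> List[List[float]]:
    """Embed texts; assumes the sentence-transformers package is installed."""
    from sentence_transformers import SentenceTransformer  # lazy import

    model = SentenceTransformer(MODEL_NAME)
    return model.encode(texts).tolist()


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In the project itself this comparison is delegated to pgvector inside PostgreSQL. The pure-Python version only shows what the similarity search computes.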

By integrating these technologies, the system retrieves relevant documents published by the Securities and Exchange Commission of Brazil (CVM) and generates responses grounded in them, making corporate data in Brazil easier to access.
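
The generation step can be sketched as assembling the retrieved passages and the user question into a single prompt. The function name and prompt wording below are hypothetical and are not taken from this repository. The call to the LLM itself is deliberately left out.

```python
from typing import List


def build_prompt(question: str, passages: List[str]) -> str:
    """Combine retrieved passages and the user question into one prompt."""
    if passages:
        # Number each passage so the model can refer to its sources.
        context = "\n\n".join(
            f"[{i}] {text.strip()}" for i, text in enumerate(passages, start=1)
        )
    else:
        context = "(no documents retrieved)"
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )
```

The resulting string would then be sent to the LLM (gemini-1.5-flash in this project). Restricting the model to the supplied context is one common way to keep answers tied to the retrieved documents.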

Diagrams

Preprocessing: llm-rag-diagram-preprocessing.png

Chatbot: llm-rag-diagram-chatbot.png

Models

Embeddings: paraphrase-multilingual-MiniLM-L12-v2
Large Language Model: gemini-1.5-flash

Install

poetry install

Running

poetry shell
python app.py

Docs

To build the docs with Sphinx and serve them locally, run:

cd docs/
make clean html
cd build/html
python -m http.server

Tests

poetry run pytest

Repository: gustavodemari/cvm-rag
