Private-GoPT is a RAG (Retrieval-Augmented Generation) server inspired by private-gpt. It allows you to chat with your documents locally, ensuring privacy and data security.
- Local and Private: Your data stays on your machine.
- LLM Support:
  - OpenAI-compatible models
  - Google Gemini
- Vector Store:
  - Qdrant
- Chat Completion: Context-aware chat completion for interacting with your documents.
Support for additional vector stores and LLM back-ends is on the way.
This project uses Docker Compose to manage the necessary services.
To run the application with the default, non-gated embedding model, use the following command:
```shell
docker compose -f docker-compose.yml up
```

If you need to use a gated embedding model from Hugging Face, you will need to provide your Hugging Face token and specify the model.
- Set the `EM_MODEL` environment variable to the name of the gated model you want to use.
- Set the `HF_TOKEN` environment variable to your Hugging Face Hub token.
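For example, you can export both variables in your shell before starting the stack (the model name below is only an illustration, not a recommendation):

```shell
# Illustrative model name; substitute the gated model you actually need.
export EM_MODEL="BAAI/bge-m3"
# Your personal Hugging Face Hub access token.
export HF_TOKEN="<your_hf_token>"
```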
Then, run the application with the override docker-compose file:
```shell
docker compose up
```

If you have an existing vector database created with private-gpt, you can migrate it to be compatible with private-gopt using the migration script.
The script will update the payload of the points in your Qdrant collection to match the schema expected by this application.
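The batched processing the script performs can be sketched as follows. `migrateInBatches` and the in-memory IDs are simplifications for illustration only; the real script scrolls points from Qdrant over gRPC and rewrites each point's payload:

```go
package main

import "fmt"

// migrateInBatches walks point IDs in fixed-size batches and applies
// update to each batch. update is a stand-in for the real payload
// rewrite, which the actual script performs against Qdrant over gRPC.
func migrateInBatches(ids []int, batchSize int, update func(batch []int)) int {
	batches := 0
	for start := 0; start < len(ids); start += batchSize {
		end := start + batchSize
		if end > len(ids) {
			end = len(ids)
		}
		update(ids[start:end])
		batches++
	}
	return batches
}

func main() {
	ids := make([]int, 2500)
	for i := range ids {
		ids[i] = i
	}
	n := migrateInBatches(ids, 1000, func(batch []int) {
		// Here the real script would rewrite each point's payload.
		fmt.Printf("migrated %d points\n", len(batch))
	})
	fmt.Println("batches:", n) // 2500 points at batch size 1000 -> 3 batches
}
```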
To run the migration, use the following command from the root of the project:
```shell
go run ./cmd/migrate --collection <your_collection_name>
```

You can also specify the Qdrant host, port, and batch size using flags:
- `--host`: Qdrant DB host address (default: `localhost`)
- `--port`: Qdrant DB gRPC port number (default: `6334`)
- `--batchSize`: Number of points processed per batch (default: `1000`)
```shell
go run ./cmd/migrate --host my-qdrant.local --port 6334 --collection my_private_gpt_collection
```

The application is configured through the `settings.toml` file and environment variables.
- `port` (integer): The port on which the HTTP server will listen.
- `mode` (string): The LLM mode to use. Can be `"openai"` or `"gemini"`.
- `max_new_tokens` (integer): The maximum number of new tokens the LLM can generate.
- `context_window` (integer): The context window size for the LLM.
- `temperature` (float): The temperature for LLM sampling.
- `model` (string): The specific model name to use (e.g., `gpt-3.5-turbo`, `gemini-pro`).
- `api_base` (string): The base URL for the OpenAI-compatible API.
- `request_timeout` (integer): The request timeout in seconds.
A section is reserved for Gemini-specific settings, but it currently holds no values in `settings.toml`.
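As a rough sketch, a `settings.toml` might look like the following; the section names and values here are assumptions for illustration, not the canonical file:

```toml
[server]
port = 8080            # HTTP listen port

[llm]
mode = "openai"        # "openai" or "gemini"
max_new_tokens = 512
context_window = 4096
temperature = 0.7
model = "gpt-3.5-turbo"

[openai]
api_base = "https://api.openai.com/v1"
request_timeout = 60   # seconds

[gemini]
# Gemini-specific settings; currently empty.
```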
- `SECRET`: A secret key for the application.
- `OPENAI_API_KEY`: Your OpenAI API key (if using OpenAI).
- `GEMINI_API_KEY`: Your Gemini API key (if using Gemini).
- `EM_MODEL`: The name of the embedding model to use (optional, for gated models).
- `HF_TOKEN`: Your Hugging Face Hub token (optional, for gated models).
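For example, when running in OpenAI mode you might export (all values below are placeholders):

```shell
# Placeholder values; use your own secrets.
export SECRET="<application_secret>"
export OPENAI_API_KEY="<your_openai_api_key>"
```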