This project demonstrates how to run and interact with Large Language Models (LLMs) locally using Ollama.
Instead of relying on cloud-based APIs, Ollama lets you host models such as Llama 3.2, Gemma 3, DeepSeek-R1, Llama 3.2 Vision, Phi 4, Mistral, and Moondream 2 directly on your own machine. This gives you:
🔒 Privacy – your data stays on your device.
⚡ Speed – no external API latency.
🖥️ Offline capability – run LLMs without an internet connection.
🛠️ Flexibility – experiment with prompts, fine-tuning, and custom integrations.
In this repo, you’ll find Python code that sends chat requests to the local Ollama server via its REST endpoint /api/chat. The implementation uses requests with streaming enabled, so responses are processed token by token in real time.
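A minimal sketch of that approach is shown below. It assumes Ollama is running on its default port (11434) with a model already pulled; the function and model names here are illustrative, not the repo's actual code. Each line of the streamed response is an NDJSON object whose `message.content` field carries the next token.

```python
import json


def parse_chunk(line: bytes) -> tuple[str, bool]:
    """Parse one NDJSON line from Ollama's streaming /api/chat response.

    Returns (token_text, done_flag).
    """
    data = json.loads(line)
    return data.get("message", {}).get("content", ""), data.get("done", False)


def chat(prompt: str, model: str = "llama3.2",
         host: str = "http://localhost:11434") -> str:
    """Send a chat request to a local Ollama server and stream the reply."""
    import requests  # third-party: pip install requests

    reply = []
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # ask the server to stream tokens as they are generated
    }
    with requests.post(f"{host}/api/chat", json=payload, stream=True) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            token, done = parse_chunk(line)
            print(token, end="", flush=True)  # show tokens in real time
            reply.append(token)
            if done:
                break
    return "".join(reply)


if __name__ == "__main__":
    chat("Why is the sky blue?")
```

Streaming keeps the first token's latency low: you see output as it is generated instead of waiting for the full completion.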
This serves as a foundation for:
=> Building chatbots and AI assistants.
=> Running experiments with prompts locally.
=> Developing apps powered by LLMs without external dependencies.
Hardware requirements:
=> At least 8 GB of RAM to run the 7B models.
=> At least 16 GB of RAM to run the 13B models.
=> At least 32 GB of RAM to run the 33B models.
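The RAM guidelines above can be captured as a small lookup, shown here as a hypothetical helper (the function name and fallback behavior are assumptions, not part of this repo):

```python
# Suggested minimum RAM (GB) per model size, per the guidelines above.
MIN_RAM_GB = {"7b": 8, "13b": 16, "33b": 32}


def required_ram_gb(model_tag: str) -> int:
    """Return the suggested minimum RAM for a model tag like 'llama2:13b'.

    Falls back to the 7B requirement when the tag has no size suffix.
    """
    size = model_tag.split(":")[-1].lower()
    return MIN_RAM_GB.get(size, MIN_RAM_GB["7b"])
```

Such a check could warn users before they attempt to pull a model their machine cannot comfortably run.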
Ollama GitHub Repository — the official source for downloading Ollama software and models.
