Goal: A ground-up implementation of the modern AI Engineering stack, moving from basic prompting to autonomous MCP agents. Focus: Reliability, System Design, and building "Systems of Action" rather than just Chatbots.
Transparency Note: This repository was built with the assistance of Google's Gemini, acting as a pair programmer and technical coach. It helped me consolidate many of the ideas and techniques I learnt along the way.
I built this repository as a comprehensive reference point for AI Engineering. I have tried to keep it as generic and accessible as possible, so that anyone transitioning into AI Engineering can use this documentation and code as a solid foundation.
Most people stop at "Prompt Engineering"—typing text into ChatGPT. AI Engineering is different. It is the discipline of treating Large Language Models (LLMs) not as magic boxes, but as software components.
This repository is a self-study guide designed to answer:
- How do we stop LLMs from hallucinating? (RAG)
- How do we connect them to our own database? (Tools)
- How do we build systems that can fix their own mistakes? (Agents)
- How do we deploy this to production reliably? (Evals & Ops)
Follow these steps to configure your environment for all phases.
1. Clone and Configure Environment
# Create a virtual environment
python3 -m venv .venv
# Activate it (Mac/Linux)
source .venv/bin/activate
# Activate it (Windows)
.venv\Scripts\activate
2. Install Dependencies
pip install -r requirements.txt
3. Set API Keys
Create a file named .env in the root folder and add your keys:
OPENAI_API_KEY=sk-proj-...
TAVILY_API_KEY=tvly-...
GCP_PROJECT_ID=...
GOOGLE_APPLICATION_CREDENTIALS=/abs/path/to/key.json
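Most of the frameworks used here pick these variables up via `python-dotenv`, but it is worth seeing what that loading step actually does. A minimal stdlib-only sketch (the `load_env` helper is illustrative, not part of the repo; note `setdefault` so real environment variables are never overwritten):

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal .env loader: KEY=VALUE lines, '#' comments and blanks ignored."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: a variable already set in the shell wins over the file
            os.environ.setdefault(key.strip(), value.strip())
```

In practice you would call `load_env()` once at startup, before constructing any API clients.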
This curriculum follows a strict "Crawl, Walk, Run" progression. We do not jump straight to complex Agents because they are impossible to debug without strong foundations.
- Phase 1 (The Brain): Treats the LLM as a passive knowledge engine. Focuses on RAG and Prompting.
- Phase 2 (The Hands): Gives the LLM the ability to execute code. Focuses on Tool Use and Determinism.
- Phase 3 (The Application): Combines Brain and Hands into Agentic RAG with citations and self-correction.
- Phase 4 (The Logic): Introduces "Cortex" logic using Graphs (loops, retries, persistence).
- Phase 5 (The Protocol): Standardizes connections using MCP and Pydantic AI.
- Phase 6 (The Intelligence): Implements Vendor Agnosticism (Multi-Model Routing).
- Phase 7 (Production): Focuses on Scale, Evals, Caching, and Docker Deployment.
Goal: Control the LLM and give it access to knowledge.
| Part | Topic | Engineering Pattern |
|---|---|---|
| 01 | Prompting | In-Context Learning: Constraining model tone/style without fine-tuning (Few-Shot). |
| 02 | RAG Basics | Context Injection: Grounding the model in static data to prevent hallucinations. |
| 03 | Embeddings | Vector Math: Matching queries by semantic meaning, not just keywords. |
| 04 | Storage | Persistence: Using ChromaDB (HNSW index) for scalable, long-term memory. |
| 05 | Capstone 1 | The Resume Bot: An end-to-end RAG application with citation logic. |
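The "Vector Math" row above boils down to cosine similarity: queries match documents whose embedding vectors point in a similar direction. A toy sketch with hand-made 3-dimensional vectors (real embeddings come from a model and have hundreds or thousands of dimensions; the documents and numbers here are invented for illustration):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cos(a, b) = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" for two documents
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.3],
}
query = [0.85, 0.15, 0.05]  # e.g. "how do I get my money back?"
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # -> refund policy
```

No keyword overlaps with "refund" are needed: the match is on direction in vector space, which is exactly what ChromaDB's HNSW index accelerates at scale.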
Goal: Transition from "Passive Chat" to "Active Work" using Tools and Pydantic.
| Part | Topic | Engineering Pattern |
|---|---|---|
| 01 | Tools | Deterministic Execution: Letting LLMs trigger reliable Python functions (Math/API). |
| 02 | Chains | Pipelines: Breaking complex tasks into atomic, linear steps (A -> B -> C). |
| 03 | Schemas | Pydantic: Enforcing strict input/output validation. (No broken JSON). |
| 04 | Memory | Context Buffers: Managing conversation history in stateful applications. |
| 05 | Routing | Intent Classification: Using logic to route user queries to the correct tool. |
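The Routing row can be sketched without any LLM at all: a deterministic keyword classifier is often the first (and cheapest) layer, with an LLM or embedding-based classifier as fallback. The keywords and intent names below are hypothetical:

```python
def route(query: str) -> str:
    """Deterministic first-pass intent router; real systems fall back to an LLM."""
    q = query.lower()
    if any(tok in q for tok in ("+", "*", "/", "calculate", "sum")):
        return "calculator"
    if any(tok in q for tok in ("doc", "policy", "resume", "search")):
        return "rag_search"
    return "general_chat"

print(route("calculate 15% of 80"))       # -> calculator
print(route("what does the policy say?")) # -> rag_search
```

The payoff is debuggability: a wrong route is a one-line fix in the keyword table, not a prompt-tuning session.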
Goal: Building systems that can research, reason, and cite evidence.
| Part | Topic | Engineering Pattern |
|---|---|---|
| 01 | RAG Tool | Dynamic Retrieval: Wrapping Vector Search into a callable Tool so the Agent decides when to search. |
| 02 | Hybrid Agents | Multi-Tool Routing: An agent that can dynamically switch between Math (Calculator) and Search (RAG). |
| 03 | Citations | Grounding: Injecting source metadata into the context to force evidence-based answers. |
| 04 | Auto-Evals | LLM-as-a-Judge: Building automated unit tests to grade the agent's accuracy and citations. |
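The Citations row is mostly prompt plumbing: tag every retrieved chunk with its source metadata before it enters the context, then instruct the model to cite those tags. A minimal sketch (the prompt wording, `source` field, and handbook chunk are invented for illustration):

```python
def build_grounded_prompt(question: str, chunks: list[dict]) -> str:
    """Prefix each retrieved chunk with a [source: ...] tag the model can cite."""
    context = "\n\n".join(
        f"[source: {c['source']}]\n{c['text']}" for c in chunks
    )
    return (
        "Answer using ONLY the context below. Cite sources as [source: ...].\n\n"
        f"{context}\n\nQuestion: {question}"
    )

chunks = [
    {"source": "handbook.pdf#p4",
     "text": "Employees accrue 20 days of leave per year."},
]
print(build_grounded_prompt("How much leave do I get?", chunks))
```

Because the tags are injected deterministically, a downstream eval can check citations with a plain string match instead of another LLM call.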
Goal: Building complex, self-correcting workflows that can recover from failure.
| Part | Topic | Engineering Pattern |
|---|---|---|
| 01 | LangGraph | Cyclic Graphs: Enabling loops and retries (e.g., "Try again if error"). |
| 02 | Persistence | Checkpointing: Saving the agent's state to a SQLite database so it can resume later. |
| 03 | Human-in-Loop | Approval Flows: Pausing execution for human review before sensitive actions. |
| 04 | Capstone 2 | The Shift Orchestrator: A semi-autonomous agent that extracts strict constraints and routes tasks to deterministic solvers (Supervisor Pattern). |
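Stripped of the framework, the Cyclic Graphs row is an act -> check -> retry loop with a bounded number of attempts. A stdlib sketch of that control flow (the `task`/`validate` stand-ins are hypothetical; in LangGraph these would be graph nodes joined by a conditional edge):

```python
def run_with_retry(task, validate, max_attempts: int = 3):
    """The core loop a cyclic graph encodes: act -> check -> retry or finish."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        result = task(feedback)          # the "agent" node, fed prior feedback
        ok, feedback = validate(result)  # the "checker" node
        if ok:
            return result
    raise RuntimeError(f"No valid result after {max_attempts} attempts")

# Hypothetical stand-ins for an LLM call and a validator:
attempts = iter(["not json", '{"ok": true}'])
result = run_with_retry(
    task=lambda fb: next(attempts),
    validate=lambda r: (r.startswith("{"), "Output must be JSON"),
)
print(result)  # prints {"ok": true}
```

What LangGraph adds on top of this loop is exactly the other two rows: checkpointing the state between iterations and pausing mid-loop for human approval.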
Goal: Standardizing connections and enforcing reliability with strict contracts.
| Part | Topic | Engineering Pattern |
|---|---|---|
| 01 | MCP Architecture | Protocol Design: Understanding the Client-Host-Server relationship (The "USB-C" for AI). |
| 02 | FastMCP Servers | Server Implementation: Building robust Python servers that expose local data and resources in a standardized way. |
| 03 | Strict Contracts | Pydantic AI: Implementing Type-Safe Agents that enforce strict I/O validation at the framework level. |
| 04 | Observability | Logfire: Implementing "X-Ray" vision to visualize agent reasoning, validation errors, and latency. |
| 05 | Capstone 3 | Health Ops System: A modular ecosystem where a Pydantic AI Client (Agent) connects to a secure MCP Server (Data) to manage hospital operations. |
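Under the hood, MCP is JSON-RPC 2.0: a client discovers what a server offers with `tools/list`, then invokes a tool with `tools/call`. The messages below show that shape (the `get_bed_count` tool and its arguments are hypothetical, standing in for the Health Ops capstone):

```python
import json

# Step 1: the client asks the server what tools it exposes.
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Step 2: the client invokes one tool by name, with structured arguments.
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "get_bed_count", "arguments": {"ward": "icu"}},
}
print(json.dumps(call_request, indent=2))
```

FastMCP generates both sides of this exchange for you from plain Python function signatures; seeing the raw frames makes the "USB-C" analogy concrete, since every client and server speaks this same envelope.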
Goal: Achieving vendor agnosticism by implementing Model Routing.
| Part | Topic | Engineering Pattern |
|---|---|---|
| abstraction | The Model Factory | Dependency Injection: A factory pattern to swap providers (OpenAI, Google, Anthropic) instantly. |
| integrations | Cloud Connectivity | Vendor SDKs: Implements connections for Google Vertex AI (Gemini 2.5) and AWS Bedrock (Claude 3.5). |
| routing | The Semantic Router | Supervisor Pattern: A Router Agent that classifies intent and dispatches tasks to the most cost-effective model. |
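The Model Factory row is plain dependency injection: a registry mapping a provider name to a constructor, with every client hidden behind one shared interface. A stdlib sketch with a fake client standing in for the real SDKs (the model names mirror this repo's stack; `FakeClient` and its `chat` method are illustrative):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class FakeClient:
    provider: str
    model: str
    def chat(self, prompt: str) -> str:
        return f"[{self.provider}:{self.model}] {prompt}"

# Real entries would wrap the openai / google-genai / anthropic SDK clients
# behind this same chat() interface.
REGISTRY: dict[str, Callable[[], FakeClient]] = {
    "openai": lambda: FakeClient("openai", "gpt-4o"),
    "google": lambda: FakeClient("google", "gemini-2.5-flash"),
    "anthropic": lambda: FakeClient("anthropic", "claude-3-5-sonnet"),
}

def get_model(name: str) -> FakeClient:
    return REGISTRY[name]()

print(get_model("google").chat("hello"))  # [google:gemini-2.5-flash] hello
```

Swapping vendors is now a one-string change at the call site, which is what lets the Semantic Router dispatch cheap queries to cheap models without touching application code.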
Goal: Moving from "It works on my machine" to deployed software.
| Part | Topic | Engineering Pattern |
|---|---|---|
| 01 | Advanced Evals | RAGAS Integration: Implements algorithmic scoring (Faithfulness, Relevancy) to mathematically prove agent reliability. |
| 02 | Semantic Caching | Vector Caching: Cuts API costs by caching "meaning" (Question A ≈ Question B) rather than exact text. |
| 03 | Deployment | Docker: Production-ready containerization. (See Dockerfile in root). |
- Languages: Python 3.12 (Strict Type Hinting)
- Models: GPT-4o, Gemini 2.5 Flash, Claude 3.5 Sonnet
- Infrastructure: ChromaDB, SQLite, Docker, AWS Bedrock, Google Vertex AI
- Frameworks:
- LangChain / LangGraph: For orchestration and RAG chains.
- Pydantic AI: For type-safe agent definitions.
- Logfire: For observability and tracing.
- FastMCP: For building standardized Model Context Protocol servers.
- RAGAS: For automated evaluation and testing.