GenAI Learning Metromap

Welcome to the GenAI Learning Metromap! Navigating the field of GenAI can often feel daunting due to the interconnected concepts that require prior understanding. This guide aims to streamline the learning journey by mapping out these dependencies, helping to minimize cognitive overload. While there are numerous ways to structure such a learning path, this approach has worked for me. If you have ideas for improvement or alternative perspectives, I welcome your feedback.

This learning map covers three paths:

  1. πŸ“ Foundational - the knowledge needed for everyone including Mathematics, Programming, and Neural Networks.
  2. πŸ§‘β€πŸ”¬ The GenAI Scientists Path - focuses on understanding, building and customising LLMs.
  3. πŸ‘· The GenAI Engineers Path - focuses on creating LLM-based applications, deploying and operating them.

Foundational

1. Math Basics

The core mathematical foundations that power modern AI systems and the concepts required for understanding neural network architectures, optimization algorithms, and probabilistic modeling in machine learning.

  • Linear Algebra: Crucial for understanding deep learning algorithms. Key concepts include vectors, matrices, determinants, eigenvalues and eigenvectors, vector spaces, and linear transformations.
  • Calculus: Machine learning algorithms involve the optimization of continuous functions, which requires an understanding of derivatives, integrals, limits, and series. Multivariate calculus and the concept of gradients are also important.
  • Probability and Statistics: For understanding how models learn from data and make predictions. Key concepts include probability theory, random variables, probability distributions, expectations, variance, covariance, correlation, hypothesis testing, confidence intervals, maximum likelihood estimation, and Bayesian inference.
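
Two of the ideas above can be checked in a few lines of NumPy: the defining equation of an eigenpair, and a numerical sanity check of a derivative. This is a minimal illustration, not a tutorial:

```python
import numpy as np

# Linear algebra: eigen-decomposition of a symmetric matrix satisfies A v = lambda v.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eigh(A)
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)   # each pair satisfies the definition

# Calculus: verify the analytic derivative of f(x) = x^2 numerically at x = 3.
f = lambda x: x ** 2
h = 1e-6
numeric_grad = (f(3.0 + h) - f(3.0 - h)) / (2 * h)
assert abs(numeric_grad - 2 * 3.0) < 1e-4   # f'(x) = 2x, so f'(3) = 6
```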

πŸ“š References:


2. Programming

Develop hands-on expertise with Python and its data science ecosystem, including skills in data manipulation, visualization, and implementation of machine learning algorithms using industry-standard libraries and frameworks.

  • Python Basics: A good understanding of the basic syntax, data types, error handling, and object-oriented programming.
  • Data Science Libraries: Includes NumPy for numerical operations, Pandas for data manipulation and analysis, Matplotlib and Seaborn for data visualization.
  • Data Pre-Processing: Feature scaling and normalization, handling missing data, outlier detection, categorical data encoding, and splitting data into training, validation, and test sets.
  • Machine Learning Libraries: Scikit-learn for traditional ML algorithms and PyTorch for deep learning. Understanding how to implement algorithms like linear regression, logistic regression, decision trees, random forests, k-nearest neighbours (k-NN), and k-means clustering is important. Dimensionality reduction techniques like PCA and t-SNE are also helpful for visualizing high-dimensional data.
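
As a dependency-light sketch of the simplest of those algorithms, here is linear regression fitted by ordinary least squares in NumPy (scikit-learn's `LinearRegression` wraps the same math behind a friendlier API):

```python
import numpy as np

# Fit y = w*x + b to synthetic noisy data with true w = 3.0, b = 2.0.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 0.1, size=100)

# Design matrix with a bias column; solve min ||Xp - y||^2 in closed form.
X = np.column_stack([x, np.ones_like(x)])
(w, b), *_ = np.linalg.lstsq(X, y, rcond=None)

print(round(w, 2), round(b, 2))  # recovers values close to 3.0 and 2.0
```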

πŸ“š References:


3. Neural Nets & LLMs

Dive into the architecture and mechanics of neural networks and Large Language Models. This section bridges the gap between traditional neural networks and modern transformer-based architectures, providing insights into their training and optimization.

  • Neural Net Fundamentals: Components of a neural network such as layers, weights, biases, and activation functions (sigmoid, tanh, ReLU, etc.)
  • Training & Optimization: Backpropagation and different types of loss functions, like Mean Squared Error (MSE) and Cross-Entropy. Understanding various optimization algorithms like Gradient Descent, Stochastic Gradient Descent, RMSprop, and Adam. Understanding the concept of overfitting (where a model performs well on training data but poorly on unseen data) and learn various regularization techniques (dropout, L1/L2 regularization, early stopping, data augmentation) to prevent it.
  • Implementing MLPs: Building a Multi Layer Perceptron, also known as a fully connected network, using PyTorch.
  • LLM Overview & LLM-OS: The core technical component behind systems like ChatGPT, Claude, and Bard. What they are, where they are headed, comparisons and analogies to present-day operating systems.
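
The MLP forward pass described above fits in a few lines. This sketch uses plain NumPy to keep it dependency-light; the PyTorch equivalent would stack `nn.Linear` layers with `nn.ReLU` between them:

```python
import numpy as np

# Two-layer MLP forward pass: hidden = ReLU(x W1 + b1), out = hidden W2 + b2.
def mlp_forward(x, W1, b1, W2, b2):
    hidden = np.maximum(0.0, x @ W1 + b1)  # ReLU activation
    return hidden @ W2 + b2

rng = np.random.default_rng(42)
W1 = rng.normal(size=(4, 8)); b1 = np.zeros(8)   # 4 input features -> 8 hidden units
W2 = rng.normal(size=(8, 2)); b2 = np.zeros(2)   # 8 hidden units -> 2 outputs
x = rng.normal(size=(3, 4))                      # batch of 3 inputs

out = mlp_forward(x, W1, b1, W2, b2)
print(out.shape)  # (3, 2)
```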

πŸ“š References:


GenAI Scientists Path

4. LLM/SLM Foundations

Explore the core components that make Large Language Models work, from attention mechanisms to transformer architectures, the internal workings of modern language models and their sequence processing capabilities.

  • Multi-head Attention: The attention mechanism allows a model to focus on relevant parts of the input sequence when predicting outputs. In Transformers, the Scaled Dot-Product Attention is the core mechanism, where each token attends to every other token in the sequence, weighted by learned relevance scores.
  • Transformer Architecture: The Transformer is a neural network architecture introduced in the Attention Is All You Need paper. It relies entirely on the attention mechanism to draw global dependencies between input and output. It eliminates recurrence and convolutions, allowing for parallelization and scalability in deep learning.
  • Output Sequence Generation: In sequence-to-sequence tasks (e.g., language translation), the Transformer generates output tokens step-by-step using an autoregressive approach (predicting the next token based on previously generated tokens) or parallel decoding for some applications.
  • Tokenization: The process of breaking down input text into smaller units (tokens), such as words, subwords, or characters. Models like GPT and BERT use subword tokenization (e.g., Byte Pair Encoding or WordPiece) to handle unknown words and reduce vocabulary size.
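
The Scaled Dot-Product Attention described above can be sketched directly from its formula, softmax(QKᵀ/√d_k)V. This is a minimal single-head version in NumPy, without masking or the learned projections a real Transformer layer adds:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the core Transformer operation."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # token-to-token relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)             # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 16))   # 5 tokens, 16-dimensional queries
K = rng.normal(size=(5, 16))
V = rng.normal(size=(5, 16))
out, weights = scaled_dot_product_attention(Q, K, V)

assert out.shape == (5, 16)
assert np.allclose(weights.sum(axis=-1), 1.0)  # each token's weights form a distribution
```

Multi-head attention runs several such operations in parallel on learned linear projections of Q, K, and V, then concatenates the results.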

πŸ“š References:


5. Pre-Training

The techniques for training large language models from scratch, including data management, optimization strategies, and compute scaling. This section covers the critical aspects of building and training foundation models efficiently.

  • Data Management: Curating large datasets for quality and representativeness, and understanding how data quality affects an LLM's generalization.
  • Optimization Strategies: Large-scale training optimizers (e.g., AdamW, LAMB), regularization methods (e.g., LayerNorm, weight decay), and stability techniques (e.g., gradient clipping, loss scaling).
  • Compute Scaling: Scaling laws, parallelism techniques (model, data, and pipeline parallelism), and efficiency techniques including mixed precision and gradient checkpointing.
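
One of the stability techniques above, global-norm gradient clipping, is simple enough to sketch by hand (frameworks expose it as e.g. `torch.nn.utils.clip_grad_norm_`):

```python
import numpy as np

def clip_grad_norm(grads, max_norm):
    """Rescale all gradients together so their global L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-12))
    return [g * scale for g in grads], total_norm

# Two parameter groups whose global norm is sqrt(10*9 + 10*16) = sqrt(250) ~ 15.81.
grads = [np.full(10, 3.0), np.full(10, 4.0)]
clipped, norm = clip_grad_norm(grads, max_norm=1.0)
new_norm = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print(round(norm, 2), round(new_norm, 2))  # 15.81 1.0
```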

πŸ“š References:


6. Fine-Tuning Data Preparation

The art of preparing high-quality datasets for model fine-tuning, including data filtering, synthetic data generation, and prompt engineering. This section focuses on the crucial data preparation steps that determine model performance.

  • Datasets/Synthetic: High-quality datasets are essential for training. Synthetic datasets, created programmatically, are sometimes used to augment real datasets, especially when domain-specific data is scarce.
  • Filtering Data: Filtering ensures the dataset quality by removing noise, duplicates, and irrelevant entries. Techniques include heuristics, model-based filtering, or crowd-sourcing evaluations to ensure that only meaningful data is used for fine-tuning.
  • Prompt Templates: Prompt templates are pre-designed input formats that help elicit desired responses from language models. These templates structure queries effectively and are critical in few-shot learning or instruction-following tasks.
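
A prompt template like those described above can be as simple as a format string with slots for few-shot examples and the user's question (the template text here is a made-up illustration):

```python
# Minimal prompt template: slots for few-shot examples and the actual question.
TEMPLATE = """You are a helpful assistant.

{examples}

Question: {question}
Answer:"""

# Few-shot examples rendered in the same Question/Answer shape the model should follow.
examples = "\n".join(
    f"Question: {q}\nAnswer: {a}"
    for q, a in [("2 + 2?", "4"), ("Capital of France?", "Paris")]
)
prompt = TEMPLATE.format(examples=examples, question="Capital of Japan?")

assert "Capital of France?" in prompt   # examples are embedded
assert prompt.endswith("Answer:")       # model completes from here
```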

πŸ“š References:

  • Hugging Face Datasets Library: Covers practical tools and techniques to gather and prepare datasets, a critical first step in fine-tuning.
  • Data-Centric AI by Andrew Ng: Offers in-depth guidance on applying heuristics, model-based filtering, and other approaches to ensure dataset quality.
  • OpenAI Cookbook on Prompt Design: Demonstrates how structured prompts improve fine-tuning outcomes, especially for instruction-based or few-shot learning tasks.

7. Supervised Fine-Tuning

Advanced techniques for adapting pre-trained models to specific tasks, from full fine-tuning to efficient methods like LoRA and QLoRA. This section covers practical approaches to model adaptation while managing computational resources.

  • Full Fine-Tuning: Updating all model parameters on a labeled dataset to specialize the model for a specific task. This approach is computationally intensive but yields the best performance for high-resource tasks.
  • LoRA, QLoRA: LoRA (Low-Rank Adaptation) - Fine-tuning a smaller subset of parameters (low-rank matrices) while freezing most of the model, making it memory-efficient and faster. QLoRA (Quantized LoRA) - An enhancement of LoRA that uses quantization to reduce memory requirements further, enabling fine-tuning of large models on commodity hardware.
  • Fine-Tuning Tools: Frameworks and libraries that simplify the fine-tuning process by providing pre-built utilities for dataset loading, training loops, and evaluation, such as Hugging Face PEFT (Parameter-Efficient Fine-Tuning), DeepSpeed, and Accelerate.
  • Deep Learning Optimization: Techniques to enhance the efficiency and stability of the training process, such as adaptive optimizers (e.g., AdamW), learning rate schedules, gradient clipping, and distributed training strategies.
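
The LoRA idea can be shown in miniature with NumPy (a toy sketch only; real implementations such as Hugging Face PEFT attach these adapters to a transformer's attention weight matrices): the pretrained weight W stays frozen while a low-rank update B·A, scaled by alpha/r, is trained.

```python
import numpy as np

d, r, alpha = 64, 4, 8                 # model dim, LoRA rank, scaling factor
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))            # frozen pretrained weight (never updated)
A = rng.normal(size=(r, d)) * 0.01     # trainable low-rank factor
B = np.zeros((d, r))                   # B starts at zero, so the update starts at zero

def lora_forward(x):
    # Output = frozen path + scaled low-rank update path.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.normal(size=(2, d))
assert np.allclose(lora_forward(x), x @ W.T)  # identical to base model before training

# Trainable parameters: 2*d*r = 512, versus d*d = 4096 for full fine-tuning.
```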

πŸ“š References:


8. Alignment

Understand the methods for aligning language models with human values and specific objectives. This section covers techniques like RLHF, Constitutional AI, and preference optimization to ensure models generate helpful and appropriate responses.

Model alignment refers to the process of ensuring that the behavior of a large language model aligns with human values, goals, or specific preferences. Reinforcement Learning with Human Feedback (RLHF) is the broader framework that is used in tailoring the model's responses to be helpful, truthful, and safe, while avoiding undesirable behaviors.

  • Constitutional AI: A technique developed by Anthropic to align models with ethical principles without relying entirely on human feedback. Models are trained to critique and refine their own outputs based on predefined "constitutional" principles.
  • Proximal Policy Optimization: Optimizes the model’s policy to maximize the rewards, steering the model towards generating responses that align with human preferences.
  • Direct Preference Optimization: Directly optimizes the model to prefer certain outputs over others, based on pairwise human-provided comparisons.
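
The DPO objective for a single preference pair can be written out directly. This sketch takes sequence log-probabilities under the policy and a frozen reference model and computes -log σ(β·margin), where the margin measures how much more the policy prefers the chosen response than the reference does:

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))  # -log sigmoid

# Illustrative numbers: the policy prefers the chosen response more strongly
# than the reference model does, so the margin is positive and loss < log(2).
loss = dpo_loss(logp_chosen=-10.0, logp_rejected=-14.0,
                ref_chosen=-12.0, ref_rejected=-13.0, beta=0.1)
assert 0.0 < loss < math.log(2)
```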

πŸ“š References:


9. Evaluation

The metrics and methodologies for assessing language model performance, from traditional benchmarks to human evaluation frameworks.

  • Metrics: Traditional metrics like perplexity and BLEU are less popular than they once were because they are flawed in many contexts, but it is still important to understand them and know when they can be applied.
  • General benchmarks: Based on the Language Model Evaluation Harness, the Open LLM Leaderboard is the main benchmark for general-purpose LLMs (like ChatGPT). There are other popular benchmarks like BigBench, MT-Bench, etc.
  • Task-specific benchmarks: Tasks like summarization, translation, and question answering have dedicated benchmarks, metrics, and even subdomains (medical, financial, etc.), such as PubMedQA for biomedical question answering.
  • Human evaluation: The most reliable evaluation is the acceptance rate by users or comparisons made by humans. Logging user feedback in addition to the chat traces (e.g., using LangSmith) helps to identify potential areas for improvement.
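
Perplexity, mentioned above, is just the exponential of the average negative log-probability per token, which makes it easy to compute from model log-probabilities:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean log-probability per token); lower is better."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Toy example: a model that assigns probability 0.25 to every token behaves
# like a uniform choice among 4 options, so its perplexity is 4.
logprobs = [math.log(0.25)] * 8
assert abs(perplexity(logprobs) - 4.0) < 1e-9
```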

πŸ“š References:


10. Optimization

Optimization techniques that primarily target memory usage, computational efficiency, and energy consumption. These methods are critical for deploying large models in resource-constrained environments and for scaling inference workloads efficiently.

  • Naive Quantization: A basic form of model compression that uniformly reduces the numerical precision of all model weights (e.g., from 32-bit floating point to 8-bit integers) without considering the impact on model performance. While simple to implement, this approach can lead to significant accuracy degradation as it doesn't account for the varying sensitivity of different model components.

  • Quantization for CPUs: A specialized quantization technique optimized for CPU inference that typically uses 8-bit integer (INT8) quantization. This approach includes careful calibration of scaling factors and zero points for each tensor, often utilizing techniques like per-channel quantization and dynamic range adjustment to maintain accuracy while leveraging CPU-specific optimizations.

  • Quantization for GPUs: A GPU-specific quantization approach that focuses on maintaining computational efficiency while preserving model accuracy. It often employs techniques like FP16 (half-precision) or INT8 quantization with GPU-optimized kernels, and may include features like tensor core utilization and mixed-precision training to balance performance and accuracy.

  • Attention Aware Quantization: A sophisticated quantization strategy that specifically targets attention mechanisms in transformer models. This method applies different quantization schemes to attention-related computations versus other model components, recognizing that attention layers are particularly sensitive to precision reduction. It often preserves higher precision for key attention operations while allowing more aggressive quantization elsewhere.

  • Distillation: A model compression technique where a smaller model (student) is trained to mimic the behavior of a larger model (teacher). The student learns not just from the final outputs but also from the teacher's intermediate representations and attention patterns. This approach can significantly reduce model size while maintaining much of the original model's performance, making it particularly valuable for deployment in resource-constrained environments.
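
The naive quantization described at the top of this list can be sketched directly: pick one scale for the whole tensor, round to 8-bit integers, and accept a worst-case rounding error of half a quantization step.

```python
import numpy as np

def quantize_int8(w):
    """Naive symmetric INT8 quantization: a single scale for the whole tensor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)   # stand-in for a weight tensor
q, scale = quantize_int8(w)

error = np.abs(dequantize(q, scale) - w).max()
assert error <= scale / 2 + 1e-6   # rounding costs at most half a step
```

Per-channel quantization (one scale per output channel) reduces this error for tensors whose channels have very different ranges, which is exactly the refinement the CPU- and GPU-specific schemes above build on.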

πŸ“š References:


11. Emerging Trends

  • Hybrid AI Systems: Hybrid AI systems combine different types of artificial intelligence, such as deep learning and symbolic reasoning. This integration aims to enhance problem-solving capabilities by leveraging the strengths of various AI approaches. For instance, combining large language models (LLMs) with knowledge graphs can lead to more effective decision-making and reasoning processes, particularly in complex fields like healthcare and finance.

  • Large Concept Models: LCMs focus on processing concepts rather than tokens. This approach allows for more sophisticated reasoning and understanding by treating entire sentences as semantic units, enabling models to operate in a language-independent and multimodal manner.

  • GenXAI: Explainable GenAI (GenXAI) focuses on exploring how models represent knowledge and on providing human-understandable explanations for a model's outputs and decisions. Key techniques include Mechanistic Interpretability, Feature Attribution, and Probing-based and Sample-based methods.

  • Instruct Graph: Focuses on enhancing LLMs with graph-centric approaches, aiming to improve their performance on graph reasoning and generation tasks. This involves using structured formats to bridge the gap between textual and graph data.

πŸ“š References:


GenAI Engineers Path

12. Consuming LLMs

  • LLM APIs: Understanding and integrating commercial LLM services like OpenAI, Anthropic, and cloud providers' APIs. Learn about authentication, rate limiting, error handling, and cost optimization strategies for production deployments.

  • Open Source LLMs: Exploring self-hosted models like Llama, Mistral, and Falcon. Understanding deployment options, hardware requirements, and trade-offs between model sizes and capabilities for local or private cloud deployments.

  • Prompt Engineering: Mastering techniques for crafting effective prompts including few-shot learning, chain-of-thought prompting, and system prompts. Learn best practices for consistent and reliable model outputs.

  • Structuring Outputs: Techniques for controlling model responses through output parsers, JSON schemas, and structured prompts. Understanding methods to extract specific data formats and maintain consistent response structures.
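
A common structuring pattern is to ask the model for JSON and validate the reply against an expected shape before using it downstream. This sketch uses a made-up two-field schema and a hard-coded reply standing in for a real model response:

```python
import json

# Hypothetical expected schema: field name -> required Python type.
REQUIRED = {"name": str, "score": float}

def parse_reply(reply: str) -> dict:
    """Parse and validate a model reply; raise rather than pass bad data along."""
    data = json.loads(reply)  # raises json.JSONDecodeError on malformed JSON
    for key, typ in REQUIRED.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"missing or mistyped field: {key}")
    return data

reply = '{"name": "example", "score": 0.92}'   # stand-in for an LLM response
parsed = parse_reply(reply)
assert parsed["score"] == 0.92
```

In production this validation step is usually paired with a retry: if parsing fails, the error message is fed back to the model with a request to correct its output.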

πŸ“š References:


13. Building Vector Store

  • Chunking Methods: Techniques for breaking down documents into meaningful segments for vector storage. Includes text splitting strategies like fixed-length chunks, sentence-based splits, and semantic chunking to optimize retrieval effectiveness.

  • Indexing Schemes: Algorithms and data structures for efficient vector similarity search. Understanding approaches like HNSW (Hierarchical Navigable Small World), IVF (Inverted File Index), and LSH (Locality-Sensitive Hashing) for scalable nearest neighbor search.

  • Embedding Models: Understanding and selecting appropriate embedding models for converting text to vectors. Includes domain-specific models, multilingual capabilities, and techniques for optimizing embedding quality and computational efficiency.

  • Vector Databases: Exploring vector database solutions like Pinecone, Weaviate, and Milvus. Understanding their architectures, querying capabilities, scaling considerations, and integration patterns with LLM applications.
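
The simplest of the chunking methods above, fixed-length chunks with overlap, fits in a few lines; the overlap ensures that context spanning a chunk boundary is not lost:

```python
def chunk_text(text: str, size: int = 20, overlap: int = 5) -> list[str]:
    """Fixed-length character chunks; consecutive chunks share `overlap` characters."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "abcdefghij" * 5   # 50-character toy document
chunks = chunk_text(doc)

assert all(len(c) <= 20 for c in chunks)
assert chunks[0][-5:] == chunks[1][:5]   # boundary context appears in both chunks
```

Real pipelines usually chunk by tokens or sentences rather than characters, but the overlap principle is the same.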

πŸ“š References:


14. Retrieval Augmented Generation (RAG)

  • Orchestrators: Frameworks like LangChain and LlamaIndex that coordinate the RAG pipeline, managing document processing, retrieval, and LLM interactions. These tools provide abstractions for building complex RAG applications with features like caching, error handling, and monitoring.

  • Retrievers: Components that fetch relevant information from vector stores using similarity search, hybrid search, or re-ranking techniques. Advanced retrievers may use techniques like query expansion, contextual compression, or multi-step retrieval to improve result quality.

  • Memory: Systems for maintaining conversation history and managing context windows in RAG applications. Includes short-term memory for ongoing conversations and long-term memory for persistent knowledge across sessions.

  • RAG Evaluation: Methods for assessing RAG system performance including retrieval accuracy, answer relevance, and faithfulness to source documents. Metrics like RAGAS help evaluate context relevance, answer faithfulness, and overall response quality.
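
At its core, the retriever step above is cosine similarity between a query embedding and document embeddings, followed by a top-k selection. A brute-force NumPy sketch (vector databases replace the linear scan with approximate indexes like HNSW):

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k documents most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                      # cosine similarity of each doc to the query
    return np.argsort(-sims)[:k], sims

# Toy 2-D "embeddings" so the geometry is easy to see.
docs = np.array([[1.0, 0.0],
                 [0.7, 0.7],
                 [0.0, 1.0]])
idx, sims = top_k(np.array([1.0, 0.1]), docs)

assert list(idx) == [0, 1]   # documents closest in direction rank first
```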

πŸ“š References:


15. Advanced GenAI Apps

  • Query Construction: Techniques for dynamically building and optimizing prompts based on user input and context. Includes methods for query decomposition, reformulation, and contextual enhancement to improve response quality and relevance.

  • Agentic AI Apps: Applications that use LLMs as autonomous agents capable of planning and executing multi-step tasks. Involves tools like AutoGPT and Amazon Bedrock Agents for goal-oriented problem solving, task decomposition, and self-improvement through recursive refinement.

  • Guardrails: Implementation of safety measures and control mechanisms in GenAI applications. Includes content filtering, output validation, ethical constraints, and monitoring systems to ensure responsible and controlled AI behavior.

πŸ“š References:


16. Inference Optimization

  • Flash Attention: A memory-efficient attention algorithm that reduces memory usage and increases speed by tiling attention computation. It optimizes memory access patterns and achieves better hardware utilization, making it particularly effective for training and inference of large language models.

  • Key-Value Cache: A technique that stores previously computed key and value tensors from the attention mechanism to avoid redundant computations during autoregressive generation. This optimization significantly speeds up inference by reusing intermediate results across generation steps.

  • Speculative Decoding: An inference optimization method that uses a smaller, faster model to predict multiple tokens in parallel, which are then verified by the main model. This approach can significantly reduce the latency of text generation by parallelizing what is typically a sequential process.
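
The key-value cache described above can be illustrated with a toy decoding loop: each step computes keys and values only for the newest token and appends them, then attends over the whole cache (single head, random vectors standing in for real projections):

```python
import numpy as np

d = 8                        # head dimension
cache_k, cache_v = [], []    # grows by one entry per generated token
rng = np.random.default_rng(0)

for step in range(4):        # "generate" 4 tokens
    k_new = rng.normal(size=d)
    v_new = rng.normal(size=d)
    cache_k.append(k_new)    # append instead of recomputing past keys/values
    cache_v.append(v_new)

    K = np.stack(cache_k)
    V = np.stack(cache_v)
    q = rng.normal(size=d)   # query for the current token only
    scores = K @ q / np.sqrt(d)
    w = np.exp(scores - scores.max()); w /= w.sum()   # softmax over cached tokens
    context = w @ V          # attention output for this step

assert K.shape == (4, d)     # cache holds one key per generated token
```

Without the cache, step t would recompute keys and values for all t tokens, turning generation of n tokens into O(n²) redundant work.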

πŸ“š References:


17. Deploying LLMs

  • Local Deployment: Techniques for deploying LLMs on local machines, including CPU and GPU optimizations, memory management, and containerization. Understanding hardware requirements, model quantization, and local inference servers for optimal performance.

  • Demo Deployment: Methods for quickly deploying LLM applications for demonstration purposes using platforms like Hugging Face Spaces, Streamlit, or Gradio. These platforms offer easy-to-use interfaces for showcasing LLM capabilities with minimal setup.

  • Server Deployment: Strategies for deploying LLMs in production environments using cloud services or on-premises infrastructure. Includes load balancing, auto-scaling, monitoring, and high-availability configurations for reliable service delivery.

  • Edge Deployment: Techniques for deploying optimized LLMs on edge devices with limited resources. Focuses on model compression, efficient inference, and device-specific optimizations for mobile phones, IoT devices, and embedded systems.

πŸ“š References:


18. Securing LLMs

  • Prompt Injection: Malicious inputs designed to manipulate LLM behavior by bypassing safety measures. Attackers craft prompts that trick the model into performing unintended actions or revealing sensitive information.

  • Adversarial Attacks: Carefully crafted inputs that cause the model to produce incorrect or harmful outputs. These can exploit model vulnerabilities to generate misleading or inappropriate responses.

  • Jailbreaking: Methods to bypass model safety measures and restrictions. Attackers use creative prompting techniques to make models generate content that violates their intended constraints.

  • RLHF Attacks: Exploits targeting the reinforcement learning from human feedback mechanism. These attacks aim to manipulate the model's learned preferences and alignment.

πŸ“š References:


🧠 Happy learning! πŸ₯³
