GenAI Learning Metromap

Welcome to the GenAI Learning Metromap! Navigating the field of GenAI can often feel daunting due to the interconnected concepts that require prior understanding. This guide aims to streamline the learning journey by mapping out these dependencies, helping to minimize cognitive overload. While there are numerous ways to structure such a learning path, this approach has worked for me. If you have ideas for improvement or alternative perspectives, I welcome your feedback.

This learning map covers three paths:

  1. πŸ“ Foundational - the knowledge needed for everyone including Mathematics, Programming, and Neural Networks.
  2. πŸ§‘β€πŸ”¬ The GenAI Scientists Path - focuses on understanding, building and customising LLMs.
  3. πŸ‘· The GenAI Engineers Path - focuses on creating LLM-based applications, deploying and operating them.

Foundational

1. Math Basics

The core mathematical foundations that power modern AI systems and the concepts required for understanding neural network architectures, optimization algorithms, and probabilistic modeling in machine learning.

  • Linear Algebra: Crucial for understanding deep learning algorithms. Key concepts include vectors, matrices, determinants, eigenvalues and eigenvectors, vector spaces, and linear transformations.
  • Calculus: Machine learning algorithms involve the optimization of continuous functions, which requires an understanding of derivatives, integrals, limits, and series. Multivariate calculus and the concept of gradients are also important.
  • Probability and Statistics: For understanding how models learn from data and make predictions. Key concepts include probability theory, random variables, probability distributions, expectations, variance, covariance, correlation, hypothesis testing, confidence intervals, maximum likelihood estimation, and Bayesian inference.
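
Two of the ideas above can be checked in a few lines of NumPy: the defining equation of an eigenpair, and a numerical sanity check of a derivative. This is a minimal illustration, not a tutorial:

```python
import numpy as np

# Linear algebra: eigen-decomposition of a symmetric matrix satisfies A v = lambda v.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eigh(A)
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)   # each pair satisfies the definition

# Calculus: verify the analytic derivative of f(x) = x^2 numerically at x = 3.
f = lambda x: x ** 2
h = 1e-6
numeric_grad = (f(3.0 + h) - f(3.0 - h)) / (2 * h)
assert abs(numeric_grad - 2 * 3.0) < 1e-4   # f'(x) = 2x, so f'(3) = 6
```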

πŸ“š References:


2. Programming

Develop hands-on expertise with Python and its data science ecosystem, including skills in data manipulation, visualization, and implementation of machine learning algorithms using industry-standard libraries and frameworks.

  • Python Basics: A good understanding of the basic syntax, data types, error handling, and object-oriented programming.
  • Data Science Libraries: Includes NumPy for numerical operations, Pandas for data manipulation and analysis, Matplotlib and Seaborn for data visualization.
  • Data Pre-Processing: Feature scaling and normalization, handling missing data, outlier detection, categorical data encoding, and splitting data into training, validation, and test sets.
  • Machine Learning Libraries: Scikit-learn for traditional ML algorithms and PyTorch for deep learning. Understanding how to implement algorithms like linear regression, logistic regression, decision trees, random forests, k-nearest neighbours (k-NN), and k-means clustering is important. Dimensionality reduction techniques like PCA and t-SNE are also helpful for visualizing high-dimensional data.
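
As a dependency-light sketch of the simplest of those algorithms, here is linear regression fitted by ordinary least squares in NumPy (scikit-learn's `LinearRegression` wraps the same math behind a friendlier API):

```python
import numpy as np

# Fit y = w*x + b to synthetic noisy data with true w = 3.0, b = 2.0.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 0.1, size=100)

# Design matrix with a bias column; solve min ||Xp - y||^2 in closed form.
X = np.column_stack([x, np.ones_like(x)])
(w, b), *_ = np.linalg.lstsq(X, y, rcond=None)

print(round(w, 2), round(b, 2))  # recovers values close to 3.0 and 2.0
```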

πŸ“š References:


3. Neural Nets & LLMs

Dive into the architecture and mechanics of neural networks and Large Language Models. This section bridges the gap between traditional neural networks and modern transformer-based architectures, providing insights into their training and optimization.

  • Neural Net Fundamentals: Components of a neural network such as layers, weights, biases, and activation functions (sigmoid, tanh, ReLU, etc.)
  • Training & Optimization: Backpropagation and different types of loss functions, like Mean Squared Error (MSE) and Cross-Entropy. Understanding various optimization algorithms like Gradient Descent, Stochastic Gradient Descent, RMSprop, and Adam. Understanding the concept of overfitting (where a model performs well on training data but poorly on unseen data) and learn various regularization techniques (dropout, L1/L2 regularization, early stopping, data augmentation) to prevent it.
  • Implementing MLPs: Building a Multi Layer Perceptron, also known as a fully connected network, using PyTorch.
  • LLM Overview & LLM-OS: The core technical component behind systems like ChatGPT, Claude, and Bard. What they are, where they are headed, comparisons and analogies to present-day operating systems.
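
The MLP forward pass described above fits in a few lines. This sketch uses plain NumPy to keep it dependency-light; the PyTorch equivalent would stack `nn.Linear` layers with `nn.ReLU` between them:

```python
import numpy as np

# Two-layer MLP forward pass: hidden = ReLU(x W1 + b1), out = hidden W2 + b2.
def mlp_forward(x, W1, b1, W2, b2):
    hidden = np.maximum(0.0, x @ W1 + b1)  # ReLU activation
    return hidden @ W2 + b2

rng = np.random.default_rng(42)
W1 = rng.normal(size=(4, 8)); b1 = np.zeros(8)   # 4 input features -> 8 hidden units
W2 = rng.normal(size=(8, 2)); b2 = np.zeros(2)   # 8 hidden units -> 2 outputs
x = rng.normal(size=(3, 4))                      # batch of 3 inputs

out = mlp_forward(x, W1, b1, W2, b2)
print(out.shape)  # (3, 2)
```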

πŸ“š References:


GenAI Scientists Path

4. LLM/SLM Foundations

Explore the core components that make Large Language Models work, from attention mechanisms to transformer architectures, the internal workings of modern language models and their sequence processing capabilities.

  • Multi-head Attention: The attention mechanism allows a model to focus on relevant parts of the input sequence when predicting outputs. In Transformers, the Scaled Dot-Product Attention is the core mechanism, where each token attends to every other token in the sequence, weighted by learned relevance scores.
  • Transformer Architecture: The Transformer is a neural network architecture introduced in the Attention Is All You Need paper. It relies entirely on the attention mechanism to draw global dependencies between input and output. It eliminates recurrence and convolutions, allowing for parallelization and scalability in deep learning.
  • Output Sequence Generation: In sequence-to-sequence tasks (e.g., language translation), the Transformer generates output tokens step-by-step using an autoregressive approach (predicting the next token based on previously generated tokens) or parallel decoding for some applications.
  • Tokenization: The process of breaking down input text into smaller units (tokens), such as words, subwords, or characters. Models like GPT and BERT use subword tokenization (e.g., Byte Pair Encoding or WordPiece) to handle unknown words and reduce vocabulary size.
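
The Scaled Dot-Product Attention described above can be sketched directly from its formula, softmax(QKᵀ/√d_k)V. This is a minimal single-head version in NumPy, without masking or the learned projections a real Transformer layer adds:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the core Transformer operation."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # token-to-token relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)             # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 16))   # 5 tokens, 16-dimensional queries
K = rng.normal(size=(5, 16))
V = rng.normal(size=(5, 16))
out, weights = scaled_dot_product_attention(Q, K, V)

assert out.shape == (5, 16)
assert np.allclose(weights.sum(axis=-1), 1.0)  # each token's weights form a distribution
```

Multi-head attention runs several such operations in parallel on learned linear projections of Q, K, and V, then concatenates the results.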

πŸ“š References:


5. Pre-Training

The techniques for training large language models from scratch, including data management, optimization strategies, and compute scaling. This section covers the critical aspects of building and training foundation models efficiently.

  • Data Management: Curating large datasets for quality and representativeness, and understanding how data quality affects an LLM's generalization.
  • Optimization Strategies: Large-scale training optimizers (e.g., AdamW, LAMB), regularization methods (e.g., LayerNorm, weight decay), and stability techniques (e.g., gradient clipping, loss scaling).
  • Compute Scaling: Scaling laws, parallelism techniques (model, data, and pipeline parallelism), and efficiency techniques including mixed precision and gradient checkpointing.
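
One of the stability techniques above, global-norm gradient clipping, is simple enough to sketch by hand (frameworks expose it as e.g. `torch.nn.utils.clip_grad_norm_`):

```python
import numpy as np

def clip_grad_norm(grads, max_norm):
    """Rescale all gradients together so their global L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-12))
    return [g * scale for g in grads], total_norm

# Two parameter groups whose global norm is sqrt(10*9 + 10*16) = sqrt(250) ~ 15.81.
grads = [np.full(10, 3.0), np.full(10, 4.0)]
clipped, norm = clip_grad_norm(grads, max_norm=1.0)
new_norm = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print(round(norm, 2), round(new_norm, 2))  # 15.81 1.0
```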

πŸ“š References:


6. Fine-Tuning Data Preparation

The art of preparing high-quality datasets for model fine-tuning, including data filtering, synthetic data generation, and prompt engineering. This section focuses on the crucial data preparation steps that determine model performance.

  • Datasets/Synthetic: High-quality datasets are essential for training. Synthetic datasets, created programmatically, are sometimes used to augment real datasets, especially when domain-specific data is scarce.
  • Filtering Data: Filtering ensures the dataset quality by removing noise, duplicates, and irrelevant entries. Techniques include heuristics, model-based filtering, or crowd-sourcing evaluations to ensure that only meaningful data is used for fine-tuning.
  • Prompt Templates: Prompt templates are pre-designed input formats that help elicit desired responses from language models. These templates structure queries effectively and are critical in few-shot learning or instruction-following tasks.
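
A prompt template like those described above can be as simple as a format string with slots for few-shot examples and the user's question (the template text here is a made-up illustration):

```python
# Minimal prompt template: slots for few-shot examples and the actual question.
TEMPLATE = """You are a helpful assistant.

{examples}

Question: {question}
Answer:"""

# Few-shot examples rendered in the same Question/Answer shape the model should follow.
examples = "\n".join(
    f"Question: {q}\nAnswer: {a}"
    for q, a in [("2 + 2?", "4"), ("Capital of France?", "Paris")]
)
prompt = TEMPLATE.format(examples=examples, question="Capital of Japan?")

assert "Capital of France?" in prompt   # examples are embedded
assert prompt.endswith("Answer:")       # model completes from here
```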

πŸ“š References:

  • Hugging Face Datasets Library: Covers practical tools and techniques to gather and prepare datasets, a critical first step in fine-tuning.
  • Data-Centric AI by Andrew Ng: Offers in-depth guidance on applying heuristics, model-based filtering, and other approaches to ensure dataset quality.
  • OpenAI Cookbook on Prompt Design: Demonstrates how structured prompts improve fine-tuning outcomes, especially for instruction-based or few-shot learning tasks.

7. Supervised Fine-Tuning

Advanced techniques for adapting pre-trained models to specific tasks, from full fine-tuning to efficient methods like LoRA and QLoRA. This section covers practical approaches to model adaptation while managing computational resources.

  • Full Fine-Tuning: Updating all model parameters on a labeled dataset to specialize the model for a specific task. This approach is computationally intensive but yields the best performance for high-resource tasks.
  • LoRA, QLoRA: LoRA (Low-Rank Adaptation) - Fine-tuning a smaller subset of parameters (low-rank matrices) while freezing most of the model, making it memory-efficient and faster. QLoRA (Quantized LoRA) - An enhancement of LoRA that uses quantization to reduce memory requirements further, enabling fine-tuning of large models on commodity hardware.
  • Fine-Tuning Tools: Frameworks and libraries that simplify the fine-tuning process by providing pre-built utilities for dataset loading, training loops, and evaluation, such as Hugging Face PEFT (Parameter-Efficient Fine-Tuning), DeepSpeed, and Accelerate.
  • Deep Learning Optimization: Techniques to enhance the efficiency and stability of the training process, such as adaptive optimizers (e.g., AdamW), learning rate schedules, gradient clipping, and distributed training strategies.
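
The LoRA idea can be shown in miniature with NumPy (a toy sketch only; real implementations such as Hugging Face PEFT attach these adapters to a transformer's attention weight matrices): the pretrained weight W stays frozen while a low-rank update B·A, scaled by alpha/r, is trained.

```python
import numpy as np

d, r, alpha = 64, 4, 8                 # model dim, LoRA rank, scaling factor
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))            # frozen pretrained weight (never updated)
A = rng.normal(size=(r, d)) * 0.01     # trainable low-rank factor
B = np.zeros((d, r))                   # B starts at zero, so the update starts at zero

def lora_forward(x):
    # Output = frozen path + scaled low-rank update path.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.normal(size=(2, d))
assert np.allclose(lora_forward(x), x @ W.T)  # identical to base model before training

# Trainable parameters: 2*d*r = 512, versus d*d = 4096 for full fine-tuning.
```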

πŸ“š References:


8. Alignment

Understand the methods for aligning language models with human values and specific objectives. This section covers techniques like RLHF, Constitutional AI, and preference optimization to ensure models generate helpful and appropriate responses.

Model alignment refers to the process of ensuring that the behavior of a large language model aligns with human values, goals, or specific preferences. Reinforcement Learning with Human Feedback (RLHF) is the broader framework that is used in tailoring the model's responses to be helpful, truthful, and safe, while avoiding undesirable behaviors.

  • Constitutional AI: A technique developed by Anthropic to align models with ethical principles without relying entirely on human feedback. Models are trained to critique and refine their own outputs based on predefined "constitutional" principles.
  • Proximal Policy Optimization: Optimizes the model’s policy to maximize the rewards, steering the model towards generating responses that align with human preferences.
  • Direct Preference Optimization: Directly optimizes the model to prefer certain outputs over others, based on pairwise human-provided comparisons.
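
The DPO objective for a single preference pair can be written out directly. This sketch takes sequence log-probabilities under the policy and a frozen reference model and computes -log σ(β·margin), where the margin measures how much more the policy prefers the chosen response than the reference does:

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))  # -log sigmoid

# Illustrative numbers: the policy prefers the chosen response more strongly
# than the reference model does, so the margin is positive and loss < log(2).
loss = dpo_loss(logp_chosen=-10.0, logp_rejected=-14.0,
                ref_chosen=-12.0, ref_rejected=-13.0, beta=0.1)
assert 0.0 < loss < math.log(2)
```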

πŸ“š References:


9. Evaluation

The metrics and methodologies for assessing language model performance, from traditional benchmarks to human evaluation frameworks.

  • Metrics: Traditional metrics like perplexity and BLEU are less popular than they once were because they are flawed in many contexts, but it is still important to understand them and know when they can be applied.
  • General benchmarks: Based on the Language Model Evaluation Harness, the Open LLM Leaderboard is the main benchmark for general-purpose LLMs (like ChatGPT). There are other popular benchmarks like BigBench, MT-Bench, etc.
  • Task-specific benchmarks: Tasks like summarization, translation, and question answering have dedicated benchmarks, metrics, and even subdomains (medical, financial, etc.), such as PubMedQA for biomedical question answering.
  • Human evaluation: The most reliable evaluation is the acceptance rate by users or comparisons made by humans. Logging user feedback in addition to the chat traces (e.g., using LangSmith) helps to identify potential areas for improvement.
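
Perplexity, mentioned above, is just the exponential of the average negative log-probability per token, which makes it easy to compute from model log-probabilities:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean log-probability per token); lower is better."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Toy example: a model that assigns probability 0.25 to every token behaves
# like a uniform choice among 4 options, so its perplexity is 4.
logprobs = [math.log(0.25)] * 8
assert abs(perplexity(logprobs) - 4.0) < 1e-9
```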

πŸ“š References:


10. Optimization

Optimization techniques that primarily target memory usage, computational efficiency, and energy consumption. These methods are critical for deploying large models in resource-constrained environments and for scaling inference workloads efficiently.

  • Naive Quantization: A basic form of model compression that uniformly reduces the numerical precision of all model weights (e.g., from 32-bit floating point to 8-bit integers) without considering the impact on model performance. While simple to implement, this approach can lead to significant accuracy degradation as it doesn't account for the varying sensitivity of different model components.

  • Quantization for CPUs: A specialized quantization technique optimized for CPU inference that typically uses 8-bit integer (INT8) quantization. This approach includes careful calibration of scaling factors and zero points for each tensor, often utilizing techniques like per-channel quantization and dynamic range adjustment to maintain accuracy while leveraging CPU-specific optimizations.

  • Quantization for GPUs: A GPU-specific quantization approach that focuses on maintaining computational efficiency while preserving model accuracy. It often employs techniques like FP16 (half-precision) or INT8 quantization with GPU-optimized kernels, and may include features like tensor core utilization and mixed-precision training to balance performance and accuracy.

  • Attention Aware Quantization: A sophisticated quantization strategy that specifically targets attention mechanisms in transformer models. This method applies different quantization schemes to attention-related computations versus other model components, recognizing that attention layers are particularly sensitive to precision reduction. It often preserves higher precision for key attention operations while allowing more aggressive quantization elsewhere.

  • Distillation: A model compression technique where a smaller model (student) is trained to mimic the behavior of a larger model (teacher). The student learns not just from the final outputs but also from the teacher's intermediate representations and attention patterns. This approach can significantly reduce model size while maintaining much of the original model's performance, making it particularly valuable for deployment in resource-constrained environments.
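
The naive quantization described at the top of this list can be sketched directly: pick one scale for the whole tensor, round to 8-bit integers, and accept a worst-case rounding error of half a quantization step.

```python
import numpy as np

def quantize_int8(w):
    """Naive symmetric INT8 quantization: a single scale for the whole tensor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)   # stand-in for a weight tensor
q, scale = quantize_int8(w)

error = np.abs(dequantize(q, scale) - w).max()
assert error <= scale / 2 + 1e-6   # rounding costs at most half a step
```

Per-channel quantization (one scale per output channel) reduces this error for tensors whose channels have very different ranges, which is exactly the refinement the CPU- and GPU-specific schemes above build on.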

πŸ“š References:


11. Emerging Trends

  • Hybrid AI Systems: Hybrid AI systems combine different types of artificial intelligence, such as deep learning and symbolic reasoning. This integration aims to enhance problem-solving capabilities by leveraging the strengths of various AI approaches. For instance, combining large language models (LLMs) with knowledge graphs can lead to more effective decision-making and reasoning processes, particularly in complex fields like healthcare and finance.

  • Large Concept Models: LCMs focus on processing concepts rather than tokens. This approach allows for more sophisticated reasoning and understanding by treating entire sentences as semantic units, enabling models to operate in a language-independent and multimodal manner.

  • GenXAI: Explainable GenAI (GenXAI) focuses on exploring how models represent knowledge and on providing human-understandable explanations for a model's outputs and decisions. Key techniques include Mechanistic Interpretability, Feature Attribution, and Probing-based and Sample-based methods.

  • Instruct Graph: Focuses on enhancing LLMs with graph-centric approaches, aiming to improve their performance on graph reasoning and generation tasks. This involves using structured formats to bridge the gap between textual and graph data.

πŸ“š References:


GenAI Engineers Path

12. Consuming LLMs

  • LLM APIs: Understanding and integrating commercial LLM services like OpenAI, Anthropic, and cloud providers' APIs. Learn about authentication, rate limiting, error handling, and cost optimization strategies for production deployments.

  • Open Source LLMs: Exploring self-hosted models like Llama, Mistral, and Falcon. Understanding deployment options, hardware requirements, and trade-offs between model sizes and capabilities for local or private cloud deployments.

  • Prompt Engineering: Mastering techniques for crafting effective prompts including few-shot learning, chain-of-thought prompting, and system prompts. Learn best practices for consistent and reliable model outputs.

  • Structuring Outputs: Techniques for controlling model responses through output parsers, JSON schemas, and structured prompts. Understanding methods to extract specific data formats and maintain consistent response structures.
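
A common structuring pattern is to ask the model for JSON and validate the reply against an expected shape before using it downstream. This sketch uses a made-up two-field schema and a hard-coded reply standing in for a real model response:

```python
import json

# Hypothetical expected schema: field name -> required Python type.
REQUIRED = {"name": str, "score": float}

def parse_reply(reply: str) -> dict:
    """Parse and validate a model reply; raise rather than pass bad data along."""
    data = json.loads(reply)  # raises json.JSONDecodeError on malformed JSON
    for key, typ in REQUIRED.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"missing or mistyped field: {key}")
    return data

reply = '{"name": "example", "score": 0.92}'   # stand-in for an LLM response
parsed = parse_reply(reply)
assert parsed["score"] == 0.92
```

In production this validation step is usually paired with a retry: if parsing fails, the error message is fed back to the model with a request to correct its output.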

πŸ“š References:


13. Building Vector Store

  • Chunking Methods: Techniques for breaking down documents into meaningful segments for vector storage. Includes text splitting strategies like fixed-length chunks, sentence-based splits, and semantic chunking to optimize retrieval effectiveness.

  • Indexing Schemes: Algorithms and data structures for efficient vector similarity search. Understanding approaches like HNSW (Hierarchical Navigable Small World), IVF (Inverted File Index), and LSH (Locality-Sensitive Hashing) for scalable nearest neighbor search.

  • Embedding Models: Understanding and selecting appropriate embedding models for converting text to vectors. Includes domain-specific models, multilingual capabilities, and techniques for optimizing embedding quality and computational efficiency.

  • Vector Databases: Exploring vector database solutions like Pinecone, Weaviate, and Milvus. Understanding their architectures, querying capabilities, scaling considerations, and integration patterns with LLM applications.
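
The simplest of the chunking methods above, fixed-length chunks with overlap, fits in a few lines; the overlap ensures that context spanning a chunk boundary is not lost:

```python
def chunk_text(text: str, size: int = 20, overlap: int = 5) -> list[str]:
    """Fixed-length character chunks; consecutive chunks share `overlap` characters."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "abcdefghij" * 5   # 50-character toy document
chunks = chunk_text(doc)

assert all(len(c) <= 20 for c in chunks)
assert chunks[0][-5:] == chunks[1][:5]   # boundary context appears in both chunks
```

Real pipelines usually chunk by tokens or sentences rather than characters, but the overlap principle is the same.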

πŸ“š References:


14. Retrieval Augmented Generation (RAG)

  • Orchestrators: Frameworks like LangChain and LlamaIndex that coordinate the RAG pipeline, managing document processing, retrieval, and LLM interactions. These tools provide abstractions for building complex RAG applications with features like caching, error handling, and monitoring.

  • Retrievers: Components that fetch relevant information from vector stores using similarity search, hybrid search, or re-ranking techniques. Advanced retrievers may use techniques like query expansion, contextual compression, or multi-step retrieval to improve result quality.

  • Memory: Systems for maintaining conversation history and managing context windows in RAG applications. Includes short-term memory for ongoing conversations and long-term memory for persistent knowledge across sessions.

  • RAG Evaluation: Methods for assessing RAG system performance including retrieval accuracy, answer relevance, and faithfulness to source documents. Metrics like RAGAS help evaluate context relevance, answer faithfulness, and overall response quality.
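
At its core, the retriever step above is cosine similarity between a query embedding and document embeddings, followed by a top-k selection. A brute-force NumPy sketch (vector databases replace the linear scan with approximate indexes like HNSW):

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k documents most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                      # cosine similarity of each doc to the query
    return np.argsort(-sims)[:k], sims

# Toy 2-D "embeddings" so the geometry is easy to see.
docs = np.array([[1.0, 0.0],
                 [0.7, 0.7],
                 [0.0, 1.0]])
idx, sims = top_k(np.array([1.0, 0.1]), docs)

assert list(idx) == [0, 1]   # documents closest in direction rank first
```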

πŸ“š References:


15. Advanced GenAI Apps

  • Query Construction: Techniques for dynamically building and optimizing prompts based on user input and context. Includes methods for query decomposition, reformulation, and contextual enhancement to improve response quality and relevance.

  • Agentic AI Apps: Applications that use LLMs as autonomous agents capable of planning and executing multi-step tasks. Involves tools like AutoGPT and Amazon Bedrock Agents for goal-oriented problem solving, task decomposition, and self-improvement through recursive refinement.

  • Guardrails: Implementation of safety measures and control mechanisms in GenAI applications. Includes content filtering, output validation, ethical constraints, and monitoring systems to ensure responsible and controlled AI behavior.

πŸ“š References:


16. Inference Optimization

  • Flash Attention: A memory-efficient attention algorithm that reduces memory usage and increases speed by tiling attention computation. It optimizes memory access patterns and achieves better hardware utilization, making it particularly effective for training and inference of large language models.

  • Key-Value Cache: A technique that stores previously computed key and value tensors from the attention mechanism to avoid redundant computations during autoregressive generation. This optimization significantly speeds up inference by reusing intermediate results across generation steps.

  • Speculative Decoding: An inference optimization method that uses a smaller, faster model to predict multiple tokens in parallel, which are then verified by the main model. This approach can significantly reduce the latency of text generation by parallelizing what is typically a sequential process.
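
The key-value cache described above can be illustrated with a toy decoding loop: each step computes keys and values only for the newest token and appends them, then attends over the whole cache (single head, random vectors standing in for real projections):

```python
import numpy as np

d = 8                        # head dimension
cache_k, cache_v = [], []    # grows by one entry per generated token
rng = np.random.default_rng(0)

for step in range(4):        # "generate" 4 tokens
    k_new = rng.normal(size=d)
    v_new = rng.normal(size=d)
    cache_k.append(k_new)    # append instead of recomputing past keys/values
    cache_v.append(v_new)

    K = np.stack(cache_k)
    V = np.stack(cache_v)
    q = rng.normal(size=d)   # query for the current token only
    scores = K @ q / np.sqrt(d)
    w = np.exp(scores - scores.max()); w /= w.sum()   # softmax over cached tokens
    context = w @ V          # attention output for this step

assert K.shape == (4, d)     # cache holds one key per generated token
```

Without the cache, step t would recompute keys and values for all t tokens, turning generation of n tokens into O(n²) redundant work.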

πŸ“š References:


17. Deploying LLMs

  • Local Deployment: Techniques for deploying LLMs on local machines, including CPU and GPU optimizations, memory management, and containerization. Understanding hardware requirements, model quantization, and local inference servers for optimal performance.

  • Demo Deployment: Methods for quickly deploying LLM applications for demonstration purposes using platforms like Hugging Face Spaces, Streamlit, or Gradio. These platforms offer easy-to-use interfaces for showcasing LLM capabilities with minimal setup.

  • Server Deployment: Strategies for deploying LLMs in production environments using cloud services or on-premises infrastructure. Includes load balancing, auto-scaling, monitoring, and high-availability configurations for reliable service delivery.

  • Edge Deployment: Techniques for deploying optimized LLMs on edge devices with limited resources. Focuses on model compression, efficient inference, and device-specific optimizations for mobile phones, IoT devices, and embedded systems.

πŸ“š References:


18. Securing LLMs

  • Prompt Injection: Malicious inputs designed to manipulate LLM behavior by bypassing safety measures. Attackers craft prompts that trick the model into performing unintended actions or revealing sensitive information.

  • Adversarial Attacks: Carefully crafted inputs that cause the model to produce incorrect or harmful outputs. These can exploit model vulnerabilities to generate misleading or inappropriate responses.

  • Jailbreaking: Methods to bypass model safety measures and restrictions. Attackers use creative prompting techniques to make models generate content that violates their intended constraints.

  • RLHF Attacks: Exploits targeting the reinforcement learning from human feedback mechanism. These attacks aim to manipulate the model's learned preferences and alignment.

πŸ“š References:


🧠 Happy learning! πŸ₯³
