A curated list of battle-tested, production-proven open-source AI models, libraries, infrastructure, and developer tools. Only elite-tier projects make this list.
by Boring Dystopia Development
- 🧬 1. Core Frameworks & Libraries
- 🧠 2. Open Foundation Models
- ⚡ 3. Inference Engines & Serving
- 🤖 4. Agentic AI & Multi-Agent Systems
- 🔍 5. Retrieval-Augmented Generation (RAG) & Knowledge
- 🎨 6. Generative Media Tools
- 🛠️ 7. Training & Fine-tuning Ecosystem
- 📊 8. MLOps / LLMOps & Production
- 📈 9. Evaluation, Benchmarks & Datasets
- 🛡️ 10. AI Safety, Alignment & Interpretability
- 🧩 11. Specialized Domains
- 🖥️ 12. User Interfaces & Self-hosted Platforms
- 🧪 13. Developer Tools & Integrations
- 📚 14. Resources & Learning
## 🧬 1. Core Frameworks & Libraries

Core libraries and frameworks used to build, train, and run AI and machine learning systems.
- PyTorch
- Dynamic computation graphs, Pythonic API, dominant in research and production. The current standard for most frontier AI work.
- TensorFlow
- End-to-end platform with excellent production deployment, TPU support, and large-scale serving tools.
- JAX + Flax
- High-performance numerical computing with composable transformations (JIT, vmap, grad). Rising favorite for research and scientific ML.
- NumPyro
- Probabilistic programming with NumPy powered by JAX for autograd and JIT compilation. Bayesian modeling and inference at scale.
- Keras
- High-level, beginner-friendly API that now runs on multiple backends (TensorFlow, JAX, PyTorch). Perfect for rapid experimentation.
- tinygrad
- Minimalist deep learning framework with tiny code footprint. The "you like pytorch? you like micrograd? you love tinygrad!" philosophy - simple yet powerful.
- PyTorch Geometric
- Library for deep learning on irregular input data such as graphs, point clouds, and manifolds. Part of the PyTorch ecosystem.
- Burn
- Next-generation deep learning framework in Rust. Backend-agnostic with CPU, GPU, WebAssembly support.
- Candle (Hugging Face)
- Minimalist ML framework for Rust. PyTorch-like API with focus on performance and simplicity.
- linfa
- Comprehensive Rust ML toolkit with classical algorithms. scikit-learn equivalent for Rust with clustering, regression, and preprocessing.
- Flux.jl
- 100% pure-Julia ML stack with lightweight abstractions on top of native GPU and AD support. Elegant, hackable, and fully integrated with Julia's scientific computing ecosystem.
- Transformers (Hugging Face)
- The de facto standard library for pretrained NLP models. 1M+ models, 250,000+ downloads/day. BERT, GPT, Llama, Qwen, and hundreds more.
- sentence-transformers
- Classic library for sentence and image embeddings.
- tokenizers (Hugging Face)
- Fast state-of-the-art tokenizers for training and inference.
- Pandas
- The gold standard for data analysis and manipulation in Python.
- Polars
- Blazing-fast DataFrame library (Rust backend) - modern alternative to pandas for large-scale workloads.
- cuDF
- GPU DataFrame library from RAPIDS. Accelerates pandas workflows on NVIDIA GPUs with zero code changes via the cudf.pandas accelerator mode.
- Modin
- Parallel pandas DataFrames. Scale pandas workflows by changing a single line of code - distributes data and computation automatically.
- Dask
- Parallel computing for big data - scales pandas/NumPy/scikit-learn to clusters.
- NumPy
- Fundamental array computing library that powers almost every AI stack.
- SciPy
- Scientific computing algorithms (optimization, linear algebra, statistics, signal processing).
- NetworkX
- Creation, manipulation, and study of complex networks. The foundational graph analysis library for Python data science.
- scikit-learn
- Industry-standard library for traditional machine learning (classification, regression, clustering, pipelines).
- XGBoost
- Scalable, high-performance gradient boosting library. Still dominates Kaggle and tabular competitions.
- LightGBM
- Microsoft's ultra-fast gradient boosting framework, optimized for speed and memory.
- CatBoost
- Gradient boosting that handles categorical features natively with great out-of-the-box performance.
- sktime
- Unified framework for machine learning with time series. Scikit-learn compatible API for forecasting, classification, clustering, and anomaly detection.
- StatsForecast
- Lightning-fast statistical forecasting with ARIMA, ETS, CES, and Theta models. Optimized for high-performance time series workloads.
- Optuna
- Modern, define-by-run hyperparameter optimization with pruning and visualizations. Extremely popular in 2026.
- AutoGluon
- AWS AutoML toolkit for tabular, image, text, and multimodal data - state-of-the-art with almost zero code.
- FLAML
- Microsoft's fast & lightweight AutoML focused on efficiency and low compute.
- AutoKeras
- Neural architecture search on top of Keras.
- TPOT
- Genetic programming-based AutoML for full pipeline optimization.
- Hugging Face Accelerate
- Simple API to make training scripts run on any hardware (multi-GPU, TPU, mixed precision) with minimal code changes.
- DeepSpeed
- Microsoft's deep learning optimization library for extreme-scale training (ZeRO, offloading, MoE).
- FlashAttention
- Fast exact attention kernels that reduce memory usage and accelerate transformer training and inference.
- xFormers
- Optimized transformer building blocks and attention operators for PyTorch.
- PyTorch Lightning
- High-level wrapper for PyTorch that removes boilerplate and adds best practices.
- ONNX Runtime
- High-performance inference and training for ONNX models across hardware.
- einops
- Flexible, powerful tensor operations for readable and reliable code. Supports PyTorch, JAX, TensorFlow, NumPy, MLX.
- safetensors
- Simple, safe way to store and distribute tensors. Fast, secure alternative to pickle for model serialization.
- torchmetrics
- Machine learning metrics for distributed, scalable PyTorch applications. 80+ metrics with built-in distributed synchronization.
- torchao
- PyTorch native quantization and sparsity for training and inference. Drop-in optimizations for production deployment.
- SHAP
- Game theoretic approach to explain the output of any machine learning model. Industry standard for model interpretability.
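The define-by-run style that PyTorch popularized (and that most frameworks above now follow) is easiest to see in a few lines: the computation graph is built as ordinary Python executes, and gradients flow back through it automatically. A minimal sketch, assuming only `torch` itself:

```python
import torch

# The graph is recorded dynamically as this Python code runs.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # y = x1^2 + x2^2 + x3^2

y.backward()         # reverse-mode autodiff through the recorded graph
print(x.grad)        # dy/dx = 2x -> tensor([2., 4., 6.])
```

The same pattern (eager execution plus recorded gradients) is what JAX expresses through `grad` and what tinygrad reimplements in miniature.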
## 🧠 2. Open Foundation Models

Pretrained language, multimodal, speech, and video models with publicly available weights.
- Qwen3.6-Plus (Alibaba)
- Latest flagship series released April 2026 with 1M context window, agentic coding performance competitive with Claude 4.5 Opus, and enhanced multimodal capabilities.
- DeepSeek-V3.2 / R1 (DeepSeek)
- Mixture-of-Experts family with exceptional reasoning, math, and efficient large-scale inference.
- Gemma 4 (Google)
- Released April 2026 in four sizes (E2B, E4B, 26B MoE, 31B Dense). First major update in a year with Apache 2.0 license, complex logic, and agentic workflows.
- MiniMax-M2.1 / M1 (MiniMax)
- Open-weight MiniMax model line spanning long-context reasoning and agentic software tasks, with strong tool use and publicly released weights for local deployment.
- Kimi K2.5 (Moonshot AI)
- Frontier open-weight MoE model with 256K context, strong coding and reasoning performance, and native multimodal + tool-use support for agentic workflows.
- Mistral Large / Nemo / Small
- High-performance model family with strong multilingual capability, tool use, and efficient deployment profiles.
- Phi-4 (Microsoft)
- Small but highly capable models optimized for reasoning, edge devices, and on-device inference. Includes Phi-4-reasoning variants with thinking capabilities.
- GLM-5 (Zhipu AI)
- Strong open model line with solid coding, reasoning, and agentic-task performance.
- OLMo 2 (Allen AI)
- Fully open-source LLMs (1B–32B) with complete transparency: models, data, training code, and logs. Designed by scientists, for scientists.
- Llama 4 (Meta)
- First native multimodal MoE open-source models (Scout: 10M context, Maverick: 400B+ params). Released April 2025 with enterprise-grade capabilities.
- DeepSeek-Coder-V2 / R1-Coder
- Best-in-class open coding model (236B MoE). Outperforms closed models on many code benchmarks.
- CodeLlama / CodeGemma
- Meta's specialized coding variants built on Llama. Still heavily used for fine-tuning.
- Qwen3-Coder-Next (Alibaba)
- Leading open coding model. Strong Pareto frontier for cost-effective agent deployment.
- StarCoder2 (BigCode)
- 15B model trained on 600+ programming languages. Community favorite for transparency.
- Granite Code Models (IBM)
- Family of open foundation models for code intelligence (3B–34B). Trained on 3-4T tokens of code data with strong performance across 116 programming languages.
- Qwen3-VL (Alibaba)
- Latest flagship VLM with native 256K context (expandable to 1M), visual agent capabilities, 3D grounding, and superior multimodal reasoning. Major leap over Qwen2.5-VL.
- InternVL3 (OpenGVLab)
- Native multimodal pretraining with mixed preference optimization (MPO). Superior perception and reasoning over InternVL 2.5, extends to GUI agents and 3D vision.
- GLM-4.5V / GLM-4.1V-Thinking (Zhipu AI)
- Strong multimodal reasoning with scalable reinforcement learning. Compares favorably with Gemini-2.5-Flash on benchmarks.
- LLaVA-OneVision
- Successor to LLaVA 1.6 with expanded capabilities across vision-language tasks.
- MiniCPM-V 2.6
- Handles images up to 1.8M pixels with top-tier OCR performance. Excellent for on-device deployment.
- Gemma 4 (Google)
- Multimodal model supporting vision-language input, optimized for efficiency, complex logic, and on-device use.
- Whisper (OpenAI → community forks)
- The gold-standard open speech-to-text model. Massive community fine-tunes available.
- OuteTTS / CosyVoice 2
- High-quality open TTS with natural prosody and multilingual support.
- Fish Speech / StyleTTS 2
- Zero-shot TTS with excellent voice cloning. Extremely popular in 2026.
- MusicGen / AudioCraft (Meta)
- Open music and audio generation models.
- VibeVoice (Microsoft)
- Open-source frontier voice AI with expressive, longform conversational speech synthesis. 7B parameter TTS with streaming support.
- Chatterbox (Resemble AI)
- State-of-the-art open TTS family with 350M parameter Turbo variant. Single-step generation with native paralinguistic tags for realistic dialogue.
- Dia (Nari Labs)
- 1.6B parameter TTS generating ultra-realistic dialogue in one pass with nonverbal communications (laughter, coughing). Emotion and tone control via audio conditioning.
- Kokoro
- Lightweight 82M parameter TTS with Apache-licensed weights. High-quality speech generation deployable anywhere from production to personal projects.
- Step-Audio (StepFun)
- 130B-parameter production-ready audio LLM for intelligent speech interaction. Supports multilingual conversations (Chinese, English, Japanese), emotional tones, regional dialects (Cantonese, Sichuanese), adjustable speech rates, and prosodic styles including rap. Apache 2.0 licensed.
- Voxtral TTS (Mistral)
- 4B parameter state-of-the-art TTS with zero-shot voice cloning, 9-language support, and ~90ms time-to-first-audio for voice agents.
- Open-Sora / Open-Sora-Plan
- Fully open video generation model rivaling closed systems.
- CogVideoX (Zhipu AI / community)
- High-quality open text-to-video model (5B-12B).
- Mochi 1 (Genmo)
- 10B open video model with impressive motion and consistency.
- AnimateDiff / Stable Video Diffusion (community forks)
- Motion module ecosystem for turning images into video.
## ⚡ 3. Inference Engines & Serving

Inference runtimes, serving systems, and optimization tools for running models locally or in production.
- llama.cpp
- Pure C/C++ inference engine with GGUF format support. The gold standard for CPU/GPU/Apple Silicon on-device running. Includes llama-server for OpenAI-compatible API.
- Ollama
- Dead-simple local LLM runner with a one-line install, model registry, and OpenAI-compatible API.
- MLX (Apple)
- High-performance array framework + LLM inference optimized for Apple Silicon.
- MLC-LLM
- Deployment engine that compiles and runs LLMs across browsers, mobile devices, and local hardware.
- WebLLM
- High-performance in-browser LLM inference engine. Runs models directly in the browser with WebGPU acceleration.
- llama-cpp-python
- Official Python bindings for llama.cpp.
- KoboldCpp
- User-friendly llama.cpp fork focused on role-playing and creative writing.
- Potato OS
- Linux distribution for fully local AI inference on Raspberry Pi 5 and 4. Optimized for running open models at the edge.
- llm-d
- Kubernetes-native distributed LLM inference framework. Donated to CNCF by RedHat, Google, and IBM. Intelligent scheduling, KV-cache optimization, and state-of-the-art performance across accelerators.
- LMDeploy
- Toolkit for compressing, deploying, and serving LLMs from OpenMMLab. 4-bit inference with 2.4x higher performance than FP16, distributed multi-model serving across machines.
- vLLM
- State-of-the-art serving engine with PagedAttention and continuous batching. Currently the fastest production-grade LLM server.
- Text Generation Inference (TGI)
- Hugging Face's production-ready Rust-based server.
- SGLang
- Next-gen serving framework with RadixAttention. Powers xAI's production workloads at 100K+ GPUs scale.
- TensorRT-LLM
- NVIDIA's official high-performance inference backend.
- Aphrodite Engine
- vLLM fork optimized for role-play and creative writing.
- Open Model Engine (OME)
- Kubernetes operator for LLM serving. GPU scheduling, model lifecycle management. Works with vLLM, SGLang, TensorRT-LLM.
- Triton Inference Server
- NVIDIA's production-grade open-source inference serving software. Supports multiple frameworks (TensorRT, PyTorch, ONNX) with optimized cloud and edge deployment.
- mistral.rs
- Fast, flexible Rust-native LLM inference engine built on Candle. Supports text, vision, audio, image generation, and embeddings with hardware-aware auto-tuning.
- KTransformers
- Flexible framework for heterogeneous CPU-GPU LLM inference and fine-tuning. Enables running large MoE models by offloading experts to CPU with BF16/FP8 precision support.
- llamafile
- Mozilla's single-file distributable LLM solution. Bundle model weights, inference engine, and runtime into one portable executable that runs on six OSes without installation.
- GGUF (part of llama.cpp)
- Modern quantized model format that powers most local inference.
- bitsandbytes
- 8-bit and 4-bit optimizers + quantization.
- AutoAWQ
- Activation-aware Weight Quantization toolkit.
- AutoGPTQ
- GPTQ quantization framework.
- HQQ
- Half-Quadratic Quantization - ultra-fast method rising in 2026.
- ExLlamaV2
- Highly optimized CUDA kernels for 4-bit/8-bit inference.
- Optimum
- Hardware-specific acceleration and quantization.
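All of the quantization toolkits above trade precision for memory along the same basic lines. A minimal symmetric int8 round-trip in plain NumPy — illustrative only; real methods like GPTQ, AWQ, and HQQ are calibration-aware and far more sophisticated:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: w ~= scale * q, q in [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)  # stand-in for a weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.abs(w - w_hat).max())
```

Storing `q` instead of `w` cuts memory 4× versus FP32 (2× versus FP16); the rounding error is bounded by half the scale, which is why outlier-aware scaling is the heart of the serious methods.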
## 🤖 4. Agentic AI & Multi-Agent Systems

Frameworks and platforms for building agent-based systems and multi-agent workflows.
- LangGraph
- Stateful, controllable agent orchestration.
- CrewAI
- Role-based agent framework.
- AutoGen (AG2)
- Flexible multi-agent conversation framework.
- DSPy
- Framework for programming language model pipelines with modules, optimizers, and evaluation loops.
- Semantic Kernel
- SDK for building and orchestrating AI agents and workflows across multiple programming languages.
- LangChain
- Foundational library for agents, chains, and memory.
- smolagents (Hugging Face)
- Minimalist agent library. Build agents in 3 lines of code with code-first action execution.
- Hermes Agent (NousResearch)
- The agent that grows with you. Autonomous server-side agent with persistent memory that learns and improves over time.
- Agno
- Build, run, and manage agentic software at scale. High-performance framework for multi-agent systems with memory, knowledge, and tools.
- Pydantic AI
- Python agent framework from the creators of Pydantic. Type-safe, structured outputs with dependency injection and streaming support.
- MetaGPT
- Simulates an entire "AI software company".
- CAMEL
- Pioneering multi-agent framework for building scalable agent systems. Apache 2.0 licensed with extensive tooling for agent communication and task automation.
- Swarm
- Lightweight multi-agent orchestration from OpenAI.
- Swarms
- Bleeding-edge enterprise multi-agent orchestration.
- Llama-Agents
- Async-first multi-agent system.
- Mastra
- TypeScript-first agent framework with built-in RAG, workflows, tool integrations, observability and observational memory.
- Deer-Flow (ByteDance)
- Open-source long-horizon SuperAgent harness that researches, codes, and creates. Handles tasks from minutes to hours with sandboxes, memories, tools, skills, subagents, and message gateway.
- OpenAI Agents SDK
- Production-ready lightweight framework for multi-agent workflows. The evolution of Swarm with enhanced orchestration capabilities and enterprise-grade features.
- OpenHands (ex-OpenDevin)
- Full-featured open-source AI software engineer.
- Goose
- Extensible on-machine AI agent for development tasks.
- OpenCode
- Terminal-native autonomous coding agent.
- Aider
- Command-line pair-programming agent.
- Pi (badlogic)
- Terminal coding agent with hash-anchored edits, LSP integration, subagents, MCP support, and package ecosystem.
- Mistral-Vibe (Mistral)
- Minimal CLI coding agent by Mistral. Lightweight, fast, and designed for local development workflows.
- Nanocoder (Nano-Collective)
- Beautiful local-first coding agent running in your terminal. Built for privacy and control with support for multiple AI providers via OpenRouter.
- Gemini CLI (Google)
- Open-source AI agent that brings Gemini's power directly into your terminal. Supports code generation, shell execution, and file editing with full Apache 2.0 licensing.
- Langflow
- Visual low-code platform for agentic workflows.
- Dify
- Production-ready agentic workflow platform.
- OWL (camel-ai/owl)
- Advanced multi-agent collaboration system.
- SuperAGI
- Dev-first autonomous AI agent platform.
- AI-Scientist-v2 (SakanaAI)
- Workshop-level automated scientific discovery via agentic tree search. Generates novel research ideas, runs experiments, and writes papers.
- PraisonAI
- 24/7 AI employee team for automating complex challenges. Low-code multi-agent framework with handoffs, guardrails, memory, RAG, and 100+ LLM providers.
- Agent-S (Simular AI)
- Open agentic framework that uses computers like a human. SOTA on OSWorld benchmark (72.6%) for GUI automation and computer control.
- Letta (ex-MemGPT)
- Platform for building stateful agents with advanced memory that learn and self-improve over time.
- Mem0
- Universal memory layer for AI agents. Persistent, multi-session memory across models and environments.
- Forgetful
- MCP server for persistent AI agent memory. Stores atomic single-concept notes and auto-links them into a knowledge graph via semantic similarity. SQLite or PostgreSQL.
- Hindsight
- State-of-the-art long-term memory for AI agents by Vectorize. Fully self-hosted, MIT-licensed, with integrations for LangChain, CrewAI, LlamaIndex, Vercel AI SDK, and more.
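Underneath nearly all of these frameworks sits the same loop: the model either emits a tool call or a final answer, and tool results are fed back as messages. A toy sketch — `fake_model` is a hard-coded stand-in for an LLM call, not any framework's API:

```python
# Minimal sketch of the tool-calling loop that agent frameworks wrap.
TOOLS = {"add": lambda a, b: a + b}

def fake_model(messages):
    # A real framework would call an LLM here; we hard-code one tool call.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"final": f"The answer is {messages[-1]['content']}"}

def run_agent(task):
    messages = [{"role": "user", "content": task}]
    while True:
        action = fake_model(messages)
        if "final" in action:
            return action["final"]
        result = TOOLS[action["tool"]](**action["args"])  # execute the tool
        messages.append({"role": "tool", "content": result})

print(run_agent("What is 2 + 3?"))  # The answer is 5
```

What the frameworks above add on top of this skeleton is the hard part: state persistence, memory, guardrails, handoffs between agents, and observability.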
## 🔍 5. Retrieval-Augmented Generation (RAG) & Knowledge

Retrieval systems, vector databases, embedding models, and related tooling for RAG pipelines.
- Chroma
- Most popular open-source embedding database.
- Qdrant
- High-performance vector search engine in Rust.
- Weaviate
- GraphQL-native vector search engine.
- Milvus
- Scalable cloud-native vector database.
- Faiss
- Similarity search and clustering library for dense vectors with CPU and GPU implementations.
- NornicDB
- Low-latency graph + vector hybrid retrieval engine written in Go; compatible with Neo4j and Qdrant drivers.
- LanceDB
- Serverless vector DB optimized for multimodal data.
- pgvector
- PostgreSQL extension for vector similarity search.
- BGE (FlagEmbedding)
- BAAI's best-in-class embedding family.
- E5 (Microsoft)
- High-performance text embeddings for retrieval.
- Nomic Embed
- Open embedding model with long-context support and reproducible training.
- LlamaIndex
- Full-featured RAG pipeline with advanced indexing.
- Haystack
- End-to-end NLP and RAG framework.
- RAGFlow
- Deep-document-understanding RAG engine.
- GraphRAG (Microsoft)
- Knowledge-graph-based RAG.
- Verba (Weaviate)
- RAG frontend (the "Golden RAGtriever") with an intuitive UI for retrieval and exploration.
- RAGatouille
- Advanced retrieval tools with late interaction models (ColBERT).
- Docling
- Document processing toolkit for turning PDFs and other files into structured data for GenAI workflows.
- Unstructured
- Best-in-class document preprocessing.
- ColPali / ColQwen
- Vision-language models for document retrieval.
- LightRAG
- Graph-based RAG with dual-level retrieval system. Simple and fast with comprehensive knowledge discovery (EMNLP 2025).
- RAG-Anything
- All-in-One Multimodal RAG system for seamless processing of text, images, tables, and equations. Built on LightRAG.
- txtai
- All-in-one AI framework for semantic search, LLM orchestration and language model workflows. Embeddings database with customizable pipelines.
- Infinity
- High-throughput, low-latency serving engine for text-embeddings, reranking, CLIP, and ColPali. OpenAI-compatible API.
- Crawl4AI
- LLM-friendly web crawler that turns websites into clean Markdown for RAG and agentic workflows.
- Lightpanda
- Machine-first headless browser in Zig; rendering-free and ultra-lightweight for AI agent browsing.
- Paperless-AI
- Automated document analyzer for Paperless-ngx with RAG-powered semantic search across your document archive.
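At the core of every vector store above is the same operation: embed, normalize, rank by similarity. A brute-force version in NumPy — real engines add ANN indexes, metadata filtering, and persistence on top:

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    # Cosine similarity reduces to a dot product after L2 normalization.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return np.argsort(-(d @ q))[:k]  # indices of the k nearest docs

# Toy 2-D "embeddings"; a real pipeline would get these from an embedding model.
docs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
query = np.array([1.0, 0.05])
print(top_k(query, docs))
```

Everything else in a RAG pipeline (chunking, reranking, graph expansion) is about improving what goes into and comes out of this ranking step.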
## 🎨 6. Generative Media Tools

Open-source models and applications for image, video, audio, and 3D generation and editing.
- ComfyUI
- Node-based visual workflow editor for Stable Diffusion, FLUX, etc.
- Stable Diffusion WebUI Forge - Neo
- Actively maintained Forge-based Stable Diffusion web UI with the familiar extension-driven workflow.
- Fooocus
- Midjourney-style UI with beautiful out-of-the-box results.
- FLUX.1 / FLUX.2 (Black Forest Labs)
- State-of-the-art open text-to-image model family.
- Diffusers
- PyTorch library for diffusion pipelines spanning image, video, and audio generation.
- InvokeAI
- Full-featured creative studio.
- Stable Diffusion 3.5 (Stability AI)
- Latest open-weight diffusion model.
- PowerPaint (OpenMMLab)
- Versatile image inpainting model supporting text-guided inpainting, object removal, and outpainting (ECCV 2024).
- Real-ESRGAN
- Practical algorithms for general image/video super-resolution and restoration.
- Wan2.2 (Alibaba)
- Leading open Mixture-of-Experts text-to-video model.
- HunyuanVideo (Tencent)
- 13B-parameter systematic video generation framework. Leading quality among open models.
- SkyReels V2/V3 (Skywork)
- First open-source infinite-length film generative model using AutoRegressive Diffusion-Forcing.
- Mochi 1 (Genmo)
- 10B-parameter open video model.
- LTX-Video (Lightricks)
- Fast native 4K video generation.
- Open-Sora 2.0 (HPC-AI Tech)
- Fully open training + inference pipeline.
- Stable Video Diffusion (Stability AI)
- Official image-to-video and text-to-video implementation within Stability AI's generative models repository.
- AnimateDiff
- Motion module ecosystem.
- AudioCraft / MusicGen (Meta)
- Controllable text-to-music and audio models.
- ACE-Step 1.5
- Local-first music generation model with broad hardware support across Mac, AMD, Intel, and CUDA devices.
- Fish Speech
- Zero-shot TTS and voice cloning.
- CosyVoice 2
- Natural multilingual TTS with emotional control.
- StyleTTS 2
- Expressive zero-shot TTS.
- OuteTTS
- High-quality open TTS.
- RVC (Retrieval-based Voice Conversion)
- Gold standard for real-time voice cloning.
- Amphion
- Comprehensive toolkit for audio, music, and speech generation.
- YuE
- Open full-song generation model producing high-quality music with vocals (similar capabilities to Suno.ai but open-source).
- OpenVoice
- Instant voice cloning by MIT and MyShell with accurate tone color cloning and style control.
- Hunyuan3D-2 (Tencent)
- State-of-the-art open image-to-3D and text-to-3D.
- Trellis (Microsoft)
- Structured 3D latents for high-quality generation.
- Wonder3D
- Fast multi-view consistent 3D generation.
- TripoSR
- Lightning-fast 3D reconstruction.
- Nerfstudio
- End-to-end framework for training, rendering, and experimenting with NeRF and related 3D scene representations.
- gsplat (3D Gaussian Splatting tools)
- High-performance 3D Gaussian Splatting library.
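The diffusion models above (image, video, and audio alike) share one forward process: data is mixed with Gaussian noise according to a schedule, and the network learns to invert it. The standard closed-form forward step, sketched in NumPy as a reference for the notation, not any specific model's code:

```python
import numpy as np

def add_noise(x0, alpha_bar, eps):
    # q(x_t | x_0): x_t = sqrt(alpha_bar)*x0 + sqrt(1 - alpha_bar)*eps
    # alpha_bar runs from ~1 (clean data) down to ~0 (pure noise).
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

x0 = np.array([1.0, -1.0])   # toy "clean" sample
eps = np.array([0.5, 0.5])   # toy Gaussian noise draw
print(add_noise(x0, 0.5, eps))
```

Sampling then runs the learned reverse process from noise back toward data, which is what ComfyUI node graphs and the Diffusers pipelines orchestrate.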
## 🛠️ 7. Training & Fine-tuning Ecosystem

Tools for model training, fine-tuning, synthetic data generation, and distributed training.
- LLaMA-Factory
- One-stop unified framework for SFT, DPO, ORPO, KTO with web UI.
- Axolotl
- YAML-driven full pipeline for SFT, DPO, GRPO.
- Unsloth
- Fine-tuning up to 2× faster with ~70% less memory.
- LitGPT
- Clean from-scratch implementations of 20+ LLMs.
- torchtune
- PyTorch-native library for post-training, fine-tuning, and experimentation with LLMs.
- TRL (Transformers Reinforcement Learning)
- Official library for RLHF, SFT, DPO, ORPO.
- PEFT (Parameter-Efficient Fine-Tuning)
- Official library with LoRA, QLoRA, DoRA, etc.
- Liger Kernel
- Ultra-fast custom kernels for training speedup.
- MergeKit
- Advanced model merging tools.
- distilabel
- End-to-end pipeline for synthetic instruction data.
- Data-Juicer
- High-performance data processing for LLM training.
- Argilla
- Open-source data labeling + synthetic data platform.
- SDV (Synthetic Data Vault)
- High-fidelity tabular and relational synthetic data.
- DeepSpeed
- Extreme-scale training optimizations.
- Colossal-AI
- Unified system for 100B+ models.
- Megatron-LM
- Distributed training framework and reference codebase for large transformer models at scale.
- Ray Train
- Scalable distributed training.
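Most of the parameter-efficient methods above (LoRA and its descendants) come down to one identity: a frozen weight plus a scaled low-rank update, `W + (alpha/r) * B @ A`. A NumPy sketch with shapes and init chosen to mirror the usual LoRA recipe (B starts at zero so training begins from the base model):

```python
import numpy as np

d, r, alpha = 512, 8, 16            # hidden size, rank, scaling factor
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))          # frozen base weight (not trained)
A = rng.normal(size=(r, d)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                 # trainable; zero init -> no initial change

W_eff = W + (alpha / r) * (B @ A)    # effective weight used at inference

full_params = W.size
lora_params = A.size + B.size
print(f"trainable fraction: {lora_params / full_params:.4f}")
```

Training touches only `A` and `B` (here 8,192 values versus 262,144 in `W`), which is why QLoRA-style setups in Unsloth, PEFT, or Axolotl fit on a single consumer GPU.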
## 📊 8. MLOps / LLMOps & Production

Tooling for tracking, deploying, monitoring, and operating AI systems in production.
- MLflow
- End-to-end open platform for the ML/LLM lifecycle.
- DVC (Data Version Control)
- Git-like versioning for data and models.
- ClearML
- Open-source platform for experiment tracking, orchestration, data management, and model serving.
- Weights & Biases Weave
- Open-source tracing and experiment tracking.
- BentoML
- Unified framework to build, ship, and scale AI apps.
- Ray Serve
- Scalable model serving library.
- ZenML
- Pipeline and orchestration framework for taking ML and LLM systems from development to production.
- Kubeflow
- Kubernetes-native ML/LLM platform.
- KServe
- Kubernetes-based model serving.
- Langfuse
- #1 open-source LLM observability platform.
- Phoenix (Arize)
- AI observability & evaluation platform.
- Evidently
- ML & LLM monitoring framework.
- Opik (Comet)
- Production-ready LLM evaluation platform.
- LiteLLM
- AI Gateway to call 100+ LLM APIs in OpenAI format with unified cost tracking, guardrails, load balancing, and logging.
- OpenLIT
- OpenTelemetry-native LLM observability platform with GPU monitoring, evaluations, prompt management, and guardrails.
- OpenLLMetry (Traceloop)
- Open-source observability for GenAI/LLM applications based on OpenTelemetry with 25+ integration backends.
- Agenta
- Open-source LLMOps platform combining prompt playground, prompt management, LLM evaluation, and observability.
- Helicone
- Open-source LLM observability with request logging, caching, rate limiting, and cost analytics.
- NVIDIA NeMo Guardrails
- Programmable guardrails toolkit.
- Guardrails AI
- Structure and validation for LLM outputs.
- LLM Guard
- Comprehensive input/output scanner.
- Director-AI
- Real-time LLM hallucination guardrail with NLI + RAG fact-checking and token-level streaming halt.
- LlamaGuard (Meta)
- Open safety classifier models.
- Garak
- LLM vulnerability scanner.
- Promptfoo
- LLM testing and red-teaming framework.
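Output guardrails like the ones above usually start from structural validation: parse the model's reply, check it against an expected schema, and reject on failure rather than pass garbage downstream. A stdlib-only sketch (the function name is hypothetical, not any library's API):

```python
import json

def validate_llm_json(text, required_keys):
    """Parse an LLM reply as JSON and require a set of keys; None on failure."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None                      # not valid JSON -> reject / retry
    if not all(k in obj for k in required_keys):
        return None                      # schema violation -> reject / retry
    return obj

print(validate_llm_json('{"answer": 42}', ["answer"]))
```

Libraries like Guardrails AI generalize this with typed schemas, re-ask loops, and content policies; the reject-and-retry pattern is the same.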
## 📈 9. Evaluation, Benchmarks & Datasets

Benchmarks, evaluation frameworks, datasets, and supporting tools for model assessment.
- lm-evaluation-harness (EleutherAI)
- De-facto standard for generative model evaluation.
- HELM (Stanford)
- Holistic Evaluation of Language Models.
- GAIA
- Real-world multi-step agentic benchmark.
- LiveCodeBench
- Contamination-free coding benchmark.
- MMLU-Pro / GPQA
- Hardened expert-level benchmarks.
- OpenCompass
- Evaluation platform for benchmarking language and multimodal models across large benchmark suites.
- SWE-rebench (Nebius)
- Continuously updated benchmark with 21,000+ real-world SWE tasks for evaluating agentic LLMs. Decontaminated, mined from GitHub.
- DeepEval
- The "Pytest for LLMs".
- RAGAs
- End-to-end RAG evaluation framework.
- Lighteval
- Evaluation toolkit for LLMs across multiple backends with reusable tasks, metrics, and result tracking.
- Hugging Face Evaluate
- Standardized evaluation metrics.
- Hugging Face Datasets
- Largest open repository of datasets.
- FineWeb / FineWeb-2 (Hugging Face)
- Curated 15T+ token web dataset for pre-training.
- RedPajama
- Clean, reproducible Llama training data mix.
- OSWorld
- Multimodal agent benchmark dataset.
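The harnesses above ultimately bottom out in simple per-example metrics aggregated over a dataset. Exact match, the simplest, as a sketch of that inner loop:

```python
def exact_match(preds, refs):
    """Fraction of predictions that equal the reference after normalization."""
    def norm(s):
        # Lowercase and collapse whitespace -- the usual minimal cleanup.
        return " ".join(s.lower().split())
    return sum(norm(p) == norm(r) for p, r in zip(preds, refs)) / len(refs)

print(exact_match(["Paris", "42"], [" paris", "41"]))  # 0.5
```

Real harnesses layer prompt templating, few-shot formatting, log-likelihood scoring, and contamination controls around this, but the score a leaderboard reports is still an average of per-example checks like this one.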
## 🛡️ 10. AI Safety, Alignment & Interpretability

Tools for alignment, interpretability, safety evaluation, and adversarial testing.
- Safe-RLHF
- Safe reinforcement learning from human feedback.
- Alignment Handbook
- Complete recipes for full-stack alignment.
- OpenRLHF
- High-performance distributed RLHF framework.
- TransformerLens
- Gold-standard for mechanistic interpretability.
- nnsight
- Scalable library for intervening on neural networks.
- SAELens
- Sparse autoencoders for interpretable features.
- Captum
- PyTorch's official interpretability library.
- Garak
- Automated LLM vulnerability scanner.
- PyRIT (Python Risk Identification Tool)
- Framework for custom adversarial testing.
- Promptfoo
- Systematic prompt testing and red-teaming.
- LLM Guard
- Input/output scanner for LLMs.
## 🧩 11. Specialized Domains

- OpenCV
- World's most widely used computer vision library.
- Ultralytics YOLO
- State-of-the-art real-time object detection.
- Detectron2
- High-performance object detection library.
- SAM 2
- Promptable image and video segmentation model with released checkpoints and training code.
- Kornia
- Differentiable computer vision library.
- MediaPipe
- Cross-platform multimodal pipelines.
- Stable-Baselines3
- Production-ready RL algorithms.
- CleanRL
- Single-file readable RL implementations.
- JaxMARL
- Multi-Agent Reinforcement Learning with JAX. Accelerated environments and baselines.
- Isaac Lab
- GPU-accelerated robot learning framework.
- Gymnasium (ex-OpenAI Gym)
- Standard RL environment API.
- Time Series Library (TSLib)
- Comprehensive benchmark for time-series models.
- Chronos (Amazon)
- Pretrained foundation models for time-series forecasting.
- Darts
- Easy-to-use time-series forecasting library.
- AutoTS
- Automated time series forecasting with broad model selection, ensembling, anomaly detection, and holiday effects. Designed for production deployment with minimal setup.
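A useful sanity check with any of the forecasting libraries above is a naive baseline they should all beat. A recursive moving-average forecast in plain Python, as one such baseline:

```python
def moving_average_forecast(series, window, horizon):
    """Forecast `horizon` steps by averaging the last `window` values,
    feeding each prediction back in as history (recursive forecasting)."""
    history = list(series)
    preds = []
    for _ in range(horizon):
        preds.append(sum(history[-window:]) / window)
        history.append(preds[-1])
    return preds

print(moving_average_forecast([1, 2, 3, 4], window=2, horizon=2))  # [3.5, 3.75]
```

If ARIMA, ETS, or a foundation model like Chronos cannot beat this on your data, the series is probably too short or too noisy to forecast meaningfully.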
- TensorFlow Lite
- Lightweight on-device ML.
- ONNX Runtime
- Cross-platform high-performance inference.
- ExecuTorch
- PyTorch runtime and toolchain for deploying AI models on mobile, embedded, and edge devices.
- OpenVINO
- Intel's toolkit for edge deployment.
- MicroTVM (Apache TVM)
- Compiler stack for microcontrollers.
## 🖥️ 12. User Interfaces & Self-hosted Platforms

- OpenClaw
- Local-first personal AI assistant with multi-channel integrations and full agentic task execution.
- Open WebUI
- Most popular self-hosted ChatGPT-style interface.
- text-generation-webui
- Web UI for running local LLMs with multiple backends, extensions, and model formats.
- LobeChat
- Sleek modern chat UI.
- LibreChat
- Feature-packed multi-LLM interface.
- HuggingChat (self-hosted)
- Official open-source codebase for HuggingChat.
- Khoj
- Self-hostable personal AI assistant for search, chat, automation, and workflows over local and web data.
- Newelle
- GNOME/Linux desktop virtual assistant with integrated file editor, global hotkeys, and profile manager.
- AnythingLLM
- All-in-one RAG + agents platform.
- Dify
- Complete AI application platform with visual builder.
- Langflow
- Visual low-code platform for LangChain flows.
- Flowise
- Drag-and-drop LLM app builder.
- GPT4All
- Privacy-first local desktop chatbot.
- Jan
- Local-first AI app framework.
- SillyTavern
- Highly customizable role-playing frontend.
## 🧪 13. Developer Tools & Integrations

- Continue
- Open-source AI coding autopilot for VS Code & JetBrains.
- Tabby
- Self-hosted AI coding assistant.
- Cline
- Open-source IDE coding agent that can edit files, run commands, and use tools with user approval.
- Open Interpreter
- Lets LLMs run code locally.
- Roo Code
- Open-source editor-based coding agent with multiple modes and tool integrations.
- Aider
- Terminal-based AI pair programmer.
- llama.vim
- Local LLM-powered code completion plugin for Vim/Neovim using llama.cpp. Fast, privacy-first, no API key needed.
- CodeCompanion.nvim
- AI-powered coding assistant for Neovim. Inline code generation, chat, actions, and tool use with support for multiple LLM providers.
- Jupyter AI
- Chat and code generation inside notebooks.
- Assistant UI
- React/TypeScript library for building production-grade AI chat interfaces. Drop-in components for streaming messages, tool calls, and multi-modal inputs.
- Promptfoo
- Systematic LLM testing framework.
- DeepEval
- LLM unit-testing framework.
- Garak
- LLM vulnerability scanner.
- Phoenix (Arize)
- AI observability for development.
## 📚 14. Resources & Learning

- Papers with Code
- Definitive database linking papers to open code and datasets.
- Hugging Face Papers
- Daily-updated feed of the latest arXiv papers with open weights.
- Open LLM Leaderboard (Hugging Face)
- Real-time ranking of open models.
- Hugging Face Discussions
- Largest open AI forum.
- r/LocalLLaMA
- Go-to subreddit for local/open-source LLM topics.
- Hugging Face Course
- Free hands-on courses using only open models.
- Fast.ai
- Legendary practical deep learning course.
- LangChain Academy
- Free courses on agents and RAG.
- ComfyUI Examples & Workflows
- Massive collection of generative media workflows.
- PyTorch Examples
- Official tutorials: image classification, NLP, reinforcement learning.
- TensorFlow Tutorials
- Official guides for beginners to advanced users.
- Hugging Face Transformers Notebooks
- Run Transformers, Datasets, and more in Colab.
- Deep Learning Examples (NVIDIA)
- Production-quality reference implementations.
## Contributing

Contributions are highly welcome! Please read the CONTRIBUTING.md for guidelines (quality standards, formatting, license requirements, etc.).
- Only OSI-approved licenses
- Projects must be actively maintained (commits in last 6 months)
- High-quality, well-documented, real adoption
This list itself is licensed under CC0 1.0 Universal. Feel free to use it for any purpose.
Made with ❤️ for the open-source AI community. Star the repo if you find it useful - it helps more people discover the best open tools!