Skip to content

alvinreal/awesome-opensource-ai

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

108 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Awesome Open Source AI

Awesome Open Source AI - Elite Tier

A curated list of battle-tested, production-proven open-source AI models, libraries, infrastructure, and developer tools. Only elite-tier projects make this list.

Awesome PRs Welcome License: CC0-1.0

by Boring Dystopia Development

boringdystopia.ai   X @alvinunreal   Telegram Join channel


📋 Contents


🧬 1. Core Frameworks & Libraries

Core libraries and frameworks used to build, train, and run AI and machine learning systems.

Deep Learning Frameworks

  • PyTorch GitHub stars - Dynamic computation graphs, Pythonic API, dominant in research and production. The current standard for most frontier AI work.
  • TensorFlow GitHub stars - End-to-end platform with excellent production deployment, TPU support, and large-scale serving tools.
  • JAX GitHub stars + Flax GitHub stars - High-performance numerical computing with composable transformations (JIT, vmap, grad). Rising favorite for research and scientific ML.
  • NumPyro GitHub stars - Probabilistic programming with NumPy powered by JAX for autograd and JIT compilation. Bayesian modeling and inference at scale.
  • Keras GitHub stars - High-level, beginner-friendly API that now runs on multiple backends (TensorFlow, JAX, PyTorch). Perfect for rapid experimentation.
  • tinygrad GitHub stars - Minimalist deep learning framework with tiny code footprint. The "you like pytorch? you like micrograd? you love tinygrad!" philosophy - simple yet powerful.
  • PyTorch Geometric GitHub stars - Library for deep learning on irregular input data such as graphs, point clouds, and manifolds. Part of the PyTorch ecosystem.

Rust ML Frameworks

  • Burn GitHub stars - Next-generation deep learning framework in Rust. Backend-agnostic with CPU, GPU, WebAssembly support.
  • Candle (Hugging Face) GitHub stars - Minimalist ML framework for Rust. PyTorch-like API with focus on performance and simplicity.
  • linfa GitHub stars - Comprehensive Rust ML toolkit with classical algorithms. scikit-learn equivalent for Rust with clustering, regression, and preprocessing.

Julia ML Frameworks

  • Flux.jl GitHub stars - 100% pure-Julia ML stack with lightweight abstractions on top of native GPU and AD support. Elegant, hackable, and fully integrated with Julia's scientific computing ecosystem.

NLP & Transformers

Data Processing & Manipulation

  • Pandas GitHub stars - The gold standard for data analysis and manipulation in Python.
  • Polars GitHub stars - Blazing-fast DataFrame library (Rust backend) - modern alternative to pandas for large-scale workloads.
  • cuDF GitHub stars - GPU DataFrame library from RAPIDS. Accelerates pandas workflows on NVIDIA GPUs with zero code changes using cuDF.pandas accelerator mode.
  • Modin GitHub stars - Parallel pandas DataFrames. Scale pandas workflows by changing a single line of code - distributes data and computation automatically.
  • Dask GitHub stars - Parallel computing for big data - scales pandas/NumPy/scikit-learn to clusters.
  • NumPy GitHub stars - Fundamental array computing library that powers almost every AI stack.
  • SciPy GitHub stars - Scientific computing algorithms (optimization, linear algebra, statistics, signal processing).
  • NetworkX GitHub stars - Creation, manipulation, and study of complex networks. The foundational graph analysis library for Python data science.

Classical ML & Gradient Boosting

  • scikit-learn GitHub stars - Industry-standard library for traditional machine learning (classification, regression, clustering, pipelines).
  • XGBoost GitHub stars - Scalable, high-performance gradient boosting library. Still dominates Kaggle and tabular competitions.
  • LightGBM GitHub stars - Microsoft's ultra-fast gradient boosting framework, optimized for speed and memory.
  • CatBoost GitHub stars - Gradient boosting that handles categorical features natively with great out-of-the-box performance.
  • sktime GitHub stars - Unified framework for machine learning with time series. Scikit-learn compatible API for forecasting, classification, clustering, and anomaly detection.
  • StatsForecast GitHub stars - Lightning-fast statistical forecasting with ARIMA, ETS, CES, and Theta models. Optimized for high-performance time series workloads.

AutoML & Hyperparameter Optimization

  • Optuna GitHub stars - Modern, define-by-run hyperparameter optimization with pruning and visualizations. Extremely popular in 2026.
  • AutoGluon GitHub stars - AWS AutoML toolkit for tabular, image, text, and multimodal data - state-of-the-art with almost zero code.
  • FLAML GitHub stars - Microsoft's fast & lightweight AutoML focused on efficiency and low compute.
  • AutoKeras GitHub stars - Neural architecture search on top of Keras.
  • TPOT GitHub stars - Genetic programming-based AutoML for full pipeline optimization.

Model Training & Optimization Utilities

  • Hugging Face Accelerate GitHub stars - Simple API to make training scripts run on any hardware (multi-GPU, TPU, mixed precision) with minimal code changes.
  • DeepSpeed GitHub stars - Microsoft's deep learning optimization library for extreme-scale training (ZeRO, offloading, MoE).
  • Transformers GitHub stars - Library of pretrained transformer models and utilities for text, vision, audio, and multimodal training and inference.
  • FlashAttention GitHub stars - Fast exact attention kernels that reduce memory usage and accelerate transformer training and inference.
  • xFormers GitHub stars - Optimized transformer building blocks and attention operators for PyTorch.
  • PyTorch Lightning GitHub stars - High-level wrapper for PyTorch that removes boilerplate and adds best practices.
  • ONNX Runtime GitHub stars - High-performance inference and training for ONNX models across hardware.
  • einops GitHub stars - Flexible, powerful tensor operations for readable and reliable code. Supports PyTorch, JAX, TensorFlow, NumPy, MLX.
  • safetensors GitHub stars - Simple, safe way to store and distribute tensors. Fast, secure alternative to pickle for model serialization.
  • torchmetrics GitHub stars - Machine learning metrics for distributed, scalable PyTorch applications. 80+ metrics with built-in distributed synchronization.
  • torchao GitHub stars - PyTorch native quantization and sparsity for training and inference. Drop-in optimizations for production deployment.
  • SHAP GitHub stars - Game theoretic approach to explain the output of any machine learning model. Industry standard for model interpretability.

🧠 2. Open Foundation Models

Pretrained language, multimodal, speech, and video models with publicly available weights.

Large Language Models (Base + Chat)

  • Qwen3.6-Plus (Alibaba) GitHub stars - Latest flagship series released April 2026 with 1M context window, agentic coding performance competitive with Claude 4.5 Opus, and enhanced multimodal capabilities.
  • DeepSeek-V3.2 / R1 (DeepSeek) GitHub stars - Mixture-of-Experts family with exceptional reasoning, math, and efficient large-scale inference.
  • Gemma 4 (Google) GitHub stars - Released April 2026 in four sizes (E2B, E4B, 26B MoE, 31B Dense). First major update in a year with Apache 2.0 license, complex logic, and agentic workflows.
  • MiniMax-M2.1 / M1 (MiniMax) GitHub stars - Open-weight MiniMax model line spanning long-context reasoning and agentic software tasks, with strong tool use and publicly released weights for local deployment.
  • Kimi K2.5 (Moonshot AI) GitHub stars - Frontier open-weight MoE model with 256K context, strong coding and reasoning performance, and native multimodal + tool-use support for agentic workflows.
  • Mistral Large / Nemo / Small - High-performance model family with strong multilingual capability, tool use, and efficient deployment profiles.
  • Phi-4 (Microsoft) GitHub stars - Small but highly capable models optimized for reasoning, edge devices, and on-device inference. Includes Phi-4-reasoning variants with thinking capabilities.
  • GLM-5 (Zhipu AI) GitHub stars - Strong open model line with solid coding, reasoning, and agentic-task performance.
  • OLMo 2 (Allen AI) GitHub stars - Fully open-source LLMs (1B–32B) with complete transparency: models, data, training code, and logs. Designed by scientists, for scientists.
  • Llama 4 (Meta) GitHub stars - First native multimodal MoE open-source models (Scout: 10M context, Maverick: 400B+ params). Released April 2025 with enterprise-grade capabilities.

Coding & Reasoning Models

  • DeepSeek-Coder-V2 / R1-Coder GitHub stars - Best-in-class open coding model (236B MoE). Outperforms closed models on many code benchmarks.
  • CodeLlama / CodeGemma GitHub stars - Meta's specialized coding variants built on Llama. Still heavily used for fine-tuning.
  • Qwen3-Coder-Next (Alibaba) GitHub stars - Leading open coding model. Strong Pareto frontier for cost-effective agent deployment.
  • StarCoder2 (BigCode) GitHub stars - 15B model trained on 600+ programming languages. Community favorite for transparency.
  • Granite Code Models (IBM) GitHub stars - Family of open foundation models for code intelligence (3B–34B). Trained on 3-4T tokens of code data with strong performance across 116 programming languages.

Multimodal Models (Vision + Language)

  • Qwen3-VL (Alibaba) GitHub stars - Latest flagship VLM with native 256K context (expandable to 1M), visual agent capabilities, 3D grounding, and superior multimodal reasoning. Major leap over Qwen2.5-VL.
  • InternVL3 (OpenGVLab) GitHub stars - Native multimodal pretraining with mixed preference optimization (MPO). Superior perception and reasoning over InternVL 2.5, extends to GUI agents and 3D vision.
  • GLM-4.5V / GLM-4.1V-Thinking (Zhipu AI) GitHub stars - Strong multimodal reasoning with scalable reinforcement learning. Compares favorably with Gemini-2.5-Flash on benchmarks.
  • LLaVA-OneVision GitHub stars - Successor to LLaVA 1.6 with expanded capabilities across vision-language tasks.
  • MiniCPM-V 2.6 GitHub stars - Handles images up to 1.8M pixels with top-tier OCR performance. Excellent for on-device deployment.
  • Gemma 4 (Google) GitHub stars - Multimodal model supporting vision-language input, optimized for efficiency, complex logic, and on-device use.

Speech & Audio Models (TTS, STT, Music)

  • Whisper (OpenAI → community forks) GitHub stars - The gold-standard open speech-to-text model. Massive community fine-tunes available.
  • OuteTTS / CosyVoice 2 GitHub stars - High-quality open TTS with natural prosody and multilingual support.
  • Fish Speech / StyleTTS 2 GitHub stars - Zero-shot TTS with excellent voice cloning. Extremely popular in 2026.
  • MusicGen / AudioCraft (Meta) GitHub stars - Open music and audio generation models.
  • VibeVoice (Microsoft) GitHub stars - Open-source frontier voice AI with expressive, longform conversational speech synthesis. 7B parameter TTS with streaming support.
  • Chatterbox (Resemble AI) GitHub stars - State-of-the-art open TTS family with 350M parameter Turbo variant. Single-step generation with native paralinguistic tags for realistic dialogue.
  • Dia (Nari Labs) GitHub stars - 1.6B parameter TTS generating ultra-realistic dialogue in one pass with nonverbal communications (laughter, coughing). Emotion and tone control via audio conditioning.
  • Kokoro GitHub stars - Lightweight 82M parameter TTS with Apache-licensed weights. High-quality speech generation deployable anywhere from production to personal projects.
  • Step-Audio (StepFun) - 130B-parameter production-ready audio LLM for intelligent speech interaction. Supports multilingual conversations (Chinese, English, Japanese), emotional tones, regional dialects (Cantonese, Sichuanese), adjustable speech rates, and prosodic styles including rap. Apache 2.0 licensed.
  • Voxtral TTS (Mistral) GitHub stars - 4B parameter state-of-the-art TTS with zero-shot voice cloning, 9-language support, and ~90ms time-to-first-audio for voice agents.

Video & Animation Models


⚡ 3. Inference Engines & Serving

Inference runtimes, serving systems, and optimization tools for running models locally or in production.

Local / On-device Inference

  • llama.cpp GitHub stars - Pure C/C++ inference engine with GGUF format support. The gold standard for CPU/GPU/Apple Silicon on-device running. Includes llama-server for OpenAI-compatible API.
  • Ollama GitHub stars - Dead-simple local LLM runner with a one-line install, model registry, and OpenAI-compatible API.
  • MLX GitHub stars (Apple) - High-performance array framework + LLM inference optimized for Apple Silicon.
  • MLC-LLM GitHub stars - Deployment engine that compiles and runs LLMs across browsers, mobile devices, and local hardware.
  • WebLLM GitHub stars - High-performance in-browser LLM inference engine. Runs models directly in the browser with WebGPU acceleration.
  • llama-cpp-python GitHub stars - Official Python bindings for llama.cpp.
  • KoboldCpp GitHub stars - User-friendly llama.cpp fork focused on role-playing and creative writing.
  • Potato OS GitHub stars - Linux distribution for fully local AI inference on Raspberry Pi 5 and 4. Optimized for running open models at the edge.

High-performance Serving & API Servers

  • llm-d GitHub stars - Kubernetes-native distributed LLM inference framework. Donated to CNCF by RedHat, Google, and IBM. Intelligent scheduling, KV-cache optimization, and state-of-the-art performance across accelerators.
  • LMDeploy GitHub stars - Toolkit for compressing, deploying, and serving LLMs from OpenMMLab. 4-bit inference with 2.4x higher performance than FP16, distributed multi-model serving across machines.
  • vLLM** GitHub stars - State-of-the-art serving engine with PagedAttention and continuous batching. Currently the fastest production-grade LLM server.
  • Text Generation Inference (TGI) GitHub stars - Hugging Face's production-ready Rust-based server.
  • SGLang GitHub stars - Next-gen serving framework with RadixAttention. Powers xAI's production workloads at 100K+ GPUs scale.
  • TensorRT-LLM GitHub stars - NVIDIA's official high-performance inference backend.
  • Aphrodite Engine GitHub stars - vLLM fork optimized for role-play and creative writing.
  • Open Model Engine (OME) GitHub stars - Kubernetes operator for LLM serving. GPU scheduling, model lifecycle management. Works with vLLM, SGLang, TensorRT-LLM.
  • Triton Inference Server GitHub stars - NVIDIA's production-grade open-source inference serving software. Supports multiple frameworks (TensorRT, PyTorch, ONNX) with optimized cloud and edge deployment.
  • mistral.rs GitHub stars - Fast, flexible Rust-native LLM inference engine built on Candle. Supports text, vision, audio, image generation, and embeddings with hardware-aware auto-tuning.
  • KTransformers GitHub stars - Flexible framework for heterogeneous CPU-GPU LLM inference and fine-tuning. Enables running large MoE models by offloading experts to CPU with BF16/FP8 precision support.
  • llamafile GitHub stars - Mozilla's single-file distributable LLM solution. Bundle model weights, inference engine, and runtime into one portable executable that runs on six OSes without installation.

Quantization, Distillation & Optimization

  • GGUF GitHub stars (part of llama.cpp) - Modern quantized format that powers most local inference.
  • bitsandbytes GitHub stars - 8-bit and 4-bit optimizers + quantization.
  • AutoAWQ GitHub stars - Activation-aware Weight Quantization toolkit.
  • AutoGPTQ GitHub stars - GPTQ quantization framework.
  • HQQ GitHub stars - Half-Quadratic Quantization - ultra-fast method rising in 2026.
  • ExLlamaV2 GitHub stars - Highly optimized CUDA kernels for 4-bit/8-bit inference.
  • Optimum GitHub stars - Hardware-specific acceleration and quantization.

🤖 4. Agentic AI & Multi-Agent Systems

Frameworks and platforms for building agent-based systems and multi-agent workflows.

Single-Agent Frameworks

  • LangGraph GitHub stars - Stateful, controllable agent orchestration.
  • CrewAI GitHub stars - Role-based agent framework.
  • AutoGen (AG2) GitHub stars - Flexible multi-agent conversation framework.
  • DSPy GitHub stars - Framework for programming language model pipelines with modules, optimizers, and evaluation loops.
  • Semantic Kernel GitHub stars - SDK for building and orchestrating AI agents and workflows across multiple programming languages.
  • smolagents GitHub stars - Lightweight agent framework centered on tool use and code-executing workflows.
  • LangChain GitHub stars - Foundational library for agents, chains, and memory.
  • smolagents (Hugging Face) GitHub stars - Minimalist agent library. Build agents in 3 lines of code with code-first action execution.
  • Hermes Agent (NousResearch) GitHub stars - The agent that grows with you. Autonomous server-side agent with persistent memory that learns and improves over time.
  • Agno GitHub stars - Build, run, and manage agentic software at scale. High-performance framework for multi-agent systems with memory, knowledge, and tools.
  • Pydantic AI GitHub stars - Python agent framework from the creators of Pydantic. Type-safe, structured outputs with dependency injection and streaming support.

Multi-Agent Orchestration

  • MetaGPT GitHub stars - Simulates an entire "AI software company".
  • CAMEL GitHub stars - First and best multi-agent framework for building scalable agent systems. Apache 2.0 licensed with extensive tooling for agent communication and task automation.
  • Swarm GitHub stars - Lightweight multi-agent orchestration from OpenAI.
  • Swarms GitHub stars - Bleeding-edge enterprise multi-agent orchestration.
  • Llama-Agents GitHub stars - Async-first multi-agent system.
  • Mastra GitHub stars - TypeScript-first agent framework with built-in RAG, workflows, tool integrations, observability and observational memory.
  • Deer-Flow (ByteDance) GitHub stars - Open-source long-horizon SuperAgent harness that researches, codes, and creates. Handles tasks from minutes to hours with sandboxes, memories, tools, skills, subagents, and message gateway.
  • OpenAI Agents SDK GitHub stars - Production-ready lightweight framework for multi-agent workflows. The evolution of Swarm with enhanced orchestration capabilities and enterprise-grade features.

Autonomous Coding Agents

  • OpenHands (ex-OpenDevin) GitHub stars - Full-featured open-source AI software engineer.
  • Goose GitHub stars - Extensible on-machine AI agent for development tasks.
  • OpenCode GitHub stars - Terminal-native autonomous coding agent.
  • Aider GitHub stars - Command-line pair-programming agent.
  • Pi (badlogic) GitHub stars - Terminal coding agent with hash-anchored edits, LSP integration, subagents, MCP support, and package ecosystem.
  • Mistral-Vibe (Mistral) GitHub stars - Minimal CLI coding agent by Mistral. Lightweight, fast, and designed for local development workflows.
  • Nanocoder (Nano-Collective) GitHub stars - Beautiful local-first coding agent running in your terminal. Built for privacy and control with support for multiple AI providers via OpenRouter.
  • Gemini CLI (Google) GitHub stars - Open-source AI agent that brings Gemini's power directly into your terminal. Supports code generation, shell execution, and file editing with full Apache 2.0 licensing.

Domain-Specific Agents

  • Langflow GitHub stars - Visual low-code platform for agentic workflows.
  • Dify GitHub stars - Production-ready agentic workflow platform.
  • OWL (camel-ai/owl) GitHub stars - Advanced multi-agent collaboration system.
  • SuperAGI GitHub stars - Dev-first autonomous AI agent platform.
  • AI-Scientist-v2 (SakanaAI) GitHub stars - Workshop-level automated scientific discovery via agentic tree search. Generates novel research ideas, runs experiments, and writes papers.
  • PraisonAI GitHub stars - 24/7 AI employee team for automating complex challenges. Low-code multi-agent framework with handoffs, guardrails, memory, RAG, and 100+ LLM providers.
  • Agent-S (Simular AI) GitHub stars - Open agentic framework that uses computers like a human. SOTA on OSWorld benchmark (72.6%) for GUI automation and computer control.

Agent Memory & State

  • Letta (ex-MemGPT) GitHub stars - Platform for building stateful agents with advanced memory that learn and self-improve over time.
  • Mem0 GitHub stars - Universal memory layer for AI agents. Persistent, multi-session memory across models and environments.
  • Forgetful GitHub stars - MCP server for persistent AI agent memory. Stores atomic single-concept notes and auto-links them into a knowledge graph via semantic similarity. SQLite or PostgreSQL.
  • Hindsight GitHub stars - State-of-the-art long-term memory for AI agents by Vectorize. Fully self-hosted, MIT-licensed, with integrations for LangChain, CrewAI, LlamaIndex, Vercel AI SDK, and more.

🔍 5. Retrieval-Augmented Generation (RAG) & Knowledge

Retrieval systems, vector databases, embedding models, and related tooling for RAG pipelines.

Vector Databases & Search Engines

  • Chroma GitHub stars - Most popular open-source embedding database.
  • Qdrant GitHub stars - High-performance vector search engine in Rust.
  • Weaviate GitHub stars - GraphQL-native vector search engine.
  • Milvus GitHub stars - Scalable cloud-native vector database.
  • Faiss GitHub stars - Similarity search and clustering library for dense vectors with CPU and GPU implementations.
  • NornicDB GitHub stars - Golang Low-latency graph + vector hybrid retrieval, Neo4j and qDrant driver compatible.
  • LanceDB GitHub stars - Serverless vector DB optimized for multimodal data.
  • pgvector GitHub stars - PostgreSQL extension for vector similarity search.

Embedding Models

RAG Frameworks & Advanced Retrieval Tools

  • LlamaIndex GitHub stars - Full-featured RAG pipeline with advanced indexing.
  • Haystack GitHub stars - End-to-end NLP and RAG framework.
  • RAGFlow GitHub stars - Deep-document-understanding RAG engine.
  • GraphRAG (Microsoft) GitHub stars - Knowledge-graph-based RAG.
  • Verba (Weaviate) GitHub stars - Golden RAG frontend with intuitive UI for retrieval and exploration.
  • RAGatouille GitHub stars - Advanced retrieval tools with late interaction models (ColBERT).
  • Docling GitHub stars - Document processing toolkit for turning PDFs and other files into structured data for GenAI workflows.
  • Unstructured GitHub stars - Best-in-class document preprocessing.
  • ColPali / ColQwen GitHub stars - Vision-language models for document retrieval.
  • LightRAG GitHub stars - Graph-based RAG with dual-level retrieval system. Simple and fast with comprehensive knowledge discovery (EMNLP 2025).
  • RAG-Anything GitHub stars - All-in-One Multimodal RAG system for seamless processing of text, images, tables, and equations. Built on LightRAG.
  • txtai GitHub stars - All-in-one AI framework for semantic search, LLM orchestration and language model workflows. Embeddings database with customizable pipelines.
  • Infinity GitHub stars - High-throughput, low-latency serving engine for text-embeddings, reranking, CLIP, and ColPali. OpenAI-compatible API.

Web Data Ingestion

  • Crawl4AI GitHub stars - LLM-friendly web crawler that turns websites into clean Markdown for RAG and agentic workflows.
  • Lightpanda GitHub stars - Machine-first headless browser in Zig; rendering-free and ultra-lightweight for AI agent browsing.
  • Paperless-AI GitHub stars - Automated document analyzer for Paperless-ngx with RAG-powered semantic search across your document archive.

🎨 6. Generative Media Tools

Open-source models and applications for image, video, audio, and 3D generation and editing.

Image Generation & Editing

Video Generation

Audio / Music / Voice Generation

  • AudioCraft / MusicGen (Meta) GitHub stars - Controllable text-to-music and audio models.
  • ACE-Step 1.5 GitHub stars - Local-first music generation model with broad hardware support across Mac, AMD, Intel, and CUDA devices.
  • Fish Speech GitHub stars - Zero-shot TTS and voice cloning.
  • CosyVoice 2 GitHub stars - Natural multilingual TTS with emotional control.
  • StyleTTS 2 GitHub stars - Expressive zero-shot TTS.
  • OuteTTS GitHub stars - High-quality open TTS.
  • RVC (Retrieval-based Voice Conversion) GitHub stars - Gold standard for real-time voice cloning.
  • Amphion GitHub stars - Comprehensive toolkit for Audio, Music, and Speech Generation (9.7K stars).
  • YuE GitHub stars - Open full-song generation model producing high-quality music with vocals (similar capabilities to Suno.ai but open-source).
  • OpenVoice GitHub stars - Instant voice cloning by MIT and MyShell with accurate tone color cloning and style control.

3D & Creative Tools


🛠️ 7. Training & Fine-tuning Ecosystem

Tools for model training, fine-tuning, synthetic data generation, and distributed training.

Full Training Frameworks

  • LLaMA-Factory GitHub stars - One-stop unified framework for SFT, DPO, ORPO, KTO with web UI.
  • Axolotl GitHub stars - YAML-driven full pipeline for SFT, DPO, GRPO.
  • Unsloth GitHub stars - 2× faster, 70% less memory fine-tuning.
  • LitGPT GitHub stars - Clean from-scratch implementations of 20+ LLMs.
  • torchtune GitHub stars - PyTorch-native library for post-training, fine-tuning, and experimentation with LLMs.
  • TRL (Transformers Reinforcement Learning) GitHub stars - Official library for RLHF, SFT, DPO, ORPO.

LoRA / PEFT Tools

Synthetic Data Generation

  • distilabel GitHub stars - End-to-end pipeline for synthetic instruction data.
  • Data-Juicer GitHub stars - High-performance data processing for LLM training.
  • Argilla GitHub stars - Open-source data labeling + synthetic data platform.
  • SDV (Synthetic Data Vault) GitHub stars - High-fidelity tabular and relational synthetic data.

Distributed Training

  • DeepSpeed GitHub stars - Extreme-scale training optimizations.
  • Colossal-AI GitHub stars - Unified system for 100B+ models.
  • Megatron-LM GitHub stars - Distributed training framework and reference codebase for large transformer models at scale.
  • Ray Train GitHub stars - Scalable distributed training.

📊 8. MLOps / LLMOps & Production

Tooling for tracking, deploying, monitoring, and operating AI systems in production.

Experiment Tracking & Versioning

  • MLflow GitHub stars - End-to-end open platform for the ML/LLM lifecycle.
  • DVC (Data Version Control) GitHub stars - Git-like versioning for data and models.
  • ClearML GitHub stars - Open-source platform for experiment tracking, orchestration, data management, and model serving.
  • Weights & Biases Weave GitHub stars - Open-source tracing and experiment tracking.

Deployment & Orchestration

  • BentoML GitHub stars - Unified framework to build, ship, and scale AI apps.
  • Ray Serve GitHub stars - Scalable model serving library.
  • ZenML GitHub stars - Pipeline and orchestration framework for taking ML and LLM systems from development to production.
  • Kubeflow GitHub stars - Kubernetes-native ML/LLM platform.
  • KServe GitHub stars - Kubernetes-based model serving.

Monitoring, Evaluation & Observability

  • Langfuse GitHub stars - #1 open-source LLM observability platform.
  • Phoenix (Arize) GitHub stars - AI observability & evaluation platform.
  • Evidently GitHub stars - ML & LLM monitoring framework.
  • Opik (Comet) GitHub stars - Production-ready LLM evaluation platform.
  • LiteLLM GitHub stars - AI Gateway to call 100+ LLM APIs in OpenAI format with unified cost tracking, guardrails, load balancing, and logging.
  • OpenLIT GitHub stars - OpenTelemetry-native LLM observability platform with GPU monitoring, evaluations, prompt management, and guardrails.
  • OpenLLMetry (Traceloop) GitHub stars - Open-source observability for GenAI/LLM applications based on OpenTelemetry with 25+ integration backends.
  • Agenta GitHub stars - Open-source LLMOps platform combining prompt playground, prompt management, LLM evaluation, and observability.
  • Helicone GitHub stars - Open-source LLM observability with request logging, caching, rate limiting, and cost analytics.

Guardrails & Safety Tools


📈 9. Evaluation, Benchmarks & Datasets

Benchmarks, evaluation frameworks, datasets, and supporting tools for model assessment.

Benchmark Suites

  • lm-evaluation-harness (EleutherAI) GitHub stars - De-facto standard for generative model evaluation.
  • HELM (Stanford) GitHub stars - Holistic Evaluation of Language Models.
  • GAIA - Real-world multi-step agentic benchmark.
  • LiveCodeBench GitHub stars - Contamination-free coding benchmark.
  • MMLU-Pro / GPQA GitHub stars - Hardened expert-level benchmarks.
  • OpenCompass GitHub stars - Evaluation platform for benchmarking language and multimodal models across large benchmark suites.
  • SWE-rebench (Nebius) - Continuously updated benchmark with 21,000+ real-world SWE tasks for evaluating agentic LLMs. Decontaminated, mined from GitHub.

Evaluation Frameworks

  • DeepEval GitHub stars - The "Pytest for LLMs".
  • RAGAs GitHub stars - End-to-end RAG evaluation framework.
  • Lighteval GitHub stars - Evaluation toolkit for LLMs across multiple backends with reusable tasks, metrics, and result tracking.
  • Hugging Face Evaluate GitHub stars - Standardized evaluation metrics.

High-quality Open Datasets & Data Tools


🛡️ 10. AI Safety, Alignment & Interpretability

Tools for alignment, interpretability, safety evaluation, and adversarial testing.

Alignment & RLHF Tools

  • Safe-RLHF GitHub stars - Safe reinforcement learning from human feedback.
  • Alignment Handbook GitHub stars - Complete recipes for full-stack alignment.
  • OpenRLHF GitHub stars - High-performance distributed RLHF framework.

Interpretability & Explainability

  • TransformerLens GitHub stars - Gold-standard for mechanistic interpretability.
  • nnsight GitHub stars - Scalable library for intervening on neural networks.
  • SAELens GitHub stars - Sparse autoencoders for interpretable features.
  • Captum GitHub stars - PyTorch's official interpretability library.

Adversarial & Red-teaming Tools


🧩 11. Specialized Domains

Computer Vision

  • OpenCV GitHub stars - World's most widely used computer vision library.
  • Ultralytics YOLO GitHub stars - State-of-the-art real-time object detection.
  • Detectron2 GitHub stars - High-performance object detection library.
  • SAM 2 GitHub stars - Promptable image and video segmentation model with released checkpoints and training code.
  • Kornia GitHub stars - Differentiable computer vision library.
  • MediaPipe GitHub stars - Cross-platform multimodal pipelines.

Reinforcement Learning & Robotics

Time Series & Scientific AI

  • Time Series Library (TSLib) GitHub stars - Comprehensive benchmark for time-series models.
  • Chronos (Amazon) GitHub stars - Pretrained foundation models for time-series forecasting.
  • Darts GitHub stars - Easy-to-use time-series forecasting library.
  • AutoTS GitHub stars - Automated time series forecasting with broad model selection, ensembling, anomaly detection, and holiday effects. Designed for production deployment with minimal setup.

Edge / On-device AI


🖥️ 12. User Interfaces & Self-hosted Platforms

Local AI Chat UIs & Personal Assistants

  • OpenClaw GitHub stars - Local-first personal AI assistant with multi-channel integrations and full agentic task execution.
  • Open WebUI GitHub stars - Most popular self-hosted ChatGPT-style interface.
  • text-generation-webui GitHub stars - Web UI for running local LLMs with multiple backends, extensions, and model formats.
  • LobeChat GitHub stars - Sleek modern chat UI.
  • LibreChat GitHub stars - Feature-packed multi-LLM interface.
  • HuggingChat (self-hosted) GitHub stars - Official open-source codebase for HuggingChat.
  • Khoj GitHub stars - Self-hostable personal AI assistant for search, chat, automation, and workflows over local and web data.
  • Newelle GitHub stars - GNOME/Linux desktop virtual assistant with integrated file editor, global hotkeys, and profile manager.

Full Self-hosted AI Platforms

  • AnythingLLM GitHub stars - All-in-one RAG + agents platform.
  • Dify GitHub stars - Complete AI application platform with visual builder.
  • Langflow GitHub stars - Visual low-code platform for LangChain flows.
  • Flowise GitHub stars - Drag-and-drop LLM app builder.

Desktop & Mobile AI Apps

  • GPT4All GitHub stars - Privacy-first local desktop chatbot.
  • Jan GitHub stars - Local-first AI app framework.
  • SillyTavern GitHub stars - Highly customizable role-playing frontend.

🧪 13. Developer Tools & Integrations

AI Coding Assistants (open-source)

  • Continue GitHub stars - Open-source AI coding autopilot for VS Code & JetBrains.
  • Tabby GitHub stars - Self-hosted AI coding assistant.
  • Cline GitHub stars - Open-source IDE coding agent that can edit files, run commands, and use tools with user approval.
  • Open Interpreter GitHub stars - Lets LLMs run code locally.
  • Roo Code GitHub stars - Open-source editor-based coding agent with multiple modes and tool integrations.
  • Aider GitHub stars - Terminal-based AI pair programmer.

IDE Plugins & Extensions

  • llama.vim GitHub stars - Local LLM-powered code completion plugin for Vim/Neovim using llama.cpp. Fast, privacy-first, no API key needed.
  • CodeCompanion.nvim GitHub stars - AI-powered coding assistant for Neovim. Inline code generation, chat, actions, and tool use with support for multiple LLM providers.
  • Continue VS Code / JetBrains GitHub stars - Most installed open-source AI extension.
  • Jupyter AI GitHub stars - Chat and code generation inside notebooks.

UI Components & Chat Libraries

  • Assistant UI GitHub stars - React/TypeScript library for building production-grade AI chat interfaces. Drop-in components for streaming messages, tool calls, and multi-modal inputs.

Testing & Debugging Tools


📚 14. Resources & Learning

Papers with Open Implementations

Communities, Forums & Newsletters

Courses & Interactive Playgrounds

Starter Projects & Examples


Contributing

Contributions are highly welcome! Please read the CONTRIBUTING.md for guidelines (quality standards, formatting, license requirements, etc.).

  • Only OSI-approved licenses
  • Projects must be actively maintained (commits in last 6 months)
  • High-quality, well-documented, real adoption

License

This list itself is licensed under CC0 1.0 Universal. Feel free to use it for any purpose.


Made with ❤️ for the open-source AI community. Star the repo if you find it useful - it helps more people discover the best open tools!


About

Curated list of the best truly open-source AI projects, models, tools, and infrastructure.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors