Awesome Legaltech

The Ultimate Collection of Open-Source Legal Technology & AI Resources

Curating the best production-ready tools, datasets, and communities for legal professionals and developers

What Makes This List Special?

Rigorous Quality Standards → Only production-ready, actively maintained projects with real-world adoption
Global Legal Coverage → Worldwide scope with clear jurisdiction tagging (🇺🇸 🇪🇺 🇬🇧 🇩🇪 🇮🇳 🇮🇱 🌍)
Practitioner-Tested Tools → Real solutions that legal professionals deploy in actual workflows
Premium AI Resources → Curated datasets, benchmarks, and models purpose-built for legal applications
Thriving Ecosystems → Active communities driving innovation and collaborative development

New to Legal AI? Start with our Quick Start Guide below!

Quick Navigation	Count	Best For
NLP Libraries & Domain Models	10 projects	Text processing & analysis
AI-Powered Contract & Document Analytics	5 platforms	Contract intelligence
RAG & AI Infrastructure	8 tools	Building legal AI apps
Agentic AI & Automation	5 tools	AI agents & workflow automation
Legal Research & Case Law Data/APIs	9 resources	Research & citations
E-Discovery & Litigation	6 tools	Legal discovery & annotation
Speech Recognition & Transcription	10 tools	Audio/video transcription
Document Signing & Collaboration	5 platforms	Digital signatures & wikis
Document Management, OCR & PDF	14 solutions	Document processing
Document Assembly & Rules-as-Code	7 platforms	Automation & workflows
Knowledge Management	5 tools	Research notes & PKM
AI Agent Skills for Legal Work	14 skills	Legal AI automation
Datasets & Benchmarks	10 collections	Training & evaluation
General-Purpose Document Intelligence	6 tools	Document understanding
Learning, Communities & Curations	6 communities	Education & networking

Total: 120+ High-Quality Open-Source Legal Tech Resources

Quick Start Guide

For Legal Professionals

Start with: AI-Powered Contract Analytics for document review
Research tools: Legal Research & Case Law APIs for case discovery
Document processing: Document Management & OCR for digitization

For Developers

Begin with: NLP Libraries & Models for text processing
Training data: Datasets & Benchmarks for model development
AI infrastructure: RAG & AI Infrastructure for building pipelines

For Organizations

Enterprise solutions: OpenContracts for contract analytics
Document workflows: docassemble for automation
Legal research data: CourtListener for case law and court records

NLP Libraries & Domain Models

Essential tools for processing and understanding legal text with specialized language models

Project	Description	Scope	Stars	License
Hugging Face Transformers	Universal toolkit for fine-tuning and running transformer models on legal text	Global	⭐ 159k	Apache-2.0
spaCy	Industrial-strength NLP in Python — foundation for many legal NLP pipelines	Global	⭐ 33k	MIT
Sentence Transformers	State-of-the-art text embeddings for semantic legal search	Global	⭐ 19k	Apache-2.0
LexNLP	Information extraction from unstructured legal text (Python)	Global	—	Apache-2.0
Blackstone	spaCy pipeline for long-form legal text processing	Global	—	MIT
LEGAL-BERT	Pretrained BERT variants for legal corpora (contracts, ECHR, EU law)	EU/Global	—	—
InLegalBERT 🇮🇳	BERT models and recipes for Indian law corpora	India	—	—
LeXLMs	Corpora and probing tasks for legal language models	Multilingual	—	—
Legal-HeBERT 🇮🇱	BERT model for Hebrew legal and legislative domains	Israel	—	—
CaseHOLD	Tasks and baselines for case-law holdings analysis	Global	—	—

AI-Powered Contract & Document Analytics

Enterprise-grade platforms for intelligent contract analysis and document understanding

Project	Features	Best For
OpenContracts	Enterprise document analytics with AI-powered analysis (GPL-3)	Large organizations
ContraxSuite	Full contract analytics & document platform (AGPL)	Commercial use
LawGlance	Free, open-source RAG-based AI legal assistant	SME & individuals
OpenEDGAR 🇺🇸	Framework for searchable EDGAR filings databases	US Securities
CUAD Tools	Code and data interfaces for the Contract Understanding Atticus Dataset (CUAD)	Research

RAG & AI Infrastructure

Core infrastructure for building retrieval-augmented and AI-powered legal applications

Project	Description	Stars	License
Ollama	Run large language models locally — essential for confidential legal documents	⭐ 168k	MIT
LangChain	The agent engineering platform — build LLM apps and RAG pipelines	⭐ 133k	MIT
Open WebUI	Self-hosted ChatGPT-like interface for local and remote LLMs	⭐ 130k	—
LlamaIndex	Data framework for building LLM applications over your documents	⭐ 48k	MIT
Qdrant	High-performance vector database for semantic search and AI applications	⭐ 30k	Apache-2.0
Chroma	Open-source AI-native vector database for embeddings	⭐ 27k	Apache-2.0
Haystack	Production-grade NLP framework for document search and QA pipelines	⭐ 25k	Apache-2.0
pgvector	Vector similarity search extension for PostgreSQL; adds RAG retrieval to an existing database without a separate vector store	⭐ 21k	PostgreSQL License
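
The retrieval step that these tools provide can be illustrated without any of them. The following is a minimal sketch, assuming a toy bag-of-words "embedding" and cosine similarity; a real pipeline would use a learned embedding model and a vector store such as Qdrant, Chroma, or pgvector. The function names and the sample clauses are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


# Hypothetical contract clauses, used only for illustration.
clauses = [
    "Either party may terminate this agreement with thirty days written notice.",
    "The receiving party shall keep confidential information secret.",
    "This agreement is governed by the laws of the State of New York.",
]

top = retrieve("How can a party terminate the agreement?", clauses, k=1)
print(top[0])
```

In a full RAG application, the retrieved passages would then be passed to a language model as context for answering the question.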

Agentic AI & Automation

Frameworks and platforms for building AI agents and automating legal workflows

Project	Description	Stars	License
n8n	Fair-code workflow automation with 400+ integrations and native AI — ideal for legal ops	⭐ 183k	—
Browser Use	AI agents that control a browser to automate web-based legal research	⭐ 86k	MIT
AutoGen	Microsoft’s multi-agent framework for orchestrating complex AI workflows	⭐ 57k	CC-BY-4.0 (docs), MIT (code)
CrewAI	Framework for orchestrating role-playing autonomous AI agents	⭐ 48k	MIT
Activepieces	Open-source no-code automation with MCP support and self-hosting	⭐ 22k	—

Legal Research & Case Law Data/APIs

Comprehensive databases and APIs for legal research and case law discovery

Project	Coverage	Jurisdiction
CourtListener 🇺🇸	Primary legal data & research platform	United States
Juriscraper 🇺🇸	Scrapers for opinions, oral arguments, PACER content	United States
Eyecite 🇺🇸	Fast, robust legal citation extractor	United States
Caselaw Access Project 🇺🇸	6.7M+ U.S. court decisions with API	United States
UK National Archives 🇬🇧	Public API for UK court judgments	United Kingdom
Open Legal Data 🇩🇪	German legal data platform & API	Germany
EUR-Lex SPARQL 🇪🇺	Official EU law SPARQL API — access all EU legislation and case law	European Union
Open Knesset 🇮🇱	Open data platform for Israeli Knesset legislative proceedings and members	Israel
OpenAlex	Fully open catalog of 250M+ scholarly works including legal research	Global
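
To give a sense of what citation extraction involves, here is a rough sketch that matches only the simplest volume-reporter-page pattern with a regular expression. It is not Eyecite's API or algorithm; the reporter list and the function name are assumptions for illustration, and a real extractor handles far more formats (short forms, pin cites, parallel citations).

```python
import re

# A tiny, hypothetical subset of reporter abbreviations; real tools ship
# databases with thousands of variants.
REPORTERS = ["U.S.", "S. Ct.", "F.2d", "F.3d", "F. Supp. 2d"]

# Pattern: volume, reporter, first page, e.g. "410 U.S. 113".
# Longer abbreviations are tried first so they are not shadowed by shorter ones.
CITATION_RE = re.compile(
    r"\b(?P<volume>\d{1,4})\s+(?P<reporter>"
    + "|".join(re.escape(r) for r in sorted(REPORTERS, key=len, reverse=True))
    + r")\s+(?P<page>\d{1,5})\b"
)


def find_citations(text: str) -> list[tuple[int, str, int]]:
    """Return (volume, reporter, page) tuples for simple full citations."""
    return [
        (int(m["volume"]), m["reporter"], int(m["page"]))
        for m in CITATION_RE.finditer(text)
    ]


sample = (
    "See Brown v. Board of Education, 347 U.S. 483 (1954); "
    "Roe v. Wade, 410 U.S. 113 (1973)."
)
print(find_citations(sample))
```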

E-Discovery & Litigation

Specialized tools for legal discovery, document review, annotation, and litigation support

Project	Capabilities	Use Case	Stars
Label Studio	Multi-type data labeling and annotation tool	Legal document annotation & NLP training data	⭐ 27k
doccano	Open source annotation tool for ML practitioners	NLP training data from legal corpora	⭐ 11k
Apache Tika	Detects and extracts metadata and text from 1000+ file types	eDiscovery content analysis	⭐ 4k
FreeEed	Complete eDiscovery processing (OCR, indexing, metadata)	Large-scale discovery	—
FreeDiscovery	Information retrieval engine based on scikit-learn	Document analysis	—
FOIAMachine 🇺🇸	Manage and send FOIA requests with agency directory	Government transparency	—

Speech Recognition & Transcription

Essential tools for converting audio/video to text in legal workflows

Project	Specialty	Use Case
Whisper	General-purpose speech recognition by OpenAI	Multilingual transcription
WhisperX	Fast ASR with word-level timestamps and speaker diarization	Speaker identification
faster-whisper	Optimized Whisper implementation	Efficient transcription
insanely-fast-whisper	Ultra-fast Whisper implementation	Batch processing
WhisperLiveKit	Real-time speech recognition with Whisper	Live transcription
whisper-diarization	Speaker diarization with Whisper	Multi-speaker identification
Vibe	Desktop transcription app with Whisper	Self-hosted transcription
Scriberr	Transcription and note-taking tool	Meeting transcription
hebrew_whisper 🇮🇱	GUI for Hebrew transcription using ivrit.ai Whisper models	Hebrew legal transcription
ivrit.ai Whisper Turbo 🇮🇱	Optimized Hebrew Whisper model with 388 hours of training data	Hebrew speech recognition

Document Signing & Collaboration

Platforms for digital document signing, secure notes, and collaborative documentation

Project	Primary Use	Stars	License
Documenso	Open-source DocuSign alternative	—	AGPL
DocuSeal	Document filling and signing platform	—	AGPL
OpenSign	Free & open-source DocuSign alternative with self-hosting	⭐ 6k	—
Notesnook	Fully open source & E2E-encrypted note-taking — ideal for sensitive legal work	⭐ 14k	GPL-3.0
Docmost	Collaborative wiki and documentation software	—	AGPL

Document Management, OCR & PDF

Essential tools for document digitization, management, and processing workflows

Project	Core Function	Stars	License
markitdown	Convert PDF/DOCX/PPTX and more to Markdown	⭐ 93k	MIT
Tesseract	Industry-standard OCR engine, 100+ languages	⭐ 58k	Apache-2.0
Docling	Modern document parsing — PDF/DOCX/PPTX/HTML	⭐ 57k	MIT
Stirling-PDF	Local web-based PDF toolbox (split/merge/convert/OCR)	⭐ 50k	GPL-3.0
Gotenberg	Developer-friendly API for converting HTML/DOCX/more to PDF	⭐ 12k	MIT
pdfplumber	Extract text, tables, and metadata from PDFs with precision	⭐ 10k	MIT
PyMuPDF	High-performance Python library for PDF extraction, annotation, and rendering	⭐ 9k	AGPL-3.0
WeasyPrint	Convert HTML/CSS to PDF — great for generating court-ready documents	⭐ 9k	BSD-3-Clause
OCRmyPDF	Add searchable OCR text layer to scanned PDFs	—	MPL-2.0
docTR	Deep learning OCR engine with strong accuracy on structured documents	⭐ 6k	Apache-2.0
EasyOCR	Ready-to-use OCR with 80+ languages	—	Apache-2.0
paperless-ngx	Self-hosted document management system with AI tagging	—	GPL-3.0
Paperless-AI	AI addon for paperless-ngx (semantic search, auto-classification)	—	—
ExifTool	Read/write metadata in files — digital evidence analysis	—	GPL

Document Assembly & Rules-as-Code

Platforms for automating legal document creation and implementing legal logic as code

Project	Primary Use	Target Users
docassemble	Expert-system platform for guided interviews	Legal professionals
AssemblyLine 🇺🇸	Court-form automation toolkit	Court systems
python-docx-template	Jinja2-based template engine for generating Word legal documents	Developers
Blawx	Visual Rules-as-Code environment	Legal technologists
Catala	Programming language for faithful statute implementation	Developers
OpenFisca	Open legislation simulation engine — used by governments to model social laws	Govtech / Developers
LEOS 🇪🇺	Legislative editing platform for the Akoma Ntoso XML format	EU institutions
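
As a rough illustration of the rules-as-code idea, the sketch below encodes a made-up eligibility rule as a plain Python function, with each condition traceable to a numbered section. The statute, thresholds, and names are entirely hypothetical; Catala, Blawx, and OpenFisca each use their own languages and APIs, which this does not attempt to reproduce.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    age: int
    annual_income: float
    is_resident: bool


# Hypothetical statute, for illustration only:
#   s.1 The applicant must be a resident.
#   s.2 The applicant must be at least 18 years old.
#   s.3 Annual income must not exceed 30,000.
MINIMUM_AGE = 18
INCOME_CEILING = 30_000.0


def check_eligibility(a: Applicant) -> tuple[bool, list[str]]:
    """Return (eligible, failed sections) so each decision is explainable."""
    failed = []
    if not a.is_resident:
        failed.append("s.1 residency")
    if a.age < MINIMUM_AGE:
        failed.append("s.2 minimum age")
    if a.annual_income > INCOME_CEILING:
        failed.append("s.3 income ceiling")
    return (not failed, failed)


print(check_eligibility(Applicant(age=25, annual_income=28_000, is_resident=True)))
print(check_eligibility(Applicant(age=17, annual_income=35_000, is_resident=True)))
```

Returning the failed sections alongside the verdict keeps the link between code and legal text visible, which is the main point of the approach.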

Knowledge Management

Tools for building personal knowledge bases, research notes, and collaborative workspaces

Project	Description	Stars	License
AFFiNE	Next-gen knowledge base combining docs, whiteboard, and database — privacy-first	⭐ 67k	—
Memos	Open-source, self-hosted note-taking built for quick capture — Markdown-native	⭐ 59k	MIT
SiYuan	Privacy-first, self-hosted personal knowledge management with full encryption	⭐ 42k	AGPL-3.0
Logseq	Privacy-first open-source platform for knowledge management — loved by researchers and lawyers	⭐ 42k	AGPL-3.0
Obsidian	Markdown-based personal knowledge base, widely used by legal professionals (note: the core app is closed-source; notes are stored as plain Markdown files)	⭐ 16k	Proprietary

AI Agent Skills for Legal Work

Open-source skills that teach AI agents (Claude, etc.) to perform specialized legal tasks — from contract review to compliance checks

What are Agent Skills? Skills are instruction sets that AI agents load dynamically to perform specialized tasks. Learn more at lawcal.ai/resources/skills.

Legal-Specific Skills

From anthropics/knowledge-work-plugins — Apache-2.0 licensed

Skill	What It Does
legal-risk-assessment	Severity × likelihood risk matrix with escalation criteria
review-contract	Contract review against negotiation playbook; generates redlines
triage-nda	Rapid NDA triage → GREEN / YELLOW / RED routing
compliance-check	Surfaces applicable regulations and required approvals
compliance-tracking	GDPR/CCPA/DPA review, data subject requests, regulatory monitoring
legal-response	Templated responses to litigation holds, subpoenas, data requests
vendor-check	Consolidated view of vendor agreements + deadline tracking
signature-request	Pre-signature checklist + e-signature routing
brief	Daily legal briefing across email, calendar, and contracts
meeting-briefing	Structured pre-meeting briefing for negotiations/compliance reviews
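
The severity × likelihood matrix behind a skill like legal-risk-assessment can be sketched in a few lines. This is not the plugin's implementation; the 1 to 5 scales, the score thresholds, and the level names below are assumptions chosen for illustration.

```python
def risk_level(severity: int, likelihood: int) -> str:
    """Map 1-5 severity and likelihood ratings to LOW, MEDIUM, or HIGH."""
    for name, value in (("severity", severity), ("likelihood", likelihood)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    score = severity * likelihood  # ranges from 1 to 25
    # Hypothetical thresholds; a real playbook would define its own.
    if score >= 15:
        return "HIGH"    # e.g. escalate to senior counsel
    if score >= 6:
        return "MEDIUM"  # e.g. route for lawyer review
    return "LOW"         # e.g. proceed under standard terms


for sev, lik in [(5, 4), (2, 3), (1, 5)]:
    print(sev, lik, risk_level(sev, lik))
```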

Document Processing Skills

From anthropics/skills

Skill	What It Does
pdf	Extract text/tables, create, merge/split, and fill PDF forms
docx	Create, read, edit Word documents with formatting and templates
pptx	Create and edit PowerPoint presentations programmatically
xlsx	Create, edit, and analyze spreadsheets with formulas and charts

Datasets & Benchmarks

High-quality training data and evaluation benchmarks for legal AI development

Dataset	Content Type	Coverage	Best For
Pile of Law 🇺🇸	Legal/administrative texts	US-centric	Language model training
MultiLegalPile 🌍	Multilingual legal corpus	24 languages	Multilingual models
LexGLUE 🇪🇺🇺🇸	Multi-task benchmark	EU/US/Multi	Legal NLU evaluation
LEXTREME 🌍	Multilingual legal tasks	24 languages	Cross-lingual evaluation
LegalBench	Legal reasoning tasks	Global	LLM legal reasoning
LegalBench-RAG	Contract retrieval benchmark	Global	RAG system evaluation
CUAD	Contract clause annotations	Global	Contract understanding
CaseHOLD 🇺🇸	Case holdings analysis	United States	Legal reasoning
ivrit.ai datasets 🇮🇱	Hebrew speech dataset creation platform	Israel	Hebrew model training
crowd-transcribe-v5 🇮🇱	Hebrew speech dataset with 388 hours of transcribed data	Israel	Hebrew speech models

General-Purpose Document Intelligence (useful for legal)

Not legal-specific, but widely used in legal AI pipelines for document processing

Project	Specialty	Input Types
GROBID	ML extraction of document structure	PDF → TEI/XML
Unstructured	Pre-processing for RAG pipelines	PDF/Office/HTML
Layout Parser	Deep learning layout detection	Multi-format
Nougat	Neural OCR for academic documents	Academic PDFs
Marker	Fast PDF to Markdown conversion	PDF
Docling	Modern document parsing	PDF/DOCX/PPTX/HTML

Learning, Communities & Curations

Essential communities and learning resources for legal AI professionals

Resource	Focus	Link
Free Law Project	Open legal data ecosystem	GitHub Org
Awesome Legal NLP	Curated academic research	GitHub
Legal ML Datasets	Comprehensive legal ML datasets collection	GitHub
Awesome Legal Data	Curated open-source tools for the legal industry	GitHub
Stanford CodeX FutureLaw	Annual legal tech conference from Stanford Law	Website
EOLE Conference 🇪🇺	European Open Source & Free Software Law Event	Website

Contributing

We'd love your help making this list even better! Here's how to contribute:

Submission Guidelines

Must Have:

Open-source with OSI-approved license
Clear documentation and README
Active maintenance (commits within 12 months)
Clear relevance to legal workflows

Nice to Have:

Community adoption (GitHub stars)
Production usage examples
Testing and CI/CD
Performance benchmarks
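
The "commits within 12 months" requirement above is easy to check mechanically once the date of a project's latest commit is known. A minimal sketch, assuming that date has already been looked up by hand or by some other tool; the function name and the 365-day window are illustrative choices, not part of this list's tooling.

```python
from datetime import date, timedelta


def is_actively_maintained(
    last_commit: date, today: date, max_age_days: int = 365
) -> bool:
    """True if the most recent commit is no older than max_age_days."""
    return today - last_commit <= timedelta(days=max_age_days)


# Example dates are made up for illustration.
print(is_actively_maintained(date(2024, 6, 1), today=date(2025, 1, 15)))
print(is_actively_maintained(date(2023, 1, 1), today=date(2025, 1, 15)))
```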

How to Submit

Fork this repository
Add your project in the appropriate section (alphabetical order)
Include: Name, one-line description, primary link(s), jurisdiction flag if applicable
Test your links and formatting
Submit a pull request with a clear description

Optional Quality Checks

# Link checker (lychee is a standalone binary; install it first, e.g. via cargo or Homebrew)
lychee --no-progress --accept 200,999 README.md

# Awesome list linter
npx awesome-lint

Curation Policy

We Include	We Exclude
Open-source projects only	Closed-source SaaS platforms
Global scope (jurisdiction-tagged)	Internal/private tools
Production-ready tools	Abandoned experimental repos
High-value datasets/benchmarks	Low-quality or duplicate data
Active, reputable communities	Inactive or harmful communities

Quality First: We prioritize well-maintained projects with good documentation and real-world usage over comprehensive coverage.

License

CC0 1.0 Universal – No rights reserved.

Feel free to copy, remix, and build upon this list.

By contributing, you agree to license your contribution under CC0.

Credits

Curated by Chen Friedman
Powered by Lawcal AI

Star this repo if you found it helpful!

Made with ❤️ for the legal tech community
