chen-friedman/awesome-legaltech


Awesome Legaltech

The Ultimate Collection of Open-Source Legal Technology & AI Resources

Curating the best production-ready tools, datasets, and communities for legal professionals and developers

🇬🇧 English | 🇮🇱 עברית


What Makes This List Special?

Rigorous Quality Standards → Only production-ready, actively maintained projects with real-world adoption
Global Legal Coverage → Worldwide scope with clear jurisdiction tagging (🇺🇸 🇪🇺 🇬🇧 🇩🇪 🇮🇳 🇮🇱 🌍)
Practitioner-Tested Tools → Real solutions that legal professionals deploy in actual workflows
Premium AI Resources → Curated datasets, benchmarks, and models purpose-built for legal applications
Thriving Ecosystems → Active communities driving innovation and collaborative development

New to Legal AI? Start with our Quick Start Guide below!


Table of Contents

| Quick Navigation | Count | Best For |
|---|---|---|
| NLP Libraries & Domain Models | 10 projects | Text processing & analysis |
| AI-Powered Contract & Document Analytics | 5 platforms | Contract intelligence |
| RAG & AI Infrastructure | 8 tools | Building legal AI apps |
| Agentic AI & Automation | 5 tools | AI agents & workflow automation |
| Legal Research & Case Law Data/APIs | 9 resources | Research & citations |
| E-Discovery & Litigation | 6 tools | Legal discovery & annotation |
| Speech Recognition & Transcription | 10 tools | Audio/video transcription |
| Document Signing & Collaboration | 5 platforms | Digital signatures & wikis |
| Document Management, OCR & PDF | 14 solutions | Document processing |
| Document Assembly & Rules-as-Code | 7 platforms | Automation & workflows |
| Knowledge Management | 5 tools | Research notes & PKM |
| AI Agent Skills for Legal Work | 14 skills | Legal AI automation |
| Datasets & Benchmarks | 10 collections | Training & evaluation |
| General-Purpose Document Intelligence | 6 tools | Document understanding |
| Learning, Communities & Curations | 6 communities | Education & networking |

Total: 120+ High-Quality Open-Source Legal Tech Resources


Quick Start Guide

For Legal Professionals

  1. Start with: AI-Powered Contract Analytics for document review
  2. Research tools: Legal Research & Case Law APIs for case discovery
  3. Document processing: Document Management & OCR for digitization

For Developers

  1. Begin with: NLP Libraries & Models for text processing
  2. Training data: Datasets & Benchmarks for model development
  3. AI infrastructure: RAG & AI Infrastructure for building pipelines

For Organizations

  1. Enterprise solutions: OpenContracts for contract analytics
  2. Document workflows: docassemble for automation
  3. Case management: CourtListener for legal data

NLP Libraries & Domain Models

Essential tools for processing and understanding legal text with specialized language models

| Project | Description | Scope | Stars | License |
|---|---|---|---|---|
| Hugging Face Transformers | Universal toolkit for fine-tuning and running transformer models on legal text | Global | ⭐ 159k | Apache-2.0 |
| spaCy | Industrial-strength NLP in Python; foundation for many legal NLP pipelines | Global | ⭐ 33k | MIT |
| Sentence Transformers | State-of-the-art text embeddings for semantic legal search | Global | ⭐ 19k | Apache-2.0 |
| LexNLP | Information extraction from unstructured legal text (Python) | Global | | Apache-2.0 |
| Blackstone | spaCy pipeline for long-form legal text processing | Global | | MIT |
| LEGAL-BERT | Pretrained BERT variants for legal corpora (contracts, ECHR, EU law) | EU/Global | | |
| InLegalBERT 🇮🇳 | BERT models and recipes for Indian law corpora | India | | |
| LeXLMs | Corpora and probing tasks for legal language models | Multilingual | | |
| Legal-HeBERT 🇮🇱 | BERT model for Hebrew legal and legislative domains | Israel | | |
| CaseHOLD | Tasks and baselines for case-law holdings analysis | Global | | |

AI-Powered Contract & Document Analytics

Enterprise-grade platforms for intelligent contract analysis and document understanding

| Project | Features | Best For | Maturity |
|---|---|---|---|
| OpenContracts | Enterprise document analytics with AI-powered analysis (GPL-3) | Large organizations | Enterprise |
| ContraxSuite | Full contract analytics & document platform (AGPL) | Commercial use | Production |
| LawGlance | Free, open-source RAG-based AI legal assistant | SME & individuals | Community |
| OpenEDGAR 🇺🇸 | Framework for searchable EDGAR filings databases | US Securities | Stable |
| CUAD Tools | Code and data interfaces for Contract Understanding | Research | Research |

RAG & AI Infrastructure

Core infrastructure for building retrieval-augmented and AI-powered legal applications

| Project | Description | Stars | License |
|---|---|---|---|
| Ollama | Run large language models locally; essential for confidential legal documents | ⭐ 168k | MIT |
| LangChain | The agent engineering platform: build LLM apps and RAG pipelines | ⭐ 133k | MIT |
| Open WebUI | Self-hosted ChatGPT-like interface for local and remote LLMs | ⭐ 130k | |
| LlamaIndex | Data framework for building LLM applications over your documents | ⭐ 48k | MIT |
| Qdrant | High-performance vector database for semantic search and AI applications | ⭐ 30k | Apache-2.0 |
| Chroma | Open-source AI-native vector database for embeddings | ⭐ 27k | Apache-2.0 |
| Haystack | Production-grade NLP framework for document search and QA pipelines | ⭐ 25k | Apache-2.0 |
| pgvector | Vector similarity search extension for PostgreSQL; zero-infra RAG | ⭐ 21k | |
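
The retrieval step that these tools implement at scale can be illustrated without any of them. The following is a minimal sketch that uses only the Python standard library. It assumes a toy bag-of-words cosine similarity in place of learned embeddings and a vector database. The clause texts and all function names are hypothetical and are not taken from any project listed above.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-letters (toy tokenizer, not legal-aware)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages))
    return f"Answer using only the context below.\n{context}\nQuestion: {query}"


# Hypothetical contract clauses, used only for illustration.
CLAUSES = [
    "Either party may terminate this agreement with thirty days written notice.",
    "The receiving party shall keep all confidential information secret.",
    "This agreement is governed by the laws of England and Wales.",
]

if __name__ == "__main__":
    print(build_prompt("How can a party terminate the agreement?", CLAUSES))
```

In a real pipeline the `cosine` ranking would typically be replaced by an embedding model plus a vector store, and the prompt would be sent to a locally hosted model when documents are confidential.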

Agentic AI & Automation

Frameworks and platforms for building AI agents and automating legal workflows

| Project | Description | Stars | License |
|---|---|---|---|
| n8n | Fair-code workflow automation with 400+ integrations and native AI; ideal for legal ops | ⭐ 183k | |
| Browser Use | AI agents that control a browser to automate web-based legal research | ⭐ 86k | MIT |
| AutoGen | Microsoft’s multi-agent framework for orchestrating complex AI workflows | ⭐ 57k | CC-BY-4.0 |
| CrewAI | Framework for orchestrating role-playing autonomous AI agents | ⭐ 48k | MIT |
| Activepieces | Open-source no-code automation with MCP support and self-hosting | ⭐ 22k | |

Legal Research & Case Law Data/APIs

Comprehensive databases and APIs for legal research and case law discovery

| Project | Coverage | Jurisdiction | API Access |
|---|---|---|---|
| CourtListener 🇺🇸 | Primary legal data & research platform | United States | API |
| Juriscraper 🇺🇸 | Scrapers for opinions, oral arguments, PACER content | United States | Tools |
| Eyecite 🇺🇸 | Fast, robust legal citation extractor | United States | Library |
| Caselaw Access Project 🇺🇸 | 6.7M+ U.S. court decisions with API | United States | API |
| UK National Archives 🇬🇧 | Public API for UK court judgments | United Kingdom | API |
| Open Legal Data 🇩🇪 | German legal data platform & API | Germany | Platform |
| EUR-Lex SPARQL 🇪🇺 | Official EU law SPARQL API; access all EU legislation and case law | European Union | API |
| Open Knesset 🇮🇱 | Open data platform for Israeli Knesset legislative proceedings and members | Israel | Platform |
| OpenAlex | Fully open catalog of 250M+ scholarly works including legal research | Global | API |

E-Discovery & Litigation

Specialized tools for legal discovery, document review, annotation, and litigation support

| Project | Capabilities | Use Case | Stars |
|---|---|---|---|
| Label Studio | Multi-type data labeling and annotation tool | Legal document annotation & NLP training data | ⭐ 27k |
| doccano | Open source annotation tool for ML practitioners | NLP training data from legal corpora | ⭐ 11k |
| Apache Tika | Detects and extracts metadata and text from 1000+ file types | eDiscovery content analysis | ⭐ 4k |
| FreeEed | Complete eDiscovery processing (OCR, indexing, metadata) | Large-scale discovery | |
| FreeDiscovery | Information retrieval engine based on scikit-learn | Document analysis | |
| FOIAMachine 🇺🇸 | Manage and send FOIA requests with agency directory | Government transparency | |

Speech Recognition & Transcription

Essential tools for converting audio/video to text in legal workflows

| Project | Specialty | Performance | Use Case |
|---|---|---|---|
| Whisper | General-purpose speech recognition by OpenAI | High | Multilingual transcription |
| WhisperX | Fast ASR with word-level timestamps and speaker diarization | Ultra Fast | Speaker identification |
| faster-whisper | Optimized Whisper implementation | Fast | Efficient transcription |
| insanely-fast-whisper | Ultra-fast Whisper implementation | Insane | Batch processing |
| WhisperLiveKit | Real-time speech recognition with Whisper | Real-time | Live transcription |
| whisper-diarization | Speaker diarization with Whisper | Specialized | Multi-speaker identification |
| Vibe | Desktop transcription app with Whisper | Desktop | Self-hosted transcription |
| Scriberr | Transcription and note-taking tool | Notes | Meeting transcription |
| hebrew_whisper 🇮🇱 | GUI for Hebrew transcription using ivrit.ai Whisper models | Hebrew | Hebrew legal transcription |
| ivrit.ai Whisper Turbo 🇮🇱 | Optimized Hebrew Whisper model with 388 hours training data | Optimized | Hebrew speech recognition |

Document Signing & Collaboration

Platforms for digital document signing, secure notes, and collaborative documentation

| Project | Primary Use | Stars | License |
|---|---|---|---|
| Documenso | Open-source DocuSign alternative | | AGPL |
| DocuSeal | Document filling and signing platform | | AGPL |
| OpenSign | Free & open-source DocuSign alternative with self-hosting | ⭐ 6k | |
| Notesnook | Fully open source & E2E-encrypted note-taking; ideal for sensitive legal work | ⭐ 14k | GPL-3.0 |
| Docmost | Collaborative wiki and documentation software | | AGPL |

Document Management, OCR & PDF

Essential tools for document digitization, management, and processing workflows

| Project | Core Function | Stars | License |
|---|---|---|---|
| markitdown | Convert PDF/DOCX/PPTX and more to Markdown | ⭐ 93k | MIT |
| Tesseract | Industry-standard OCR engine, 100+ languages | ⭐ 58k | Apache-2.0 |
| Docling | Modern document parsing for PDF/DOCX/PPTX/HTML | ⭐ 57k | MIT |
| Stirling-PDF | Local web-based PDF toolbox (split/merge/convert/OCR) | ⭐ 50k | GPL-3.0 |
| Gotenberg | Developer-friendly API for converting HTML/DOCX/more to PDF | ⭐ 12k | MIT |
| pdfplumber | Extract text, tables, and metadata from PDFs with precision | ⭐ 10k | MIT |
| PyMuPDF | High-performance Python library for PDF extraction, annotation, and rendering | ⭐ 9k | AGPL-3.0 |
| WeasyPrint | Convert HTML/CSS to PDF; great for generating court-ready documents | ⭐ 9k | BSD-3-Clause |
| OCRmyPDF | Add searchable OCR text layer to scanned PDFs | | MPL-2.0 |
| docTR | Deep learning OCR engine with strong accuracy on structured documents | ⭐ 6k | Apache-2.0 |
| EasyOCR | Ready-to-use OCR with 80+ languages | | Apache-2.0 |
| paperless-ngx | Self-hosted document management system with AI tagging | | GPL-3.0 |
| Paperless-AI | AI addon for paperless-ngx (semantic search, auto-classification) | | |
| ExifTool | Read/write metadata in files; digital evidence analysis | | GPL |

Document Assembly & Rules-as-Code

Platforms for automating legal document creation and implementing legal logic as code

| Project | Primary Use | Target Users | Technical Level |
|---|---|---|---|
| docassemble | Expert-system platform for guided interviews | Legal professionals | Medium |
| AssemblyLine 🇺🇸 | Court-form automation toolkit | Court systems | Low |
| python-docx-template | Jinja2-based template engine for generating Word legal documents | Developers | Low |
| Blawx | Visual Rules-as-Code environment | Legal technologists | High |
| Catala | Programming language for faithful statute implementation | Developers | High |
| OpenFisca | Open legislation simulation engine; used by governments to model social laws | Govtech / Developers | High |
| LEOS 🇪🇺 | Legislative editing platform for the Akoma Ntoso XML format | EU institutions | Enterprise |

Knowledge Management

Tools for building personal knowledge bases, research notes, and collaborative workspaces

| Project | Description | Stars | License |
|---|---|---|---|
| AFFiNE | Next-gen, privacy-first knowledge base combining docs, whiteboard, and database | ⭐ 67k | |
| Memos | Open-source, self-hosted, Markdown-native note-taking built for quick capture | ⭐ 59k | MIT |
| SiYuan | Privacy-first, self-hosted personal knowledge management with full encryption | ⭐ 42k | AGPL-3.0 |
| Logseq | Privacy-first open-source platform for knowledge management; loved by researchers and lawyers | ⭐ 42k | AGPL-3.0 |
| Obsidian | Markdown-based personal knowledge base, widely used by legal professionals | ⭐ 16k | |

AI Agent Skills for Legal Work

Open-source skills that teach AI agents (Claude, etc.) to perform specialized legal tasks — from contract review to compliance checks

What are Agent Skills? Skills are instruction sets that AI agents load dynamically to perform specialized tasks. Learn more at lawcal.ai/resources/skills.

Legal-Specific Skills

From anthropics/knowledge-work-plugins — Apache-2.0 licensed

| Skill | What It Does |
|---|---|
| legal-risk-assessment | Severity × likelihood risk matrix with escalation criteria |
| review-contract | Contract review against negotiation playbook; generates redlines |
| triage-nda | Rapid NDA triage → GREEN / YELLOW / RED routing |
| compliance-check | Surfaces applicable regulations and required approvals |
| compliance-tracking | GDPR/CCPA/DPA review, data subject requests, regulatory monitoring |
| legal-response | Templated responses to litigation holds, subpoenas, data requests |
| vendor-check | Consolidated view of vendor agreements + deadline tracking |
| signature-request | Pre-signature checklist + e-signature routing |
| brief | Daily legal briefing across email, calendar, and contracts |
| meeting-briefing | Structured pre-meeting briefing for negotiations/compliance reviews |
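
To make the idea behind a severity × likelihood matrix concrete, here is a minimal sketch. It is not the implementation of the legal-risk-assessment skill. The 1 to 5 scales, the score thresholds, and the reuse of the GREEN / YELLOW / RED labels from the triage-nda row are all assumptions made for illustration, and the sample risks are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    severity: int    # assumed scale: 1 (minor) to 5 (critical)
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)

    @property
    def score(self) -> int:
        """Cell value of the severity x likelihood matrix."""
        return self.severity * self.likelihood


def route(risk: Risk) -> str:
    """Map a risk score to a routing label using assumed thresholds."""
    if not (1 <= risk.severity <= 5 and 1 <= risk.likelihood <= 5):
        raise ValueError("severity and likelihood must be between 1 and 5")
    if risk.score >= 15:
        return "RED"     # assumed escalation criterion: senior counsel sign-off
    if risk.score >= 6:
        return "YELLOW"  # assumed criterion: legal review before proceeding
    return "GREEN"       # assumed criterion: standard approval


# Hypothetical findings from a contract review.
risks = [
    Risk("Unlimited liability clause", severity=5, likelihood=3),
    Risk("Missing governing-law clause", severity=3, likelihood=2),
    Risk("Typo in notice address", severity=1, likelihood=2),
]

for r in risks:
    print(f"{r.name}: score={r.score} -> {route(r)}")
```

Keeping the thresholds in one function makes the escalation criteria easy to review and adjust to a team's own playbook.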

Document Processing Skills

From anthropics/skills

| Skill | What It Does |
|---|---|
| pdf | Extract text/tables, create, merge/split, and fill PDF forms |
| docx | Create, read, edit Word documents with formatting and templates |
| pptx | Create and edit PowerPoint presentations programmatically |
| xlsx | Create, edit, and analyze spreadsheets with formulas and charts |

Datasets & Benchmarks

High-quality training data and evaluation benchmarks for legal AI development

| Dataset | Content Type | Coverage | Scale | Best For |
|---|---|---|---|---|
| Pile of Law 🇺🇸 | Legal/administrative texts | US-centric | Large | Language model training |
| MultiLegalPile 🌍 | Multilingual legal corpus | 24 languages | Massive | Multilingual models |
| LexGLUE 🇪🇺🇺🇸 | Multi-task benchmark | EU/US/Multi | Medium | Legal NLU evaluation |
| LEXTREME 🌍 | Multilingual legal tasks | 24 languages | Large | Cross-lingual evaluation |
| LegalBench | Legal reasoning tasks | Global | Comprehensive | LLM legal reasoning |
| LegalBench-RAG | Contract retrieval benchmark | Global | Focused | RAG system evaluation |
| CUAD | Contract clause annotations | Global | Specialized | Contract understanding |
| CaseHOLD 🇺🇸 | Case holdings analysis | United States | Targeted | Legal reasoning |
| ivrit.ai datasets 🇮🇱 | Hebrew speech dataset creation platform | Israel | Platform | Hebrew model training |
| crowd-transcribe-v5 🇮🇱 | Hebrew speech dataset with 388 hours transcribed data | Israel | Large | Hebrew speech models |

General-Purpose Document Intelligence (useful for legal)

Not legal-specific, but widely used in legal AI pipelines for document processing

| Project | Specialty | Input Types | Performance |
|---|---|---|---|
| GROBID | ML extraction of document structure | PDF → TEI/XML | High |
| Unstructured | Pre-processing for RAG pipelines | PDF/Office/HTML | Versatile |
| Layout Parser | Deep learning layout detection | Multi-format | Advanced |
| Nougat | Neural OCR for academic documents | Academic PDFs | Specialized |
| Marker | Fast PDF to Markdown conversion | PDF | Fast |
| Docling | Modern document parsing | PDF/DOCX/PPTX/HTML | Modern |

Learning, Communities & Curations

Essential communities and learning resources for legal AI professionals

| Resource | Focus | Link | Activity Level |
|---|---|---|---|
| Free Law Project | Open legal data ecosystem | GitHub Org | Very Active |
| Awesome Legal NLP | Curated academic research | GitHub | Active |
| Legal ML Datasets | Comprehensive legal ML datasets collection | GitHub | Active |
| Awesome Legal Data | Curated open-source tools for the legal industry | GitHub | Active |
| Stanford CodeX FutureLaw | Annual legal tech conference from Stanford Law | Website | Annual |
| EOLE Conference 🇪🇺 | European Open Source & Free Software Law Event | Website | Annual |

Contributing

We'd love your help making this list even better! Here's how to contribute:

Submission Guidelines

Must Have:

  • Open-source with OSI-approved license
  • Clear documentation and README
  • Active maintenance (commits within 12 months)
  • Clear relevance to legal workflows

Nice to Have:

  • Community adoption (GitHub stars)
  • Production usage examples
  • Testing and CI/CD
  • Performance benchmarks

How to Submit

  1. Fork this repository
  2. Add your project in the appropriate section (alphabetical order)
  3. Include: Name, one-line description, primary link(s), jurisdiction flag if applicable
  4. Test your links and formatting
  5. Submit a pull request with a clear description
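
Before opening a pull request, the guidelines above can be checked mechanically. The sketch below is one possible pre-submission check, not a tool shipped with this repository. The `Entry` fields, the function names, and the reading of "12 months" as 365 days are assumptions, and the sample project and URL are placeholders.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Entry:
    name: str
    description: str
    url: str
    last_commit: date
    jurisdiction: str = ""  # optional flag, e.g. "🇺🇸"


def check_entry(entry: Entry, today: date) -> list[str]:
    """Return a list of guideline violations (empty list means none found)."""
    problems = []
    if not entry.name.strip():
        problems.append("missing name")
    if not entry.description.strip() or "\n" in entry.description:
        problems.append("description must be a single line")
    if not entry.url.startswith(("http://", "https://")):
        problems.append("link must be an http(s) URL")
    # "Commits within 12 months" is approximated here as 365 days.
    if today - entry.last_commit > timedelta(days=365):
        problems.append("no commits within 12 months")
    return problems


def is_alphabetical(names: list[str]) -> bool:
    """Check case-insensitive alphabetical order of project names in a section."""
    keys = [n.casefold() for n in names]
    return keys == sorted(keys)


if __name__ == "__main__":
    # Placeholder entry for illustration only.
    entry = Entry(
        name="Example Tool",
        description="One-line description of the tool",
        url="https://example.com/example-tool",
        last_commit=date(2024, 1, 1),
    )
    print(check_entry(entry, today=date(2024, 6, 1)))
    print(is_alphabetical(["Blackstone", "LexNLP", "spaCy"]))
```

License and documentation checks are left out because they need repository metadata that this sketch does not fetch.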

Optional Quality Checks

# Link checker
npx lychee --no-progress --accept 200,999 README.md

# Awesome list linter  
npx awesome-lint

Curation Policy

| We Include | We Exclude |
|---|---|
| Open-source projects only | Closed-source SaaS platforms |
| Global scope (jurisdiction-tagged) | Internal/private tools |
| Production-ready tools | Abandoned experimental repos |
| High-value datasets/benchmarks | Low-quality or duplicate data |
| Active, reputable communities | Inactive or harmful communities |

Quality First: We prioritize well-maintained projects with good documentation and real-world usage over comprehensive coverage.


License

CC0 1.0 Universal – No rights reserved.

Feel free to copy, remix, and build upon this list.

By contributing, you agree to license your contribution under CC0.


Credits

Curated by Chen Friedman
Powered by Lawcal AI


Star this repo if you found it helpful!

Made with ❤️ for the legal tech community
