semantic-chunking

Here are 17 public repositories matching this topic...

isaacus-dev / semchunk

A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.

python nlp text splitting chunking text-chunking text-splitting semantic-chunking isaacus

Updated Oct 28, 2025
Python

mirth / chonky

Fully neural approach for text chunking

ai ml chunking rag text-splitter llms semantic-chunking

Updated Oct 23, 2025
Python

Unsiloed-AI / Unsiloed-Parser

python ocr openai yolo chunking hacktoberfest pdf-parsing ai-agents document-processing rag llm semantic-chunking

Updated Oct 25, 2025
Python

jparkerweb / semantic-chunking

🍱 semantic-chunking ⇢ semantically create chunks from large documents for passing to LLM workflows

vector embeddings chunking text-splitter llm text-chunking text-splitting semantic-chunking

Updated Oct 22, 2025
JavaScript

jparkerweb / llm-distillery

🍶 llm-distillery ⇢ use LLMs to run map-reduce summarization tasks on large documents until a target token size is met.

text-summarization text-processing tokenization text-compression token-management openai-api llm large-language-model semantic-chunking text-distillation ai-text-reduction

Updated Oct 15, 2025
JavaScript

prajwal10001 / semantic-chunker-langchain

Token-aware, LangChain-compatible semantic chunker with PDF, markdown, and layout support

python nlp markdown pdf ai rag langchain semantic-chunking

Updated Jun 28, 2025
Python

ThanhHung2112 / Semantic_chunking

Semantic Chunking is a Python library for segmenting text into meaningful chunks using embeddings from Sentence Transformers.

nlp text vector chunking rag text-split vector-database semantic-chunking

Updated Dec 15, 2024
Python

bazilicum / axonode-chunker

Advanced semantic text chunking with custom structural markers, whole-text coherence preservation, and flexible token management. Features async processing, LangChain integration, and dynamic drift detection. Ideal for RAG systems, augmented text processing, and domain-specific document analysis.

lang rag test-split langchain semantic-chunking text-spl

Updated Aug 10, 2025
Python

gokhaneraslan / llm-qa-dataset-pipeline

🤖 Automated Q&A Dataset Generation Pipeline powered by LLMs. Multi-stage pipeline that searches, filters, extracts and transforms web content into high-quality question-answer datasets for LLM training. Supports multiple LLM providers (Groq, Mistral, Ollama) and search engines.

nlp machine-learning natural-language-processing web-scraping question-answering dataset-generation content-extraction mistral document-processing qa-dataset groq automated-pipeline llm llama-index trafilatura ollama semantic-chunking crawl4ai ai-training-data

Updated Jun 7, 2025
Python

ProfEngel / OpenTuneWeaver

All-in-one solution for converting documents to fine-tune LLMs

benchmarking ai lora dataset-generation quantization all-in-one gradio model-deployment finetuning pdf-processing qa-generation personal-ai llm vllm qlora gguf semantic-chunking educational-ai opentuneweaver

Updated Sep 26, 2025
Python

Jayandhan03 / HR-Asst-rag

HR Policy Assistant (RAG-based chatbot): a conversational AI assistant for employees to query company HR policies. Built with LangChain and Qdrant, it semantically ingests HR documents, retrieves relevant policy information, reranks results with BM25/MMR, and delivers precise LLM-generated responses. Cloud-based vector storage ensures quick responses.

streamlit-webapp dense-retrieval huggingface-spaces langchain hybrid-retrieval qdrant-vector-database semantic-chunking rag-chatbot

Updated Oct 15, 2025
Python

pipewrk / llm-core

Lightweight, composable TypeScript library for semantic chunking, workflow pipelining, and LLM orchestration.

nlp typescript pipeline embeddings openai cosine-similarity chunking data-processing bun llm ollama semantic-chunking

Updated Sep 17, 2025
TypeScript

smart-models / Progressive-Summarizer-RAPTOR

Cutting-edge semantic text processing system that uses hierarchical clustering and advanced language models to automatically organize and summarize large volumes of text.

docker rest-api gpu-acceleration raptor hierarchical-clustering rag llm semantic-chunking ollama-integration progressive-summarization

Updated Oct 3, 2025
Python

Utsav-J / chunking_strategies

"Retrieval is all you need": an all-in-one repo covering different levels of chunking, with their main logic and reusable code. No API keys used. Highly portable and pluggable.

retrieval chunking rag langchain semantic-chunking

Updated Oct 1, 2025
Python

url4irl / vectors-gateway

A Sidecar service for applications that need vector database functionality to augment their LLMs. This service provides embeddings and retrieval capabilities by abstracting embeddings generation (LiteLLM) and vector storage and search (Qdrant).

embeddings vectors sidecar rag qdrant litellm semantic-chunking

Updated Oct 12, 2025
TypeScript

dcirne / rag_fundamentals

Retrieval-Augmented Generation (RAG) Fundamentals and Semantic Chunking

machine-learning artificial-intelligence rag semantic-chunking

Updated Jun 19, 2024
Jupyter Notebook

jsonusuman351 / Langchain_Text_Splitter

An exploration of advanced text splitting strategies in LangChain for RAG, from basic character splitting to state-of-the-art semantic chunking.

python openai rag llm text-splitting semantic-chunking langcahin

Updated Sep 15, 2025
Python
