chatml

Here are 16 public repositories matching this topic...

A dataset toolbox for preparing and analyzing conversational datasets. It covers CSV splitting, CSV-to-Parquet conversion, dataset statistics, Parquet cleaning and sorting, Hugging Face-style metadata generation, and batched chain insertion into PostgreSQL, with Rich progress reporting, multiprocessing, and batching sized for machines with 32 GB of RAM.

  • Updated Oct 2, 2025
  • Python

Deepseek-Dataset-Generator creates conversational datasets for LLM fine-tuning via the DeepSeek API. It supports several formats (ChatML, ShareGPT, Alpaca, JSON, CSV), YAML-based configuration, and detailed logging. Intended for quickly generating realistic, customized data.

  • Updated Jun 2, 2025
  • Python

Week 5 project: build a hybrid retriever that fuses FAISS dense vectors with SQLite FTS5/BM25 keyword search (RRF/weighted fusion), plus a Supervised Fine-Tuning (SFT) pipeline (Full FT vs LoRA/QLoRA) using TRL/PEFT/DeepSpeed.

  • Updated Oct 8, 2025
  • Python
