An interactive, Python-based CLI for chatting with Hugging Face language models, optimized for casual, Discord-style conversation using ChatML. It supports quantized and full-precision models, live token streaming with color formatting, and runtime adjustment of generation parameters.
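For reference, ChatML wraps each turn in `<|im_start|>`/`<|im_end|>` markers, so a single exchange sent to the model looks like this (the system prompt text here is illustrative, not the tool's actual prompt):

```
<|im_start|>system
you are a casual, discord-style chatter<|im_end|>
<|im_start|>user
hello<|im_end|>
<|im_start|>assistant
```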
- **Multiple Model Formats**
  - Hugging Face Transformers (`AutoModelForCausalLM`)
  - GGUF (llama.cpp) backend
  - LoRA adapter loading
  - 4-bit / 8-bit quantization with bitsandbytes
- **Custom Prompt Controls**
  - Chain-of-Thought context management
  - Raw blank mode, no-system-prompt mode, or assistant-only modes
  - DeepHermes and ChatML formatting options
  - Optional code detection and filtering
- **Interactive Chat**
  - Multi-line input with `prompt_toolkit`
  - Persistent conversation history (`/back`, `/clear`)
  - Runtime parameter adjustment (`/min`, `/max`, `/temp`, `/p`, `/k`, `/r`, `/rh`)
- **Streaming Output**
  - Token-by-token display with Rich coloring
  - Emoji filtering and cleanup
  - Automatic lowercasing rules
  - EOS-aware extension: starts with a short randomized budget (40–75 tokens), then automatically extends generation in 64-token steps until `<|im_end|>` or EOS is reached, the hard cap (1024 tokens) is hit, or a manual `/stop` is triggered (see the sketch after this list)
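A minimal sketch of how such an extension loop can be built on the Transformers API. The budget, step size, hard cap, and stop tokens follow the description above; the function and variable names are illustrative, not the tool's actual internals:

```python
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mookiezii/Discord-Hermes-3-8B"  # default model from this README
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

STEP, HARD_CAP = 64, 1024
stop_ids = {tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|im_end|>")}

def generate_with_extension(prompt: str) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    budget = random.randint(40, 75)          # short randomized initial budget
    generated = 0
    while generated < HARD_CAP:
        out = model.generate(ids, max_new_tokens=budget, do_sample=True)
        new_tokens = out[0, ids.shape[1]:]   # tokens produced in this step
        generated += new_tokens.shape[0]
        ids = out
        if any(t.item() in stop_ids for t in new_tokens):
            break                            # <|im_end|> or EOS reached
        budget = STEP                        # otherwise extend in 64-token steps
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```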
Install with requirements.txt:

```bash
pip install -r requirements.txt
```

Or install manually:

```bash
pip install torch transformers peft bitsandbytes prompt_toolkit rich
```

If using GGUF (llama.cpp) models:

```bash
pip install llama-cpp-python
```
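The `--gguf` path runs models through the llama.cpp backend. A minimal llama-cpp-python sketch of an equivalent ChatML-formatted streaming call (the model path is a placeholder; sampling values are illustrative):

```python
from llama_cpp import Llama

# Placeholder path; any ChatML-trained GGUF model works the same way.
llm = Llama(model_path="model.gguf", chat_format="chatml", n_ctx=4096)

stream = llm.create_chat_completion(
    messages=[{"role": "user", "content": "hello"}],
    max_tokens=75,
    temperature=0.7,
    stream=True,  # yields token-by-token chunks, as in the interface's streaming output
)
for chunk in stream:
    delta = chunk["choices"][0]["delta"]
    print(delta.get("content", ""), end="", flush=True)
```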
```
usage: interface.py [-h] [-m MODEL] [-q [QUANT]] [-fl FROZEN_LORA]
                    [-c CHECKPOINT] [-chs CHECKPOINT_SUBFOLDER]
                    [--deephermes] [--gguf] [--gguf-chat-format FORMAT]
                    [--blank] [-asc] [-as] [--just-system-prompt]
                    [--no-system-prompt] [--no-assistant-prompt]
                    [--code-check] [-au] [--custom-tokens]

optional arguments:
  -h, --help            Show this help message and exit
  -m MODEL, --model MODEL
                        Model path or Hugging Face repo ID
                        (default: mookiezii/Discord-Hermes-3-8B)
```
Feature toggles:

| Flag | Description |
|---|---|
| `-m, --model` | Model path or Hugging Face repo ID (default: `mookiezii/Discord-Hermes-3-8B`) |
| `-q, --quant` | Quantization mode: 4 or 8 (default: off). Use `-q` with no value for 4-bit, or `-q 8` for 8-bit (see the sketch after this table) |
| `-fl, --frozen-lora` | Path or Hugging Face repo ID of the base LoRA adapter to load and freeze |
| `-c, --checkpoint` | Path or Hugging Face repo ID of the LoRA adapter to load |
| `-chs, --checkpoint-subfolder` | Subfolder of the LoRA adapter path or repo ID |
| `--deephermes` | Enable DeepHermes formatting instead of ChatML |
| `--gguf` | Use the GGUF model format with the llama.cpp backend |
| `--gguf-chat-format` | Chat format for GGUF models (default: `chatml`) |
| `--blank` | Raw user input only, with no prompts or system context |
| `-asc, --assistant-system-combo` | Include both the system and assistant system prompts |
| `-as, --assistant-system` | Use the assistant system prompt instead of the standard one |
| `--just-system-prompt` | Use only the system prompt with user input |
| `--no-system-prompt` | Do not include the system prompt |
| `--no-assistant-prompt` | Do not include the assistant prompt |
| `--code-check` | Enable code detection and filtering via a classifier |
| `-au, --auto` | Run preset inputs (hello → what do you do → wow tell me more) 5 times with `/clear` in between, then exit |
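A minimal sketch of what the `-q` and `-c` paths correspond to in the Transformers/bitsandbytes/PEFT APIs. This mirrors the flag descriptions above rather than the tool's exact loading code, and the adapter repo and subfolder names are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

model_id = "mookiezii/Discord-Hermes-3-8B"

# -q        -> 4-bit; -q 8 -> 8-bit (quantization is off by default)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
# For 8-bit instead: BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# -c / -chs -> load a LoRA adapter on top of the base model
model = PeftModel.from_pretrained(
    model, "your-user/your-lora-adapter", subfolder="checkpoint-1000"
)
```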
- `MIN_NEW_TOKENS` = 1
- `MAX_NEW_TOKENS` = `random.randint(40, 75)`
- `TEMPERATURE` = `random.uniform(0.5, 0.9)`
- `TOP_P` = `random.uniform(0.7, 0.9)`
- `TOP_K` = `random.randint(40, 75)`
- `MIN_P` = 0.08
- `NO_REPEAT_NGRAM_SIZE` = 3
- `REPETITION_PENALTY` = 1.2
- EOS handling = `<|im_end|>` and `tokenizer.eos_token_id` (extension continues until one is reached, or the hard cap of 1024 tokens)

These map directly onto `generate()` sampling arguments, as sketched below.
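A sketch of how these defaults translate into a `model.generate()` call, assuming `model` and `tokenizer` are loaded as in the sketches above and a transformers release recent enough to support `min_p`; the prompt is illustrative:

```python
import random

# `model` and `tokenizer` as loaded above; the prompt is a placeholder.
input_ids = tokenizer("hello", return_tensors="pt").input_ids.to(model.device)

output = model.generate(
    input_ids,
    do_sample=True,
    min_new_tokens=1,
    max_new_tokens=random.randint(40, 75),   # short randomized initial budget
    temperature=random.uniform(0.5, 0.9),
    top_p=random.uniform(0.7, 0.9),
    top_k=random.randint(40, 75),
    min_p=0.08,                              # requires a transformers release with min_p support
    no_repeat_ngram_size=3,
    repetition_penalty=1.2,
    eos_token_id=[tokenizer.eos_token_id,
                  tokenizer.convert_tokens_to_ids("<|im_end|>")],
)
```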
| Command | Description |
|---|---|
| `/clear`, `/reset`, `/c` | Clear conversation history |
| `/back`, `/b` | Undo the last user+assistant exchange and preview recent history |
| `/h VAL` | Enable Chain-of-Thought with the last VAL exchanges (default: all available) |
| `/d` | Disable Chain-of-Thought |
| `/min VAL` | Set `min_new_tokens` to VAL |
| `/max VAL` | Set `max_new_tokens` to VAL |
| `/temp VAL`, `/t VAL` | Set temperature to VAL |
| `/p VAL` | Set `top_p` to VAL |
| `/k VAL` | Set `top_k` to VAL |
| `/params`, `/settings` | Show current generation parameters |
| `/r` | Randomize parameters (short-range defaults) |
| `/rh` | Randomize parameters with high variance (wider temperature/`top_p`/`top_k` ranges) |
| `/stop` | Toggle extension on/off (controls continuation beyond the initial budget) |
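A minimal sketch of how such a command loop can be wired up with prompt_toolkit; the dispatch logic and parameter names are illustrative, not the tool's actual implementation:

```python
from prompt_toolkit import PromptSession

session = PromptSession()
params = {"temperature": 0.7, "top_p": 0.9, "top_k": 50,
          "min_new_tokens": 1, "max_new_tokens": 64}
history: list[dict] = []

while True:
    text = session.prompt("> ").strip()
    if text in ("/clear", "/reset", "/c"):
        history.clear()
    elif text.startswith(("/temp ", "/t ")):
        params["temperature"] = float(text.split()[1])
    elif text.startswith("/p "):
        params["top_p"] = float(text.split()[1])
    elif text.startswith("/k "):
        params["top_k"] = int(text.split()[1])
    elif text in ("/params", "/settings"):
        print(params)
    else:
        history.append({"role": "user", "content": text})
        # ... format history with ChatML, call generate(), stream the reply ...
```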
MIT License
