Discord-Micae Model Chat Interface

A Python-based interactive CLI interface for chatting with Hugging Face language models, optimized for casual, Discord-style conversation using ChatML. Supports both quantized and full-precision models, live token streaming with color formatting, and dynamic generation parameter adjustment.


Interface Screenshot


Features

  • Multiple Model Formats

    • Hugging Face Transformers (AutoModelForCausalLM)
    • GGUF (llama.cpp) backend
    • LoRA adapter loading
    • 4-bit / 8-bit quantization with bitsandbytes
  • Custom Prompt Controls

    • Chain-of-Thought context management
    • Raw blank mode, no system prompts, or assistant-only modes
    • DeepHermes and ChatML formatting options
    • Optional code detection and filtering
  • Interactive Chat

    • Multi-line input with prompt_toolkit
    • Persistent conversation history (/back, /clear)
    • Runtime parameter adjustment (/min, /max, /temp, /p, /k, /r, /rh)
  • Streaming Output

    • Token-by-token display with Rich coloring
    • Emoji filtering and cleanup
    • Automatic lowercasing rules
    • EOS-Aware Extension: starts with a short randomized budget (40–75 tokens), then automatically extends generation in 64-token steps until <|im_end|> or EOS is reached, the hard cap (1024 tokens) is hit, or a manual /stop is triggered (see the sketch below)
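
The extension loop can be sketched in a few lines. This is an illustrative sketch, not the script's internals; generate_step and stop_requested are hypothetical stand-ins for one generation call and the /stop flag:

import random

HARD_CAP = 1024   # absolute ceiling on generated tokens
STEP = 64         # fixed extension step after the initial budget

def generate_step(max_new_tokens):
    # Hypothetical stand-in for one generation call; a real implementation
    # streams tokens and checks for <|im_end|> / tokenizer.eos_token_id.
    n = random.randint(1, max_new_tokens)
    return n, random.random() < 0.3   # (tokens produced, hit_eos)

def stop_requested():
    # Hypothetical /stop flag, toggled by the user at runtime.
    return False

budget = random.randint(40, 75)       # short randomized initial budget
generated = 0
while True:
    n, hit_eos = generate_step(budget)
    generated += n
    if hit_eos or generated >= HARD_CAP or stop_requested():
        break
    budget = STEP                     # keep extending in 64-token steps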

Installation

Install with requirements.txt:

pip install -r requirements.txt

Or install manually:

pip install torch transformers peft bitsandbytes prompt_toolkit rich

Optional dependencies

If using GGUF (llama.cpp models):

pip install llama-cpp-python
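
A minimal loading sketch, assuming a local model.gguf path and the llama-cpp-python chat API (ChatML matches the interface's default --gguf-chat-format):

from llama_cpp import Llama

llm = Llama(model_path="model.gguf", chat_format="chatml", n_ctx=2048)
reply = llm.create_chat_completion(
    messages=[{"role": "user", "content": "hey, what's up?"}],
    max_tokens=64,
)
print(reply["choices"][0]["message"]["content"])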

CLI Arguments (with defaults)

usage: interface.py [-h] [-m MODEL] [-q [MODE]] [-fl FROZEN_LORA]
                    [-c CHECKPOINT] [-chs SUBFOLDER]
                    [--deephermes] [--gguf] [--gguf-chat-format FORMAT]
                    [--blank] [-asc] [-as]
                    [--just-system-prompt] [--no-system-prompt]
                    [--no-assistant-prompt] [--code-check] [-au]

optional arguments:
    -h, --help                Show this help message and exit
    -m MODEL, --model MODEL   Model path or Hugging Face repo ID
                              (default: mookiezii/Discord-Hermes-3-8B)

Feature toggles:
    -m, --model                     Model path or Hugging Face repo ID (default: mookiezii/Discord-Hermes-3-8B)
    -q, --quant                     Quantization mode: 4 or 8 (default: off). Use `-q` (no value) for 4-bit, or `-q 8` for 8-bit (see the sketch after this list)
    -fl, --frozen-lora              Model path or Hugging Face repo ID of the base LoRA adapter to load and freeze
    -c, --checkpoint                Model path or Hugging Face repo ID of the LoRA adapter to load
    -chs, --checkpoint-subfolder    Subfolder within the checkpoint path or Hugging Face repo ID of the LoRA adapter to load
    --deephermes                    Enable DeepHermes formatting instead of ChatML
    --gguf                          Use GGUF model format with llama.cpp backend
    --gguf-chat-format              Chat format for GGUF models (default: "chatml")
    --blank                         Raw user input only, no prompts/system context
    -asc, --assistant-system-combo  Include both system and assistant system prompts
    -as, --assistant-system         Use assistant system prompt instead of standard
    --just-system-prompt            Use only the system prompt with user input
    --no-system-prompt              Do not include system prompt
    --no-assistant-prompt           Do not include assistant prompt
    --code-check                    Enable code detection and filtering via classifier
    -au, --auto                     Run preset inputs (hello → what do you do → wow tell me more) 5 times with /clear in between, then exit
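
The -q toggle corresponds to bitsandbytes quantized loading in transformers. A minimal sketch of what 4-bit mode enables, assuming the standard BitsAndBytesConfig API (swap in load_in_8bit=True for -q 8):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                        # `-q` / 4-bit mode
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mookiezii/Discord-Hermes-3-8B",          # default model ID from above
    quantization_config=bnb_config,
    device_map="auto",
)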

Default Parameters

  • MIN_NEW_TOKENS = 1
  • MAX_NEW_TOKENS = random.randint(40, 75)
  • TEMPERATURE = random.uniform(0.5, 0.9)
  • TOP_P = random.uniform(0.7, 0.9)
  • TOP_K = random.randint(40, 75)
  • MIN_P = 0.08
  • NO_REPEAT_NGRAM_SIZE = 3
  • REPETITION_PENALTY = 1.2
  • EOS Handling = <|im_end|> and tokenizer.eos_token_id (extension continues until either is reached, or the 1024-token hard cap is hit)
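
These defaults map directly onto Hugging Face generate() sampling arguments. A minimal sketch of the equivalent keyword set (min_p requires a recent transformers release; im_end_id below is an illustrative placeholder for the <|im_end|> token ID):

import random

gen_kwargs = dict(
    min_new_tokens=1,
    max_new_tokens=random.randint(40, 75),
    do_sample=True,                       # sampling must be on for temp/top_p/top_k
    temperature=random.uniform(0.5, 0.9),
    top_p=random.uniform(0.7, 0.9),
    top_k=random.randint(40, 75),
    min_p=0.08,
    no_repeat_ngram_size=3,
    repetition_penalty=1.2,
)
# outputs = model.generate(**inputs, **gen_kwargs,
#                          eos_token_id=[im_end_id, tokenizer.eos_token_id])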

Commands

Command              Description
/clear /reset /c     Clear conversation history
/back /b             Undo the last user+assistant exchange and preview recent history
/h VAL               Enable Chain-of-Thought with the last VAL exchanges (default: all available)
/d                   Disable Chain-of-Thought
/min VAL             Set min_new_tokens to VAL
/max VAL             Set max_new_tokens to VAL
/temp VAL, /t VAL    Set temperature to VAL
/p VAL               Set top_p to VAL
/k VAL               Set top_k to VAL
/params /settings    Show current generation parameters
/r                   Randomize parameters (short-range defaults)
/rh                  Randomize parameters with high variance (wider temperature/top_p/top_k ranges)
/stop                Toggle extension ON/OFF (controls continuation beyond the initial budget)
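
Parameter commands follow a simple /name VALUE shape. A hypothetical dispatch sketch (names are illustrative, not the script's internals):

params = {"temperature": 0.7, "top_p": 0.8, "top_k": 50,
          "min_new_tokens": 1, "max_new_tokens": 60}

ALIASES = {"/temp": "temperature", "/t": "temperature", "/p": "top_p",
           "/k": "top_k", "/min": "min_new_tokens", "/max": "max_new_tokens"}

def handle_command(line):
    # Apply a /command; return True if the line was consumed as a command.
    parts = line.split()
    key = ALIASES.get(parts[0]) if parts else None
    if key and len(parts) == 2:
        caster = int if key.endswith("tokens") or key == "top_k" else float
        params[key] = caster(parts[1])
        return True
    return False

handle_command("/temp 0.8")   # params["temperature"] is now 0.8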

License

MIT License
