Saghetti0/transcriber

transcriber

A Discord bot that transcribes voice messages.

Structure

The bot is split into two parts:

  • bot/ handles the connection to Discord's gateway and API, and communicates with the workers over Redis.
  • worker/ handles the transcription jobs and sends the results back to the bot (the "front-end"). This component can be scaled independently to as many machines as needed, and jobs will be split across them equally.
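As a rough illustration of this split, here is a minimal sketch that uses in-process queues and threads as a stand-in for Redis and the worker machines. It is not the bot's actual code: `fake_transcribe`, `worker_loop`, and `run_demo` are hypothetical names, and the real project hands jobs out through Celery rather than `queue.Queue`.

```python
import queue
import threading


def fake_transcribe(audio_name: str) -> str:
    # Hypothetical placeholder for the real speech-to-text call.
    return f"transcript of {audio_name}"


def worker_loop(jobs: queue.Queue, results: queue.Queue) -> None:
    # Each worker pulls jobs until it receives a None sentinel.
    while True:
        job = jobs.get()
        if job is None:
            break
        message_id, audio_name = job
        results.put((message_id, fake_transcribe(audio_name)))


def run_demo(job_list, num_workers: int = 2) -> dict:
    jobs: queue.Queue = queue.Queue()      # stands in for the Redis job queue
    results: queue.Queue = queue.Queue()   # stands in for the result channel
    threads = [
        threading.Thread(target=worker_loop, args=(jobs, results))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    # The "bot" side: enqueue one job per voice message.
    for job in job_list:
        jobs.put(job)
    # One sentinel per worker so every thread shuts down.
    for _ in threads:
        jobs.put(None)
    for thread in threads:
        thread.join()
    # Collect the results keyed by message ID.
    collected = {}
    while not results.empty():
        message_id, text = results.get()
        collected[message_id] = text
    return collected
```

Adding more workers only changes `num_workers` here; the side that enqueues jobs does not need to know how many consumers exist, which is the property that lets worker/ scale separately from bot/.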

To-do

  • Properly handle messages longer than 2000 characters (right now it just crashes...).
  • Use message flags to determine voice messages, rather than the name of the file.
  • Add a context menu action to transcribe voice messages.
  • (long term) Migrate off of Celery to a more robust task management system, probably something custom-built. This involves a rewrite of the bot.
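The first two items could be approached along these lines. This is only a sketch under assumptions, not code from the repository: `split_message` and `is_voice_message` are hypothetical helpers, the 2000-character limit comes from the to-do item above, and the flag value `1 << 13` is taken from Discord's documented `IS_VOICE_MESSAGE` message flag, which should be checked against the current API docs before relying on it.

```python
DISCORD_MESSAGE_LIMIT = 2000   # character limit mentioned in the to-do list
IS_VOICE_MESSAGE = 1 << 13     # assumed value of Discord's voice-message flag


def split_message(text: str, limit: int = DISCORD_MESSAGE_LIMIT) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers to break at the last space that fits; falls back to a hard
    cut when a chunk contains no usable space.
    """
    chunks = []
    remaining = text
    while len(remaining) > limit:
        # Look for the last space at or before the limit.
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no space to break on: hard cut
        chunks.append(remaining[:cut])
        remaining = remaining[cut:].lstrip(" ")
    if remaining:
        chunks.append(remaining)
    return chunks


def is_voice_message(flags: int) -> bool:
    # Test the bit field instead of inspecting the attachment's file name.
    return bool(flags & IS_VOICE_MESSAGE)
```

Each chunk returned by `split_message` could then be sent as its own message, and `is_voice_message` would take the integer `flags` field of an incoming message.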

History

The original version of this bot used whisper.cpp and ran on the CPU. This worked, but was pretty slow, as CPU inference typically is. The solution I came up with was a two-pass system, where the bot processed messages with the base model first, and then with the medium model for higher quality. Eventually, I was able to upgrade the host machine with a GPU, and configured the bot to use that instead. However, due to bugs in whisper.cpp's CUDA implementation, it hallucinated a lot, to the point where the outputs were nearly unusable. I eventually just switched to the official implementation, which was fast enough to get rid of the two-pass system. I tried to clean up the code to remove a lot of the two-pass weirdness, but things are still a bit messy.
