Galene-stt is an implementation of real-time speech-to-text (automatic subtitling) for the Galene videoconferencing server. Depending on how it is run, galene-stt may either produce a transcript of a conference, or display captions in real time.
Galene-stt connects to a Galene server using the same protocol as any other client, and may therefore be run on any machine that can connect to the server. This allows running galene-stt on a machine with a powerful GPU without requiring a GPU to be available on the server.
First, install the Vulkan client library. For example, under Debian:
sudo apt install libvulkan-dev
Build and install whisper.cpp:
git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
cmake -Bbuild -DGGML_VULKAN=1
cd build
make -j
sudo make install
sudo ldconfig
cd ..
The Vulkan backend is recommended, since it is portable and well
maintained. It is also possible to build whisper.cpp against CUDA, CoreML
or OpenVINO; see the whisper.cpp README.md file for details.
Now download your favourite model:
cd models
./download-ggml-model.sh medium
cd ../..
Install the libopus library. For example, under Debian:
sudo apt install libopus-dev
Build galene-stt:
git clone https://github.com/jech/galene-stt
cd galene-stt
CGO_ENABLED=1 go build -ldflags='-s -w'
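The build steps above call git, cmake, make and go. The following is a minimal sketch of a pre-flight check, assuming a POSIX shell; missing_tools is a hypothetical helper name and is not part of galene-stt or whisper.cpp.

```shell
# Print every command from the argument list that is not found on PATH.
# missing_tools is a hypothetical helper, not part of galene-stt.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: list whichever of the build tools used above are missing.
#   missing_tools git cmake make go
```

An empty result means that all the listed tools were found. It does not check their versions, nor the presence of a C/C++ compiler.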
Put the models where galene-stt will find them:
ln -s ../whisper.cpp/models .
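To confirm that a model is reachable through the symbolic link, something like the following sketch may help. It assumes the ggml-NAME.bin naming that appears in models/ggml-tiny.bin; model_path, have_model and the MODELS_DIR variable are hypothetical names introduced for illustration.

```shell
# Build the expected path of a model file from its short name.
# MODELS_DIR is a hypothetical override; the default is ./models.
model_path() {
    printf '%s/ggml-%s.bin\n' "${MODELS_DIR:-models}" "$1"
}

# Succeed if the model file exists and is non-empty.
# The -s test follows symbolic links, so it works with the link above.
have_model() {
    [ -s "$(model_path "$1")" ]
}

# Example, run from the galene-stt directory:
#   have_model medium || echo "model not found: $(model_path medium)"
```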
By default, galene-stt produces a transcript on standard output. This requires no special permissions, and may therefore be tested on any public server:
./galene-stt https://galene.org:8443/group/public/stt/
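Since the transcript goes to standard output, it can be piped through ordinary shell tools. The sketch below prefixes each line with the time at which it was read; it assumes that galene-stt writes the transcript as lines of text, which I have not verified, and stamp_lines is a hypothetical helper name.

```shell
# Prefix each line read from standard input with the current UTC time.
# stamp_lines is a hypothetical helper, not part of galene-stt.
stamp_lines() {
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s %s\n' "$(date -u +%H:%M:%S)" "$line"
    done
}

# Example: display the transcript and keep a timestamped copy.
#   ./galene-stt https://galene.org:8443/group/public/stt/ | stamp_lines | tee transcript.txt
```

The timestamps record when a line reached the pipe, not when the words were spoken.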
In order to produce real-time captions, create a user called
speech-to-text with the caption permission in your Galene group:
galenectl create-group -group stt
galenectl create-user -group stt -user speech-to-text -permissions caption
galenectl set-password -group stt -user speech-to-text
Then run galene-stt with the -caption flag:
./galene-stt -caption https://galene.example.org:8443/group/stt/
Galene-stt defaults to English; for other languages, use the -lang flag:
./galene-stt -lang fr https://galene.example.org:8443/group/stt/
If galene-stt reports dropped audio, then your system is not fast enough
for the selected model. Specify a smaller, faster model using the -model
command-line option, as in the example below. In my testing, however,
models smaller than medium did not produce useful output.
./galene-stt -caption -model models/ggml-tiny.bin https://galene.org:8443/group/public/stt/
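The -caption, -lang and -model flags shown above can be combined. The sketch below assembles a command line from environment variables; STT_CAPTION, STT_LANG and STT_MODEL are hypothetical names chosen for this example, and galene-stt itself does not read them.

```shell
# Print a galene-stt command line for the given group URL, using only
# the -caption, -lang and -model flags described above.
# STT_CAPTION, STT_LANG and STT_MODEL are hypothetical variable names.
# The result is meant for inspection; values containing spaces are not quoted.
stt_command() {
    url=$1
    cmd="./galene-stt"
    if [ "${STT_CAPTION:-0}" = 1 ]; then cmd="$cmd -caption"; fi
    if [ -n "${STT_LANG:-}" ]; then cmd="$cmd -lang $STT_LANG"; fi
    if [ -n "${STT_MODEL:-}" ]; then cmd="$cmd -model $STT_MODEL"; fi
    printf '%s %s\n' "$cmd" "$url"
}

# Example:
#   STT_CAPTION=1 STT_LANG=fr stt_command https://galene.example.org:8443/group/stt/
```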
— Juliusz Chroboczek