Enhancements: Add More Faster Whisper Models, Translation Capability, Customization, and UI Improvements #2

ChanJianHao · 2024-12-22T06:15:42Z

I've made some enhancements to your excellent project and would like to submit them as a pull request. These additions focus on expanding functionality, improving performance, and increasing user customization.

Here's a summary of the key contributions:

Expanded Model Support (Faster-Whisper): I've integrated additional faster-whisper models, providing users with more options for balancing speed and accuracy in their transcriptions. This should improve processing time, especially for longer audio/video files.

Added Translation Capabilities: This is a new feature. In addition to transcription, the program now supports translation: users can generate English subtitles for foreign-language films (or other audio/video content). This opens up a whole new range of use cases for the project.
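Whisper's translation mode is selected per call. In faster-whisper, `WhisperModel.transcribe()` takes a `task` argument that is either `"transcribe"` (output in the source language) or `"translate"` (output in English). The sketch below assumes the GUI exposes a single translation toggle; `build_task_options` is a hypothetical helper, not code taken from this PR.

```python
from typing import Any


def build_task_options(translate: bool) -> dict[str, Any]:
    """Map a (hypothetical) GUI 'Translate' toggle to faster-whisper keyword arguments.

    Whisper's translate task only targets English, so there is no
    target-language option to pass here.
    """
    return {"task": "translate" if translate else "transcribe"}


# Usage sketch (not executed here; requires faster-whisper and a downloaded model):
#   from faster_whisper import WhisperModel
#   model = WhisperModel("small")
#   segments, info = model.transcribe("clip.wav", **build_task_options(translate=True))
#   for segment in segments:
#       print(segment.text)
```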

Increased Customization Options: I've added several options to allow users to fine-tune the program's behavior:

Thread Control: Users can now adjust the number of threads used for processing, allowing them to optimize performance based on their hardware.
Timeout Setting: A timeout setting has been added to prevent the program from hanging indefinitely on problematic files.
Source Language Selection: Users can now explicitly specify the source language, which can improve transcription accuracy in some cases.
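A timeout like the one described can be sketched with the standard library. This is an assumption about the approach, not the PR's code: `run_with_timeout` is a hypothetical helper that runs a blocking transcription call in a worker thread and stops waiting after `timeout_s` seconds. (The thread-count option, by contrast, presumably maps onto the `cpu_threads` argument of faster-whisper's `WhisperModel`.)

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_timeout(func: Callable[[], T], timeout_s: float) -> T:
    """Run func in a worker thread and give up waiting after timeout_s seconds.

    Python cannot forcibly kill a thread, so on timeout the worker keeps running
    in the background; the caller just stops waiting and can move on.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"no result within {timeout_s} s") from None
    finally:
        # Release the executor without blocking on a possibly hung worker.
        executor.shutdown(wait=False)


if __name__ == "__main__":
    print(run_with_timeout(lambda: "done", timeout_s=1.0))
    try:
        run_with_timeout(lambda: time.sleep(0.5), timeout_s=0.05)
    except TimeoutError as exc:
        print("skipped:", exc)
```

The trade-off is that a hung call still occupies a thread until it returns on its own; the timeout only protects the caller (for example the GUI) from waiting forever.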

Minor UI Improvements: I've made some small improvements to the user interface to enhance usability and clarity.

Blacklist for Hallucinations: Filters out certain common sentences that the model tends to hallucinate during silence.
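A blacklist filter of this kind might look like the sketch below. The helper names and the whole-segment matching rule are assumptions for illustration; the PR's actual filter logic may differ. Segments are normalized (whitespace collapsed, surrounding punctuation trimmed, case-folded) before comparison, and whitespace-only segments are dropped as well.

```python
import re
import string


def normalize(text: str) -> str:
    """Collapse whitespace, trim surrounding punctuation, and case-fold."""
    text = re.sub(r"\s+", " ", text).strip()
    return text.strip(string.punctuation + " ").casefold()


def filter_segments(segments: list[str], phrases: list[str]) -> list[str]:
    """Drop empty segments and segments that exactly match a blacklisted phrase.

    Matching whole (normalized) segments rather than substrings keeps real
    sentences that merely contain a phrase such as "thank you".
    """
    blacklist = {normalize(p) for p in phrases if p.strip()}
    kept = []
    for seg in segments:
        norm = normalize(seg)
        if norm and norm not in blacklist:
            kept.append(seg)
    return kept


if __name__ == "__main__":
    phrases = ["Thank you for watching.", "Thank you."]
    print(filter_segments(["Thank you for watching!", "Hello there."], phrases))
```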

I believe these changes would enhance the project's functionality and make it even more valuable to users. Thank you for creating such a fantastic project! I've really enjoyed working with it.

I'm eager to hear your feedback on these changes. Please let me know if any adjustments are needed. If the updates are acceptable to you, please update the releases too!

Thank you :)

evermoving · 2024-12-22T22:59:35Z

Hi, thanks for your contribution. I will review it soon.

ChanJianHao · 2024-12-24T12:22:40Z

Hello,

I've made a few more commits in the past 24 hours to improve the handling of hallucinations and make it optional. I have also made writing the transcription to disk optional. However, these changes seem to trigger a Windows Defender detection when building with PyInstaller. I can't resolve it at the moment without too much effort.

pyinstaller/pyinstaller#5668

Edit: It seems the latest refactoring commit fixed the Windows Defender false positive.

fznx922 · 2024-12-31T02:53:04Z

Awesome work! I've been using your branch as it's exactly what I was looking for. One issue I did find: when using the turbo models to translate from JA to EN, it would display Japanese instead of the translation. Swapping to another model solved this. I'm wondering if it's a model limitation or something in the code? Either way, thanks so much for this, from both of you :)

evermoving

Some of the models you added to the dropdown menu (e.g. large-v3-turbo) don't appear to be supported; the console produces an error with it selected.
I don't want the auto language detection to be entirely replaced with explicit language declaration, there are situations where auto is useful, like when language is unknown or changes multiple times in a conversation. A better approach would be to have auto detection enabled with a checkbox by default, and language selection be optional.
The hallucinations.txt should be empty by default, users might or might not want to filter these words. It needs a different name as well, like filteredwords.txt, as it appears that it's filtering correctly detected words, as opposed to hallucinations.

Other than that I would be happy to approve the PR as the features seem useful.

ChanJianHao · 2025-01-01T03:03:00Z

Hi @evermoving, thanks for the review. Thank you @fznx922 for the positive feedback too! I am glad that translation is something that other users are interested in 👍🏻

Some of the models you added to the dropdown menu (e.g. large-v3-turbo) don't appear to be supported; the console produces an error with it selected.

Sure, let's remove those non-working ones. Strangely they were listed as available models by Faster-Whisper. My bad.

I don't want the auto language detection to be entirely replaced with explicit language declaration, there are situations where auto is useful, like when language is unknown or changes multiple times in a conversation. A better approach would be to have auto detection enabled with a checkbox by default, and language selection be optional.

How about we leave the source language textbox as an empty string "" by default and change the tooltip to tell users that it is optional? This way the program should default to auto language detection without the need for another checkbox (there are already several checkboxes now).

If you are ok with this, I will proceed to code and test out if it can work.
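The "empty string = auto" idea maps naturally onto faster-whisper, where `language=None` means the language is detected automatically. A minimal sketch, assuming the textbox value arrives as a plain string; `resolve_language` is a hypothetical helper.

```python
from typing import Optional


def resolve_language(textbox_value: str) -> Optional[str]:
    """Map the source-language textbox to faster-whisper's `language` argument.

    Empty or whitespace-only input returns None, which faster-whisper treats as
    "detect the language automatically"; otherwise the code (e.g. "ja") is
    passed through in lower case.
    """
    code = textbox_value.strip().lower()
    return code or None


# Usage sketch (not executed here; needs faster-whisper and a loaded model):
#   segments, info = model.transcribe(audio, language=resolve_language(user_text))
#   print(info.language, info.language_probability)  # detected when language=None
```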

The hallucinations.txt should be empty by default, users might or might not want to filter these words. It needs a different name as well, like filteredwords.txt, as it appears that it's filtering correctly detected words, as opposed to hallucinations.

I actually thought hallucinations.txt could be pre-filled for the convenience of all System Captioner users, while remaining optional. Those words/sentences are typical hallucinations whenever there are silences during translation, as reported in many issues and confirmed by my hours of testing translation on foreign films:

openai/whisper#928
SYSTRAN/faster-whisper#826
openai/whisper#1606

It would take each end user significantly more time to compile their own list just to get rid of the typical hallucinations. This pre-filled list lets them simply turn the feature on and enjoy it, a sort of "plug and play" convenience. I agree that it may filter out correct translations, especially if the speaker really does say something like "Thank you for watching", but that is simply a consequence of the Whisper models' training data; that is why the hallucination filter can be freely disabled and the text file freely edited by the user.
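The "pre-filled but optional and user-editable" design could be loaded along these lines. This is a sketch under assumptions: one phrase per line, `#` lines as comments, and a missing or empty file meaning "filter nothing". The function name and the `filter.txt` file name are hypothetical, not the file name used in the PR.

```python
import tempfile
from pathlib import Path


def load_filter_phrases(path: Path, enabled: bool) -> list[str]:
    """Read one phrase per line from a user-editable text file.

    Returns [] when filtering is disabled or the file is missing, so an empty or
    absent file simply means "filter nothing". Blank lines are skipped, and
    lines starting with '#' are treated as comments (an assumed convention).
    """
    if not enabled or not path.is_file():
        return []
    phrases = []
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            phrases.append(line)
    return phrases


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "filter.txt"  # hypothetical file name
        demo.write_text("Thank you for watching.\n# comment\n\nI'm sorry.\n", encoding="utf-8")
        print(load_filter_phrases(demo, enabled=True))
        print(load_filter_phrases(demo, enabled=False))
```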

Sometimes the console log may also print that it is filtering because it detected extra spaces or new lines. I am still trying to find the right balance for this.

evermoving · 2025-01-01T18:39:50Z

@ChanJianHao The 'empty string = auto' approach is a good idea.

Regarding the hallucinations, the program already has a VAD (voice activity detection) filter that makes the program not process any major silence, which should eliminate the hallucinatory behavior seen in those more basic whisper implementations. Have you experienced those hallucinations with System Captioner?
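In faster-whisper the VAD step is presumably enabled through `WhisperModel.transcribe(..., vad_filter=True)`, which uses the Silero VAD model. To illustrate the principle without downloading a model, the sketch below substitutes a much cruder technique, a fixed RMS-energy gate: chunks whose energy stays below a threshold are treated as silence and never reach the recognizer. `voiced_ranges`, the chunk size and the 0.01 threshold are illustrative assumptions, not System Captioner's settings.

```python
import math
from typing import Sequence


def rms(chunk: Sequence[float]) -> float:
    """Root-mean-square amplitude of float samples in [-1.0, 1.0]."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def voiced_ranges(samples: Sequence[float], chunk_size: int,
                  threshold: float = 0.01) -> list[tuple[int, int]]:
    """Return (start, end) sample ranges whose RMS energy exceeds threshold.

    Only these ranges would be passed to the recognizer; quieter chunks are
    treated as silence and skipped. A real VAD (e.g. Silero) uses a trained
    model instead of a fixed energy threshold.
    """
    ranges = []
    for start in range(0, len(samples), chunk_size):
        end = min(start + chunk_size, len(samples))
        if rms(samples[start:end]) > threshold:
            ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    audio = [0.0] * 160 + [0.3] * 160 + [0.0] * 160  # silence, "speech", silence
    print(voiced_ranges(audio, chunk_size=160))
```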

ChanJianHao · 2025-01-02T13:43:19Z

Hi @evermoving

Happy New Year!

Great! I have made the relevant changes for language to make it optional, and have also removed the turbo models which are giving errors. As I lack a powerful GPU to test the larger models, please feel free to edit my branch should there be any more faulty models on the list.

I have also renamed the hallucination file to better clarify its purpose, as per your feedback. Indeed, the VAD (voice activity detection) filter is very useful for English transcription and rarely has issues.

However, from my testing on several machines with Translation Mode for Japanese and Chinese, hallucinations can be quite common, hence my implementation of an optional pre-built filter.

To see hallucination when translating, you may test it with something like:
https://www.youtube.com/watch?v=D_DtKgsr9WQ

Try running it for a while with model 'small' or 'tiny', and pause from time to time to create silences. You'll notice that without the hallucination filter, despite VAD, it starts producing strange outputs even though there's no audio, frequently saying things like "Thank you" or "I am sorry" even though the speaker in the video is giving a very different speech.

Thank you.

evermoving · 2025-01-13T09:11:45Z

@ChanJianHao Hi, unfortunately, because of personal circumstances, including traveling without my main PC, I haven't been active on Github in the last few weeks. Thanks for letting me know that VAD doesn't solve the hallucinations for other languages; if so, the filter is worth including. I will review your changes soon.

ChanJianHao · 2025-01-13T11:18:22Z

@evermoving No worries, thanks for the update, and I look forward to your return! 😊 Please do not hesitate to let me know if any more changes are required.

ChanJianHao and others added 15 commits December 21, 2024 18:40

Add translation · d1b16ef
Update frontend GUI · 8138bc7
Add support for more Whisper models · 2cc7924
Update requirements with fixed versions · 7c79f10
Update console dimension · 1c39070
Upddate translation toggle check and window opacity · 7421f98
Remove unused imports · c6021c3
Update readme · 37a86f5
Cleanup project folder · cee7fa7
Change icon from CC to S · 058b320
Update README.md with troubleshooting on cublas64_12.dll · a729e60
Update README.md · 4bf4a43
Change sampling size and reduce hallucinations via blacklist · 647ee77
Update build portable to use hallucination file · 8795485
Update hallucination blacklist logic · 5fd5c3f

ChanJianHao and others added 6 commits December 22, 2024 17:44

Cleanup code in transciber · 2c69d35
Add sharper icon.ico with transparent background · 5e4226b
Refactor code for hallucination filter · ebd0864
Add "Filter Hallucinations" and "Store Output" options · b22b648
Add filter hallucination and store output as options · 2d95978
Improve hallucination filter logic and add print statements · 336bc36

ChanJianHao and others added 3 commits December 28, 2024 21:02

Overall refactor of code · 9ee18c5
Update screenshot for demo · c15eb93
Update tooltip for language selection · 4a48dbb

evermoving requested changes Dec 31, 2024


ChanJianHao added 2 commits January 2, 2025 05:29

Make language optional, remove turbo, rename hallucination file · 6ca0057
Add line to hallucination filter · 55821a2

ChanJianHao requested a review from evermoving January 4, 2025 02:28

ChanJianHao added 2 commits January 6, 2025 03:41

Add hallucinations · 4d9cbe5
Justify center subtitles text · e32f940

3 participants
