Add/audio stt #507

mina-parham · 2025-08-07T17:55:52Z

The app branch to test this with is transformerlab/transformerlab-app#719.

The model to use is mlx-community/whisper-tiny-mlx.


codecov · 2025-08-07T18:04:32Z

Codecov Report

❌ Patch coverage is 22.53521% with 55 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
transformerlab/routers/experiment/conversations.py	15.90%	37 Missing ⚠️
transformerlab/fastchat_openai_api.py	33.33%	18 Missing ⚠️



dadmobile · 2025-11-13T14:02:07Z

transformerlab/fastchat_openai_api.py



-class AudioRequest(BaseModel):
+class AudioSpeechRequest(BaseModel):


Preference: Maybe we call this AudioGenerationRequest just to make it clear what it is? Good call renaming as part of this change.

dadmobile · 2025-11-13T14:03:51Z

transformerlab/plugins/mlx_audio_server/index.json

-  "version": "0.0.7",
-  "supports": ["Text-to-Speech", "Audio"],
-  "model_architectures": ["MLXTextToSpeech", "StyleTTS2"],
+  "version": "0.0.8",


This is a big change. I'd make it at least 0.1.0. :D

dadmobile · 2025-11-13T14:15:07Z

transformerlab/plugins/mlx_audio_server/main.py

+            stream = params.get("stream", False)
+
+            experiment_dir = get_experiments_dir()
+            audio_dir = params.get("audio_dir", None)


There's a security alert on the audio_dir handling here, which I think only shows up in this PR because of an indentation change.

It's probably still worth implementing the fix, though. I think the AI-suggested code is fine.
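For readers unfamiliar with this class of alert ("uncontrolled data used in path expression"), here is a minimal sketch of one common mitigation: resolve the caller-supplied directory against a trusted base and reject anything that escapes it. This is not the code from this PR (the later commits mention secure_filename instead); the function name `resolve_audio_dir`, its parameters, and the `"audio"` default are assumptions made for illustration, and only the standard library is used.

```python
import os
from typing import Optional


def resolve_audio_dir(base_dir: str, audio_dir: Optional[str], default: str = "audio") -> str:
    """Hypothetical helper: return a path that is guaranteed to sit under base_dir.

    `audio_dir` stands in for an untrusted value such as params.get("audio_dir").
    """
    base = os.path.realpath(base_dir)
    # Joining with an absolute path or "../" segments can point outside base,
    # so normalise first and compare afterwards.
    candidate = os.path.realpath(os.path.join(base, audio_dir or default))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"audio_dir escapes the experiment directory: {audio_dir!r}")
    return candidate
```

With this shape, a value like `"clips"` resolves to a subdirectory of the base, while `"../outside"` or an absolute path such as `"/etc"` raises `ValueError` before any file is touched.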

dadmobile

I couldn't start the whisper-tiny-mlx model because its architecture wasn't read in, so it was treated as something to run on mlx_server instead of mlx_audio_server. How do we fix this for other models? Add them to our gallery and set the architecture?
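To make the failure mode in this comment concrete, here is a hypothetical sketch of architecture-based plugin selection. It is not Transformer Lab's actual routing code: the `PLUGINS` registry, the `pick_plugin` function, the fallback to `mlx_server`, and the `LlamaForCausalLM` entry are all assumptions for illustration. Only the `model_architectures` key and the `MLXTextToSpeech` and `StyleTTS2` values come from the index.json diff in this thread.

```python
from typing import Dict, List, Optional

# Hypothetical registry, loosely shaped like each plugin's index.json.
PLUGINS: Dict[str, Dict[str, List[str]]] = {
    "mlx_server": {"model_architectures": ["LlamaForCausalLM"]},  # illustrative entry
    "mlx_audio_server": {"model_architectures": ["MLXTextToSpeech", "StyleTTS2"]},
}


def pick_plugin(
    architecture: Optional[str],
    plugins: Dict[str, Dict[str, List[str]]],
    default: str = "mlx_server",
) -> str:
    """Return the first plugin that lists the architecture, else the default."""
    if architecture:
        for name, manifest in plugins.items():
            if architecture in manifest.get("model_architectures", []):
                return name
    # A model whose architecture was never read in lands here,
    # which would match the symptom described above.
    return default
```

Under these assumptions, a model with no architecture recorded falls through to the default server, while a gallery entry that sets the architecture explicitly would be matched to the audio plugin.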

aliasaria · 2025-11-13T15:31:15Z

@mina Parham to implement changes, then add model + APP

mina · 2025-11-13T16:56:47Z

Wrong mina.


mina-parham and others added 5 commits August 6, 2025 13:58

Add task parameter, base model and endpoint for transcriptions audio

aab4c91

add stt generate

32b192f

Add endpoint for uploading audio

05c23be

Fix casting issue and ruff

6e4ac38

Merge branch 'main' into add/audio-stt

dae5e65

github-advanced-security bot found potential problems Aug 7, 2025

transformerlab/fastchat_openai_api.py (fixed)

transformerlab/fastchat_openai_api.py (fixed)

mina-parham added 6 commits August 7, 2025 14:14

Add transcriptions dir

dc19c31

Add tts from main

8ef9abe

Add stt plugin

85db2f2

Update index.json

c1c5abb

typo

45b35c1

Update description

55b1825

github-advanced-security bot found potential problems Aug 7, 2025


mina-parham and others added 2 commits August 8, 2025 16:38

Update transcriptions path and add extension to the audio

128610c

Merge branch 'main' into add/audio-stt

50ca3a4

mina-parham marked this pull request as ready for review August 11, 2025 13:53

mina-parham and others added 13 commits August 11, 2025 10:27

Add list_text, download_text and delete_text endpoint

a94aea0

Merge branch 'add/audio-stt' of https://github.com/transformerlab/tra…

5268139

…nsformerlab-api into add/audio-stt

typo

a38b624

Change text to transcriptions

10e1d6b

typo

86be973

Debugging

73cb1f6

Add audio folder to metadata

ec5f201

remove hardcoded audio folder

82d627a

Merge branch 'main' into add/audio-stt

02eb8fd

Fix delete endpoint

c9c33d8

Merge branch 'add/audio-stt' of https://github.com/transformerlab/tra…

278d004

…nsformerlab-api into add/audio-stt

merge conflict

125978b

Add adaptor to AudioTranscriptionsRequest

33d4c6f

Fix error in loading the audio

aacff76

github-advanced-security bot found potential problems Nov 13, 2025

transformerlab/plugins/mlx_audio_server/main.py (fixed)

mina-parham added 3 commits November 13, 2025 03:58

ruff

42f79c7

ruff

e9a6419

Ruff

0f50af6

dadmobile requested changes Nov 13, 2025


dadmobile and others added 8 commits November 13, 2025 12:00

Merge branch 'main' into add/audio-stt

363a0ce

Renaming

0e12ec3

Security fix

f7c2430

Merge branch 'add/audio-stt' of https://github.com/transformerlab/tra…

dcca732

…nsformerlab-api into add/audio-stt

Potential fix for code scanning alert no. 526: Uncontrolled data used…

41cd31d

… in path expression Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

bump the version

a5ec20c

Merge branch 'add/audio-stt' of https://github.com/transformerlab/tra…

f41c3f6

…nsformerlab-api into add/audio-stt

Use secure_filename

00b5c37

dadmobile approved these changes Nov 13, 2025


mina-parham merged commit 6c14b9d into main Nov 13, 2025
7 of 8 checks passed




5 participants
