@JamesG-Speechmatics JamesG-Speechmatics commented Mar 3, 2025

Adds client support for server-side search and replace of words in the transcript.

The implementation is mostly borrowed from the existing code for custom dictionary words, which can likewise be specified either on the command line or in a file.
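
For readers unfamiliar with the argument form, here is a minimal sketch of how a `from:to` value such as the ones in the testing commands could be turned into `{"from": ..., "to": ...}` entries. The helper name `parse_replacement_words` is hypothetical, and splitting on the last colon is an assumption made for illustration; this is not the code added in this PR.

```python
from typing import Dict, List


def parse_replacement_words(pairs: List[str]) -> List[Dict[str, str]]:
    """Turn 'from:to' strings into [{"from": ..., "to": ...}] entries.

    Hypothetical helper for illustration only; not the client's actual code.
    """
    replacements = []
    for pair in pairs:
        # Assumption: split on the LAST colon, so a pattern on the left-hand
        # side that itself contains ':' stays intact.
        source, sep, target = pair.rpartition(":")
        if not sep or not source:
            raise ValueError(f"expected 'from:to', got {pair!r}")
        replacements.append({"from": source, "to": target})
    return replacements
```

Under this sketch, `podcast:doomsaying` yields the same shape of entry as the JSON file format shown in the testing section.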

Testing

Base case with no replacements:

speechmatics rt transcribe --ssl-mode none --url ws://jamesg.dev-vms.speechmatics.io:9090 ~/sample.wav
So
welcome to
this
podcast
on
professor
J.R.R.
Tolkien.
And
I'd like
to

Compared to a plain word replacement:

speechmatics rt transcribe --replacement-words podcast:doomsaying --ssl-mode none --url ws://jamesg.dev-vms.speechmatics.io:9090 ~/tolkien16_shortest.wav
So
welcome to
this
doomsaying
on
professor
J.R.R.
Tolkien.
And
I'd like
to

Or with a regex and capture groups:

speechmatics rt transcribe --replacement-words '/(\w)\.(\w)\.(\w)\./:$1$2$3' --ssl-mode none --url ws://jamesg.dev-vms.speechmatics.io:9090 ~/tolkien16_shortest.wav
So
welcome to
this
podcast
on
professor
JRR
Tolkien.

With a file containing:

[
{"from":"Tolkien", "to":"Smith"}
]

the result is:

speechmatics rt transcribe --replacement-words-file list.txt --ssl-mode none --url ws://jamesg.dev-vms.speechmatics.io:9090 ~/tolkien16_shortest.wav
So
welcome to
this
podcast
on
professor
J.R.R.
Smith.
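
The file format above is a JSON list of objects with `from` and `to` keys. A minimal sketch of reading and sanity-checking such a file follows; the function names are hypothetical and the validation rules are assumptions, not necessarily what the client does.

```python
import json
from typing import Dict, List


def parse_replacement_entries(text: str) -> List[Dict[str, str]]:
    """Parse the JSON text of a replacement words file (illustration only)."""
    entries = json.loads(text)
    if not isinstance(entries, list):
        raise ValueError("replacement words file must contain a JSON list")
    for entry in entries:
        # Assumption: every entry must carry both keys.
        if not isinstance(entry, dict) or not {"from", "to"} <= entry.keys():
            raise ValueError(f"each entry needs 'from' and 'to': {entry!r}")
    return entries


def load_replacement_words(path: str) -> List[Dict[str, str]]:
    """Read a file such as list.txt and return its entries."""
    with open(path, encoding="utf-8") as handle:
        return parse_replacement_entries(handle.read())
```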

Error case (regex with an unclosed parenthesis):

speechmatics rt transcribe --replacement-words '/(\w)\.(\w)\.(\w\./:$1$2$3' --ssl-mode none --url ws://jamesg.dev-vms.speechmatics.io:9090 ~/tolkien16_shortest.wav
TranscriptionError: Requested configuration is invalid: Invalid config: Invalid regex pattern: '(\w)\.(\w)\.(\w\.' - Parenthesis is not closed. [sessionid = a653fcc5-e937-402b-9d3f-fb573021f0bc]
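
In the run above the invalid pattern is rejected by the server. If an earlier, local check were ever wanted, one option is to compile `/.../` entries before sending them. This is only a sketch: `check_regex_entry` is a hypothetical name, and Python's `re` syntax is not guaranteed to match the server's regex engine, so a local pass or fail is not authoritative.

```python
import re
from typing import Optional


def check_regex_entry(source: str) -> Optional[str]:
    """Return an error message if a /regex/ entry fails to compile, else None."""
    is_regex = len(source) >= 2 and source.startswith("/") and source.endswith("/")
    if not is_regex:
        return None  # plain entries are not checked in this sketch
    pattern = source[1:-1]
    try:
        re.compile(pattern)
    except re.error as exc:
        return f"Invalid regex pattern: {pattern!r} ({exc})"
    return None
```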

@JamesG-Speechmatics JamesG-Speechmatics force-pushed the feature/search-replace-api branch from a851020 to 27f2e8e Compare March 3, 2025 17:34

@giorgosHadji giorgosHadji left a comment

LGTM

Update speechmatics/cli_parser.py with type annotation
@JamesG-Speechmatics JamesG-Speechmatics force-pushed the feature/search-replace-api branch from 4b098cd to 145c1ba Compare March 5, 2025 13:35
@JamesG-Speechmatics JamesG-Speechmatics merged commit 2bcbf3d into master Mar 5, 2025
8 checks passed

4 participants